Today’s development languages give developers the power to serve the industry with the best OOP concepts. The power of the object has made the world go round. We went to the moon and back, and sent probes to several other planets. Still, something bothers me today.
The scheme for persisting an object in a relational database is still poor.
Of course, the various ORM tools on the market make our data structures and applications more object oriented, thereby making implementations simpler.
Available tools – Hibernate, TopLink, JPA, and several others.
None of them is ideal. The gap they try to bridge is referred to as the object-relational impedance mismatch. While abstracting away the database is a highly intellectual (and ideal) goal, the fact that a relational database sits under the covers will always leak.
Joel Spolsky (of Joel on Software) calls this the Law of Leaky Abstractions.
The simplest form of the disconnect shows up when mapping hierarchical objects to database tables. It can definitely be done. But the result leaves little doubt about the implementation: the effort expended designing the ideal mapping could probably be better spent on a solution to the real problem, instead of on the problem created by choosing a solution before examining the problem.
More evidence comes from a recent post on DZone. The author complains about a developer writing code that makes horribly inefficient use of a database. While true, this is only revealed if you know the underlying implementation. From a purely OO view, the code is just fine.
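To make that kind of complaint concrete, here is a minimal Java sketch of the pattern usually called "N+1 queries". Everything in it is hypothetical: `FakeDatabase`, its methods, and the query counter are stand-ins invented for illustration and do not belong to any ORM. The loop in `countItemsLazily` looks perfectly fine as object-oriented code, yet it pays one round trip per order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a relational database that counts round trips.
class FakeDatabase {
    private final Map<Integer, List<String>> itemsByOrder = new HashMap<>();
    int queryCount = 0;

    void addOrder(int orderId, List<String> items) {
        itemsByOrder.put(orderId, new ArrayList<>(items));
    }

    // One query that returns every order id.
    List<Integer> findOrderIds() {
        queryCount++;
        return new ArrayList<>(itemsByOrder.keySet());
    }

    // One query per order: roughly what a lazily loaded association triggers.
    List<String> findItems(int orderId) {
        queryCount++;
        return itemsByOrder.get(orderId);
    }

    // One query for everything: roughly what a join would return.
    Map<Integer, List<String>> findAllItems() {
        queryCount++;
        return new HashMap<>(itemsByOrder);
    }
}

class NPlusOneDemo {
    // Reads naturally from an OO point of view, but issues 1 + N queries.
    static int countItemsLazily(FakeDatabase db) {
        int total = 0;
        for (int orderId : db.findOrderIds()) {
            total += db.findItems(orderId).size();
        }
        return total;
    }

    // Same answer with a single query, at the price of knowing the database is there.
    static int countItemsWithJoin(FakeDatabase db) {
        int total = 0;
        for (List<String> items : db.findAllItems().values()) {
            total += items.size();
        }
        return total;
    }

    // Three orders holding four items in total.
    static FakeDatabase sampleDatabase() {
        FakeDatabase db = new FakeDatabase();
        db.addOrder(1, Arrays.asList("book", "pen"));
        db.addOrder(2, Arrays.asList("lamp"));
        db.addOrder(3, Arrays.asList("desk"));
        return db;
    }

    public static void main(String[] args) {
        FakeDatabase db = sampleDatabase();
        int lazyTotal = countItemsLazily(db);
        System.out.println("lazy: " + lazyTotal + " items, " + db.queryCount + " queries");
        db.queryCount = 0;
        int joinTotal = countItemsWithJoin(db);
        System.out.println("join: " + joinTotal + " items, " + db.queryCount + " queries");
    }
}
```

Both methods return the same total; only the number of round trips differs, and that difference is invisible unless you know what sits underneath the objects.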
DB Fundamentals
I believe the fundamental problem with the database solution comes from the fact that it is often slapped on an application by default. “We need persistence.” “Well, let’s use a database [and the ORM]”.
While an RDBMS is a fine (and mature) solution, it is not always optimal. Choosing a solution before giving the domain problem some serious analysis is always a mistake.
The core issue is that we want to be able to preserve and restore the state of certain data structures in an application. Bonus points for transparently sharing that state among various machines (for scalability).
Attempts to Remove the RDBMS
The many attempts floating around to build a so-called object-oriented database provide evidence that I’m not alone in my goal to replace the RDBMS. We get some cool toys, like Apache’s CouchDB, that change the way we look at the database. Specs like the JCR (Content Repository for Java) provide alternative methods of storing data that look more like the objects we truly want to deal with.
All of these methods have one huge drawback: at some point you are mapping some other data format to your objects, whether with property/XML files, metadata (annotations), or just code. Various systems make it easier, but it’s always there. It just feels wrong.
Many are just wrappers around an RDBMS. This gets exposed in the way that some queries are exceptionally slow while others are blazing fast. You won’t know which is which until you understand how the database is being used. This causes code to be modified to use it in the fastest possible way. The abstraction is broken.
Several years ago, I even made an attempt to replace read-only databases with a Lucene search index. It actually worked exceptionally well. Using the Lucene search index to query for data is an order of magnitude faster than calling an RDBMS. In that particular case, it was more than two orders of magnitude faster, but there were other issues… The concept never really took off. It’s hard to break the psychological connection with the database solution, no matter how great the discomfort.
The Ideal Solution
Wouldn’t the ideal solution be where your application just maintains its state?
- between restarts
- among machines in a cluster
- just before crashes (last good state)
In such a world you don’t have to acknowledge that a persistence mechanism exists at all. You just write your application code, set fields in objects, and treat processes on the various machines in the cluster as if they were threads of execution on a single machine. Even if something somewhere goes wrong, the last good state is maintained to prevent nightmares.
A Dream – Long Left Unfulfilled
We are coming to the close of 2010. You’d think by now we’d have a way to share system state among a group of computers, and a way to keep that state backed up on a file system to allow seamless restoration if a reboot is required or a box just crashes.
You should be able to write your application as if it only lived on a single machine that never crashed.
Serialization – A Solution?
What about using serialization to simply persist the state of the application? Or image based persistence, like Smalltalk?
In the days of C/C++, we could get the address of our objects in memory and just write the bytes to disk. It was an easy way to save and restore system state. Java provides a whole serialization API (addresses aren’t available for security reasons).
A thread could be created that constantly kept the serialized data file up to date with the objects in the application. However, such a solution probably won’t scale well across a cluster. Transparency would be lost, and interfaces would be polluted (classes need to implement Serializable).
Though simple, serialization probably would not be the best solution, but it would be an interesting experiment.
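As a starting point for that experiment, here is a minimal sketch of snapshotting and restoring state with the standard Java serialization API (`ObjectOutputStream` and `ObjectInputStream`). The `AppState` class and the `SnapshotDemo` helper are hypothetical names made up for the example, and the snapshot goes to a byte array rather than a file to keep the sketch self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical application state. It has to implement Serializable,
// which is exactly the interface pollution described in the text.
class AppState implements Serializable {
    private static final long serialVersionUID = 1L;
    final List<String> events = new ArrayList<>();
    int counter;
}

class SnapshotDemo {
    // Writes the whole object graph to a byte array; a FileOutputStream would work the same way.
    static byte[] snapshot(AppState state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new IllegalStateException("could not serialize state", e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds an independent copy of the state from a snapshot.
    static AppState restore(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (AppState) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not restore state", e);
        }
    }

    public static void main(String[] args) {
        AppState state = new AppState();
        state.counter = 42;
        state.events.add("started");

        byte[] saved = snapshot(state);   // last good state
        state.counter = 0;                // simulate losing the live state
        state.events.clear();

        AppState recovered = restore(saved);
        System.out.println(recovered.counter + " " + recovered.events);
    }
}
```

Note that every snapshot rewrites the entire object graph, which hints at why this approach is unlikely to scale across a cluster.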
Shared Memory
The most obvious way to do this would be to set up a background process that keeps memory synchronized across a cluster of machines and a file. This would keep the various machines in sync with one another, and the files would allow state restoration if a machine crashed (if it couldn’t just pull state from a neighbor).
(starting to look like the serialization solution again)
It would also seem that a virtual machine could offer the greatest chance of success to implement a solution. With a virtual machine, it’s much easier to make some magic happen behind memory access than with something that has direct access to the memory space.
Solutions
So, what alternatives do we already see out in the market?
Oracle Coherence
Oracle makes a good attempt with their Coherence product.
The problem with this solution lies in its implementation. Sending whole objects across the network can quickly saturate the network (as evidenced by various http session sharing schemes). Coherence also requires interfaces to become a bit polluted by requiring that objects implement Serializable (but this is pretty minor).
For all its problems, the Oracle solution could be useful in some cases and may improve as it matures. The risk is that the solution cuts into Oracle’s database clustering business, so the motivation to improve the product may not be very high.
Terracotta
Terracotta appears to offer everything in the short list of requirements:
- syncs across the network
- keeps state sync’d with the disk
- transparent
- fast
- easy
- perfect match
Terracotta gives me everything that I asked for and manages it with an optimized, transparent solution. Instead of forcing objects to implement Serializable or requiring other types of implementation changes, it works transparently under the covers of the virtual machine. It manages to optimize network usage by sending only the diffs of objects instead of whole objects. It even keeps the state sync’d with the file system. Basically, it is the most transparent persistence system around right now.
The only caveat: this one is for Java only. Sorry, .NET guys. So, to exploit it you’re gonna be stuck with Java, Haskell, Scala, Groovy, JRuby, Jython, JavaScript, or one of the other hundred languages that run on the JVM (isn’t there some C# compiler for the JVM out there somewhere?).
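To illustrate why sending diffs instead of whole objects saves network traffic, here is a minimal sketch in plain Java. It is not Terracotta’s implementation or API: the `DeltaDemo` class is hypothetical, an object is modeled as a simple map of field names to values, and removed fields are ignored to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class DeltaDemo {
    // Returns only the fields that were added or changed between two snapshots.
    // Assumption: fields are never removed, as with the fields of a Java object.
    static Map<String, Object> diff(Map<String, Object> before, Map<String, Object> after) {
        Map<String, Object> delta = new HashMap<>();
        for (Map.Entry<String, Object> entry : after.entrySet()) {
            String field = entry.getKey();
            boolean isNew = !before.containsKey(field);
            if (isNew || !Objects.equals(before.get(field), entry.getValue())) {
                delta.put(field, entry.getValue());
            }
        }
        return delta;
    }

    // Brings a replica up to date by overwriting just the changed fields.
    static void apply(Map<String, Object> replica, Map<String, Object> delta) {
        replica.putAll(delta);
    }

    public static void main(String[] args) {
        Map<String, Object> before = new HashMap<>();
        before.put("name", "Ada");
        before.put("city", "London");
        before.put("balance", 100);

        // The "remote" copy starts out identical to the original.
        Map<String, Object> replica = new HashMap<>(before);

        // A single field changes on the "local" machine.
        Map<String, Object> after = new HashMap<>(before);
        after.put("balance", 120);

        Map<String, Object> delta = diff(before, after);
        apply(replica, delta);

        System.out.println("fields sent: " + delta.keySet());
        System.out.println("replica in sync: " + replica.equals(after));
    }
}
```

Only the one changed field crosses the wire here, whereas serializing the whole object would resend the unchanged fields on every update.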
The Delta Magic
Terracotta is smart. Terracotta doesn’t replicate across machines needlessly. It does just enough to provide fail-over protection and the rest is ‘on demand’. It can even push unused data off a machine.
Added up, each machine added to a cluster increases the effective memory available to every machine.
When I see something like this, I wonder what I would even need a database for.
The only reason I can see is to make data available for data mining and (so-called) business intelligence (BI) packages, or for warehousing and cubes (as Microsoft calls them). Most of these tools are already designed around a database.
So, the RDBMS effectively becomes a logging mechanism.
Kill the RDBMS
So, by using common collections (sets/lists/maps) transparently backed by Terracotta, the RDBMS can effectively be dug out and ripped out of an application. The result is cleaner, more maintainable code, more efficient use of memory, and faster execution times. You no longer have to worry about the impedance mismatch.
What’s bad in this Idea?
Will this Concept Flourish?
Will the concept of using shared memory take off as a way to get rid of the database and not simply be a means to massively scale?
I hope so. This is the day and age where everyone is embracing simplicity. The proliferation of Ruby on Rails, Grails, Spring, Wicket, and other frameworks shows that most developers have had it with over-complex solutions.
Maybe they’ll be willing to get rid of one solution altogether.
Am I totally wrong?
One case where the RDBMS could be difficult to remove is when it serves a greater purpose, like acting as an integration point for multiple applications.
One solution is placing an HTTP wrapper around the database. This converts it from an integration point to an application.
Another area where this probably won’t work is data warehousing. But, that would be an excellent application for wrapping it up in a REST API layer.
As a common way for various applications, regardless of implementation, to share data, the use of an RDBMS seems hard to beat. My thoughts for a Terracotta solution could work across apps built on languages that can run on the JVM. But reaching out to others (C/C++/Smalltalk) might be a bit difficult, as far as the company’s plans are concerned.
Verdict
I suppose we can’t kill all of the databases out there.
But here is what we can do.
We, the designers and builders of applications, should take responsibility:
we should analyze the problems we are trying to solve,
we should try to select the best solution for the specific problem at hand,
we should choose the best tool to fit the business value.
You missed GemStone/S, which does the right thing, in a way that scales quite nicely. You write Smalltalk code, and load the code into the system. And it just runs. Think of it as Oracle (in all the power and failover and HA that that entails) with Smalltalk as the stored-procedure language. See http://seaside.gemstone.com for how this applies to serving web pages, and http://www.gemstone.com/pdf/OOCL_SuccessStory.pdf for an amazingly large success story.
Agreed, that’s a nice one too 🙂
For many it’s strictly a legacy issue: having a big fat Oracle instance deployed and no real possibility to cultivate change can put a stop to much innovation in this field. Indeed, the benefit of an (R)DBMS is that you can do data warehousing and reporting, and query left and right to your heart’s desire.
Querying through an artificial layer is my biggest issue with current ORMs (in Java anyway), where we’re given this HQL/JPQL syntax to be used inside string annotations. It’s clumsy, not type-safe, and just plain ugly. This part the C# guys got right with LINQ (I agree, we need a C# compiler for the JVM).
I’m fairly certain that in a few years, the current approaches (especially in the Java space) with a gazillion frameworks and layers will be frowned upon. I don’t believe in simplicity through added visible layers; this just trades in other problems which are often harder to fix.
PS: Alex Miller addressed the “Terracotta as a DB” issue a little while back: http://tech.puredanger.com/2008/12/07/terracotta-replacing-the-database/
You missed Dreamsource ORM. It solves the mismatch. Development will be easy and fast with Dreamsource ORM.
Jim
http://www.gwtorm.com for GWT ORM
http://code.google.com/p/dreamsource-orm/
I recently came across Neo4J and CouchDB; have you (or anyone else) given them a try? They’re probably not the solution to every problem, but they seem like nice alternatives to an RDBMS 🙂
Overall I agree and I long for getting rid of the RDBMS. But your article only deals with the ORM problem: what about transactions?
>> The only reason I can see is, to make data available for data-mining and (so called) business intelligence packages (or warehousing, of course). Most of these tools are already designed around a database.
Terracotta looks very cool, but regarding “the only reason”….
What about operational (realtime) reporting? External integration? Queries over large numbers of objects, or queries where SQL is used (joins, aggregates, unions, etc.), where indexing is needed for reasonable performance? What about ad hoc querying and schema browsing? What about schema evolution?
Even in your case of a BI requirement, how do you get the data into the warehouse?
>> So, the RDBMS effectively becomes a logging mechanism.
How do you “log” to the RDBMS? With an ORM?
Two interesting experiments (Java only) that you didn’t mention are Prevayler and PJama. ORM has gone directly from being widely available to being overused. I don’t think you’re completely wrong, but relational databases excel as a combination of integration and persistence. A well designed schema supports many different views of the data well at the same time.
Prevayler
http://www.prevayler.org/wiki/
PJama
http://research.sun.com/forest/opj.main.html
Well, Hibernate offers the Criteria API, which is a bit more intuitive than HQL. I think our problem is not just with transactions but with more basic things like backup/restore, setting up QA servers, etc. :/
@Fabrizio Giudici
The challenges w.r.t. transactions would be a better impedance match with JTA. Integrating this with Terracotta could be one of the remedies.
I’ll do a test lab on this and get back to you.
@mmthm
>>Terracotta looks very cool, but regarding “the only reason”….
What about operational (realtime) reporting? External integration? Queries over large numbers of objects, or queries where SQL is used (joins, aggregates, unions, etc.), where indexing is needed for reasonable performance? What about ad hoc querying and schema browsing? What about schema evolution?
Even in your case of a BI requirement, how do you get the data into the warehouse?
See, in today’s world the primary purpose of databases has been persistence. If we can tackle this using technologies like Terracotta, the only other need for a database will be logging: logging in the sense of maintaining a HISTORY of transactions, a HISTORY for reporting, etc.
It would be nothing more than a reservoir that gives access to all that was there in the past. Use data warehousing or cubes or whatever for analytics, etc…
Now the answer to your 2nd question:
The RDBMS still needs to exist, but the primary database would be Terracotta. RDBMS logging could be a secondary thread through some sort of new API, something like log4j for logging but for the DB, that runs in the background 😛
One of the primary objectives of the RDBMS is providing performant access to data while consuming as little space on disk as possible. Having all objects in “memory” whether on disk or in RAM would be a HUGE space overhead.
@Taranfx
>> See, in today’s world primary purpose of databases has been persistence
If persistence is all you care about, then a DBMS is not necessary. But almost all business applications in today’s world require all the items I mentioned. You can not do complex queries or queries over large objects using a log. You can not report on a log. You can not browse the object model/data/schema of a log, etc….
I guess if you built a log processor that somehow replayed all the transactions into an RDBMS, then some of these problems could be solved, but aren’t you just deferring the problems you were trying to avoid (the use of RDBMS and ORMs)? And building a transactional logging system that *never* lost a single message would be very difficult. Not something you would want to rebuild for every application. And if you did lose a message, how would you get your DBMS consistent with the app data? Or if there were DBMS transaction errors, what would you do? Queries on large collections are still difficult because the DBMS and the app are no longer consistent.
>> RDMBS still needs to exist. but primary database wud be Terracotta. RDMS logging can be a secondary thread through some sort of new API like log4j for logging.. similar for DB that runs in
What is your goal if you now say that you do need an RDBMS? I thought that the post was trying to suggest an alternative to an RDBMS?
It seems you still have the RDBMS problem (and likely the ORM issues, since you need to somehow get your data into the RDBMS), except now you have a lot of extra design/implementation complexities.
If the log operated in a write-behind manner, and you managed to solve all the reliability and consistency issues, then performance could be a very nice tradeoff for all the added complexity (maybe), but that is a different topic than using Terracotta as a replacement for an RDBMS.
I think Terracotta can be used wonderfully for many use cases that can help simplify and speed up applications (cluster caching, grid processing, etc…), but I have not seen a reasonable architecture for making Terracotta the system of record for common business applications.
@mmthm
>>You can not report on a log.
I didn’t mean an actual “log”. All I meant was that the database can be compared to “a history-aware data store” for warehousing, analysis, and reports (just like it is done today).
>>What is your goal if you now say that you do need an RDBMS? I thought that the post was trying to suggest an alternative to an RDBMS?
The purpose of the post was to indicate that the primary purpose of the RDBMS can be replaced with Terracotta, but the RDBMS will still exist for secondary reasons.
🙂
It’s an endless discussion.
Perception is like A** Hole, Every one has one ..lol
We don’t have to completely remove the RDBMS. I think it would be nice to have the best of both worlds together: an RDBMS to manage refs between objects and something else to store other object data, like here:
http://jwebd.blogspot.com/2009/04/may-be-we-need-both-rdbms-and-key-value.html
@Fabrizio Giudici
Have a look at JBoss Cache – http://www.jbosscache.org – a replicated, in-memory object cache which does support JTA. Or better still, the Infinispan project – http://www.infinispan.org – a distributed data grid platform.
But yes, overall good article, could not agree more that databases are overused and abused and are probably the single biggest bottleneck in almost any system. Bring on “persistence to clustered memory”!
I’ve successfully removed my dbms completely from our transaction processing system using Terracotta. It works wonderfully.
cheers @David
Indeed a brave act, and an experience worth sharing.
Would you like to share your experience? We can blog about it.
mail me: taranfx [at] gmail [dot] com
Taranfx –
For real applications, Coherence makes Terracotta look like it’s standing still. Data grid applications don’t just move data, they spread work out, and Terracotta suffers from using a central server with the workers all remote.
Not sure what “http session sharing schemes” you are referring to, but many of the big online retailers are using Coherence for their HTTP sessions, and Terracotta can’t touch the scale or the performance that Coherence provides.
Peace,
Cameron Purdy | Oracle
http://coherence.oracle.com
@cameron,
For scale, Terracotta has partially solved this problem using FX with multiple active-active servers. This provides very good high availability with cluster-wide coherent data structures and scale, and your data is stored in only one place.
I doubt Coherence stores data in a central place. Do you know where your data is? How do you manage backup? When a server goes down, how does it recover, and how much time does it take to recover completely? These are things to be considered when you think of any solution as a replacement for a database (not entirely; the RDBMS has its own place and use, and I believe Terracotta + RDBMS is the way to achieve scale-out). Of course, FX is a first release and has a way to go. Coherence is a very mature product and has its strengths.
heard of memcached, anyone?
Take a look at http://www.ikvm.net/ and maybe someone will venture to create a .NET binary for Terracotta with it.
Impedance mismatch is not hard, it is easy. Each concrete class maps to one table, with all the fields of that class.
The hard problems are distribution and concurrency.
The solution to distribution is to get the database, or at least some useful part of it, in process. This discussion is about creating an object database.
It sounds like you want an object database. What about Zope DB, db4o, Objectivity, etc.?
what went wrong with lucene? “The concept never really took off. ” is all you really say….
Lucene’s tragedy goes here. http://johannburkard.de/blog/programming/java/apache-lucene-the-ghetto-search-engine.html
What about SQL aggregate functions?