Today’s programming languages give developers the power to serve the industry with the best OOP concepts. The power of the object has made the world go round. We have been to the moon and back, and have sent probes to several other planets. Still, there’s something that bothers me today.
The scheme for persisting an object in a relational database is still poor.
Of course, the various ORM tools on the market make our data structures and applications more object oriented, thereby making the implementations simpler.
Available tools and APIs: Hibernate, TopLink, JPA (the specification that such tools implement), and several others.
None of them is ideal. The underlying problem is referred to as the object-relational impedance mismatch. While abstracting away the database is a highly intellectual (and ideal) goal, the fact that a relational database sits under the covers will always leak through.
Joel Spolsky (of Joel on Software) calls this the Law of Leaky Abstractions.
The simplest form of the disconnect shows up when mapping hierarchical objects to database tables. It can definitely be done, but the result leaves little doubt about the implementation: the effort expended designing the ideal mapping could probably be better spent on a solution to the real problem, instead of on the problem created by choosing a solution before examining the problem.
More evidence comes from a recent post on DZone. The author complains about a developer writing code that makes horribly inefficient use of a database. While true, this is only revealed if you know the underlying implementation. From a purely OO view, the code is just fine.
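To make that concrete, here is a minimal sketch of one common shape of this problem (not necessarily the one from the DZone post). Everything in it is hypothetical: `FakeDatabase` is a stand-in that only counts round trips, and `Order` lazily loads its items the way an ORM-managed object might. The loop in `countItems` is perfectly reasonable OO code, yet it issues one query per order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a database; it only counts round trips.
class FakeDatabase {
    int queryCount = 0;

    List<Integer> findOrderIds() {
        queryCount++;
        return List.of(1, 2, 3);
    }

    List<String> findItemsForOrder(int orderId) {
        queryCount++;
        return List.of("item-" + orderId + "-a", "item-" + orderId + "-b");
    }
}

// Looks like a plain object, but its items are fetched lazily on first access.
class Order {
    private final int id;
    private final FakeDatabase db;
    private List<String> items; // null until first requested

    Order(int id, FakeDatabase db) {
        this.id = id;
        this.db = db;
    }

    List<String> getItems() {
        if (items == null) {
            items = db.findItemsForOrder(id); // hidden round trip
        }
        return items;
    }
}

class LeakyAbstractionDemo {
    // Innocent-looking OO code: count the items across all orders.
    static int countItems(FakeDatabase db) {
        List<Order> orders = new ArrayList<>();
        for (int id : db.findOrderIds()) {
            orders.add(new Order(id, db));
        }
        int total = 0;
        for (Order order : orders) {
            total += order.getItems().size();
        }
        return total;
    }

    public static void main(String[] args) {
        FakeDatabase db = new FakeDatabase();
        int total = countItems(db);
        System.out.println(total + " items, " + db.queryCount + " queries");
    }
}
```

With three orders this prints `6 items, 4 queries`: one query for the order list plus one per order. Nothing in `countItems` hints at that cost unless you know what sits underneath.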
I believe the fundamental problem with the database solution comes from the fact that it is often slapped on an application by default. “We need persistence.” “Well, let’s use a database [and the ORM].”
While an RDBMS is a fine (and mature) solution, it is not always optimal. Choosing a solution before giving the domain problem some serious analysis is always a mistake.
The core issue is that we want to preserve and restore the state of certain data structures in an application. Bonus points for transparently sharing that state among various machines (for scalability).
Attempts to Remove the RDBMS
All of the attempts floating around to build a so-called object-oriented database provide evidence that I’m not alone in my goal to replace the RDBMS. We get some cool toys, like Apache’s CouchDB, that change the way we look at the database. Specs like the JCR (Content Repository for Java) provide alternative methods of storing data that look more like the objects that we truly want to deal with.
All of these methods have one huge drawback: at some point you are mapping some other data format to your objects, whether with property/XML files, metadata (annotations), or just code. Various systems make it easier, but it’s always there. It just feels wrong.
Many are just wrappers around an RDBMS. This gets exposed in the way that some queries are exceptionally slow while others are blazing fast. You won’t know which is which until you understand how the database is being used. This causes code to be modified to use it in the fastest possible way. The abstraction is broken.
Several years ago, I even made an attempt to replace read-only databases with a Lucene search index. It actually worked exceptionally well. Using the Lucene search index to query for data is an order of magnitude faster than calling an RDBMS. In that particular case, it was more than two orders of magnitude faster, but there were other issues… The concept never really took off. It’s hard to break the psychological connection with the database solution, no matter how great the discomfort.
The Ideal Solution
Wouldn’t the ideal solution be where your application just maintains its state?
- between restarts
- among machines in a cluster
- just before crashes (last good state)
In such a world you don’t have to acknowledge that a persistence mechanism exists at all. You just write your application code, set fields in objects, and treat processes on the various machines in the cluster as if they were threads of execution on a single machine. Even if something somewhere goes wrong, the last good state is maintained to prevent nightmares.
A Dream – Long Left Unfulfilled
We are coming to the close of 2010. You’d think by now we’d have a way to share system state among a group of computers, and a way to keep that state backed up on a file system to allow seamless restoration if a reboot is required or a box just crashes.
You should be able to write your application as if it only lived on a single machine that never crashed.
Serialization – A Solution?
What about using serialization to simply persist the state of the application? Or image-based persistence, like Smalltalk?
In the days of C/C++, we could get the address of our objects in memory and just write the bytes to disk. It was an easy way to save and restore system state. Java provides a whole serialization API (addresses aren’t available for security reasons).
A thread could be created that constantly keeps the serialized data file up to date with the objects in the application. However, such a solution probably won’t scale well across a cluster. Transparency would be lost, and interfaces would be polluted (things need to implement Serializable).
Though simple, serialization probably would not be the best solution, but it would be an interesting experiment.
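As a rough idea of what that experiment might look like, here is a minimal sketch using Java’s built-in serialization API. `AppState` and `SnapshotDemo` are hypothetical names, and the snapshot is kept in a byte array to keep the example short; a background thread would write those bytes to a file on some schedule instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical application state. Note the Serializable marker leaking into the domain type.
class AppState implements Serializable {
    private static final long serialVersionUID = 1L;
    final Map<String, Integer> counters = new HashMap<>();
}

class SnapshotDemo {
    // Serializes the whole object graph, not just what changed.
    static byte[] save(AppState state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds the state from a snapshot, e.g. after a restart.
    static AppState restore(byte[] snapshot) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
            return (AppState) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        AppState state = new AppState();
        state.counters.put("logins", 3);

        byte[] snapshot = save(state);
        AppState restored = restore(snapshot);
        System.out.println(restored.counters); // prints {logins=3}
    }
}
```

Both drawbacks mentioned above are visible here: `AppState` has to implement `Serializable`, and `save` rewrites the entire object graph every time, however small the change.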
The most obvious way to keep application state alive would be to set up a background process that keeps memory synchronized across a cluster of machines and a file. This would keep the various machines in sync with one another; the files would allow state restoration if a machine crashed (if it couldn’t just pull state from a neighbor).
(starting to look like the serialization solution again)
It would also seem that a virtual machine offers the greatest chance of success in implementing a solution. With a virtual machine, it’s much easier to make some magic happen behind memory access than with something that has direct access to the memory space.
So, what alternatives do we already see out in the market?
Oracle makes a good attempt with their Coherence product.
The problem with this solution lies in its implementation. Sending whole objects across the network can quickly saturate the network (as evidenced by various HTTP session sharing schemes). Coherence also requires interfaces to become polluted a bit by requiring that objects implement Serializable (but this is pretty minor).
For all its problems, the Oracle solution could be useful in some cases and may improve as it matures. The risk is that the solution cuts into Oracle’s database clustering business, so the motivation to improve the product may not be very high.
Terracotta appears to offer everything in the short list of requirements:
- syncs across the network
- keeps state synced with the disk
- perfect match
Terracotta gives me everything that I asked for and manages it with an optimized, transparent solution. Instead of forcing objects to implement Serializable or requiring other types of implementation changes, it works transparently under the covers of the virtual machine. It manages to optimize network usage by sending only the diffs of objects instead of whole objects. It even keeps the state synced with the file system. Basically, it is the most transparent persistence system around right now.
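To illustrate why sending diffs is so much cheaper than sending whole objects, here is a toy sketch of the general idea. It is not Terracotta’s implementation or API: the object is modeled as a hypothetical map of field names to values, fields are assumed never to be removed, and `DeltaDemo`, `diff`, and `apply` are names made up for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class DeltaDemo {
    // Returns only the fields whose values differ between the two snapshots.
    // Assumption for this sketch: fields are added or changed, never removed.
    static Map<String, Object> diff(Map<String, Object> before, Map<String, Object> after) {
        Map<String, Object> changes = new HashMap<>();
        for (Map.Entry<String, Object> field : after.entrySet()) {
            if (!Objects.equals(before.get(field.getKey()), field.getValue())) {
                changes.put(field.getKey(), field.getValue());
            }
        }
        return changes;
    }

    // Brings a replica up to date by applying just the changed fields.
    static void apply(Map<String, Object> replica, Map<String, Object> changes) {
        replica.putAll(changes);
    }

    public static void main(String[] args) {
        Map<String, Object> before = new HashMap<>();
        before.put("name", "Ann");
        before.put("city", "Oslo");
        before.put("visits", 1);

        Map<String, Object> after = new HashMap<>(before);
        after.put("visits", 2);

        // Only this small map would have to cross the network.
        Map<String, Object> changes = diff(before, after);
        System.out.println(changes); // prints {visits=2}

        Map<String, Object> replica = new HashMap<>(before);
        apply(replica, changes);
        System.out.println(replica.equals(after)); // prints true
    }
}
```

The replica ends up identical to the updated object, but only one field travelled. With large objects and small updates, that difference is what keeps the network from saturating.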
The Delta Magic
Terracotta is smart. Terracotta doesn’t replicate across machines needlessly. It does just enough to provide fail-over protection and the rest is ‘on demand’. It can even push unused data off a machine.
Added up, each machine added to a cluster increases the effective memory available to every machine.
When I see something like this, I wonder what I would even need a database for.
The only reason I can see is to make data available for data mining and so-called business intelligence (BI) packages (or warehousing and cubes, in Microsoft’s terminology). Most of these tools are already designed around a database.
So, the RDBMS effectively becomes a logging mechanism.
Kill the RDBMS
So, by using common collections (sets/lists/maps) transparently backed by Terracotta, the RDBMS can effectively be dug out and ripped from an application. The result is cleaner, more maintainable code, more efficient use of memory, and faster execution times. You no longer have to worry about the impedance mismatch.
What’s bad in this Idea?
Will this Concept Flourish?
Will the concept of using shared memory take off as a way to get rid of the database and not simply be a means to massively scale?
I hope so. This is the day and age where everyone is embracing simplicity. The proliferation of Ruby on Rails, Grails, Spring, Wicket, and other frameworks shows that most developers have had it with over-complex solutions.
Maybe they’ll be willing to get rid of one solution altogether.
Am I totally wrong?
One place where the RDBMS could be difficult to remove is where it serves a greater purpose, such as an integration point for multiple applications.
One solution is placing an HTTP wrapper around the database. This converts it from an integration point to an application.
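Here is a minimal sketch of what such a wrapper could look like, using the `com.sun.net.httpserver` server that ships with the JDK. The details are all assumptions made for illustration: `CustomerService`, the `/customers/` path, port 8080, and the in-memory map standing in for whatever store actually sits behind the service.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class CustomerService {
    // Hypothetical stand-in for the real store hidden behind the service.
    private final Map<String, String> customers = new ConcurrentHashMap<>();

    void put(String id, String name) {
        customers.put(id, name);
    }

    // Maps a request path such as /customers/42 to a customer name, or null if unknown.
    String lookup(String path) {
        String id = path.substring(path.lastIndexOf('/') + 1);
        return customers.get(id);
    }

    // Exposes lookup() over HTTP; other applications only ever see this interface.
    HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/customers/", exchange -> {
            String name = lookup(exchange.getRequestURI().getPath());
            byte[] body = (name == null ? "not found" : name).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(name == null ? 404 : 200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        CustomerService service = new CustomerService();
        service.put("42", "Ada");
        service.start(8080); // then try: GET http://localhost:8080/customers/42
    }
}
```

Callers depend on the URL and the response format, not on tables and columns, so the storage behind `lookup` can change without every integrated application having to change with it.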
Another area where this probably won’t work is data warehousing. But that would be an excellent candidate for wrapping up in a REST API layer.
As a common way for various applications, regardless of implementation, to share data, an RDBMS seems hard to beat. My thoughts for a Terracotta solution could work across apps built on languages that can run on the JVM. But reaching out to others (C/C++/Smalltalk) might be a bit difficult, as far as the company’s plans are concerned.
I suppose we can’t kill all of the databases out there.
But here is what we can do.
We, the designers and builders of applications, should take responsibility,
we should analyze the problems we are trying to solve,
we should try to select the best solution for the specific problem at hand,
we should choose the tool that best fits the business value.