Thursday, December 17, 2009

The cloudy future of relational databases

Cloud computing is the big new thing. The fate of the formerly ubiquitous relational database, on the other hand, is uncertain. There are many ways for a developer to deploy an application to the "cloud", whatever that might mean. The various cloud vendors, however, do not necessarily have a database story to tell, or if they do, that story isn't necessarily a good one. As a result, deploying a traditional database-backed application to the cloud may be difficult.

One story I have heard is to simply opt out of using a relational database. You would use some "NoSQL" persistence technology, such as SimpleDB on Amazon or Google's datastore. Such solutions have their advantages in terms of performance, scalability and ease of mapping to objects. If you just want to store dumb data objects, that may be all you need. But there are situations where the traditional RDBMS is more appropriate. If you have a data-centric view of your problem domain, where the data is ontologically independent of the application and can outlive the application, where you want the database to guarantee certain transactional and integrity rules, where you want to do lots of ad-hoc querying ... you might just want an RDBMS. My worry is that for many cloud platforms, you may not have that option.
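To make the integrity and ad-hoc querying points concrete, here is a minimal sketch in Python. It uses the standard library's sqlite3 module purely as a stand-in for a full RDBMS server, and the customer and orders tables are hypothetical, invented for illustration. The idea is that the database itself, not the application, rejects bad data and rolls back the half-finished transaction.

```python
import sqlite3


def demo():
    # In-memory SQLite database as a stand-in for a real RDBMS server.
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")

    # Hypothetical schema: the integrity rules live in the database.
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        """CREATE TABLE orders (
               id INTEGER PRIMARY KEY,
               customer_id INTEGER NOT NULL REFERENCES customer(id),
               amount REAL NOT NULL CHECK (amount > 0)
           )"""
    )

    # "with conn" commits on success and rolls back on an exception.
    with conn:
        conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")
        conn.execute("INSERT INTO orders (customer_id, amount) VALUES (1, 25.0)")
        conn.execute("INSERT INTO orders (customer_id, amount) VALUES (1, 17.5)")

    rejected = False
    try:
        with conn:
            # Valid on its own, but part of a transaction that will fail.
            conn.execute("INSERT INTO orders (customer_id, amount) VALUES (1, 5.0)")
            # Customer 99 does not exist, so the foreign key check fails.
            conn.execute("INSERT INTO orders (customer_id, amount) VALUES (99, 10.0)")
    except sqlite3.IntegrityError:
        rejected = True  # both inserts in this transaction were rolled back

    # An ad-hoc query nobody planned for when the schema was designed.
    totals = conn.execute(
        """SELECT c.name, COUNT(*), SUM(o.amount)
           FROM customer c JOIN orders o ON o.customer_id = c.id
           GROUP BY c.name"""
    ).fetchall()
    conn.close()
    return rejected, totals


if __name__ == "__main__":
    print(demo())
```

With a plain key-value or object store, rules like these would typically have to be enforced in application code, which is exactly the trade-off at stake here.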

For the platform as a service (PaaS) sort of cloud, where you may not even have filesystem access, setting up your own database server will not be an option. Even where the vendor provides a database service, your choice remains limited to that one service. For example, if you use Azure you can only use SQL Server. The infrastructure as a service (IaaS) sort of cloud will generally give you the freedom to run your own database server, but it may have limitations. Generally, IaaS vendors like Amazon offer:
  • a compute service: a virtual machine with local filesystem storage. Small, fast storage.
  • a storage service: an API to read/write data on the cloud. Big, slow storage. 
So you have a choice of small, fast storage (EC2) or big, slow storage (S3). Trouble is, a database server really wants big and fast storage.

At this point, I think the reader of this blog may want to slap me on the head and point out that yes, Amazon tells a perfectly good database story. In fact, it must have at least half a dozen stories to tell. You can use their MySQL service, put up an Oracle AMI or stand up your own database server: there are lots of choices both from Amazon and third parties. There are choices partly because Amazon's EC2 can use Elastic Block Store (EBS). EBS provides block storage volumes that are both big and fast, especially if you set up several EBS volumes in a RAID configuration. I noticed that my former colleague Chip recently blogged about his success deploying database-backed applications to Amazon's cloud.

OK, so you have lots of database choices if you go with the Amazon cloud. But only Amazon. I have been talking to another IaaS cloud vendor that has no database story and is stuck with the fast/small vs. slow/big dichotomy. With other vendors, it seems, you either have just one choice of RDBMS or you simply cannot get a production-ready RDBMS in the cloud. Did I miss another option or two? You tell me.

3 comments:

  1. Rackspace have an RDBMS option now; MySQL based.

  2. "The fate of the formerly ubiquitous relational database, on the other hand, is uncertain."

    Funniest thing I have read all day.

  3. I'm glad that someone got a kick out of this - I wouldn't be so quick to dismiss schema-less storage architectures and the associated cloud products. Bret Taylor (co-founder of FriendFeed) posted earlier this year that he needed to find a solution to some tough RDBMS issues and landed on a solution that could easily be implemented in almost any of the storage containers that Chris mentions. The point being, at scale, there are plenty of platforms that are better served by an OO store fronted by a cache rather than doing all the SQL work to persistently exchange data with an RDBMS - and plenty of good cloud solutions that fit that bill.

    Note: Yes, I understand that Bret is actually using MySQL as his storage container, but that is mostly because he was already using it.
