One story I have heard is to simply opt out of using a relational database. You would use some "NoSQL" persistence technology, such as SimpleDB on Amazon or Google's datastore. Such solutions have their advantages in terms of performance, scalability and ease of mapping to objects. If you just want to store dumb data objects, that may be all you need. But there are situations where the traditional RDBMS is more appropriate. If you have a data-centric view of your problem domain, where the data is ontologically independent of the application and can outlive the application, where you want the database to guarantee certain transactional and integrity rules, where you want to do lots of ad-hoc querying ... you might just want an RDBMS. My worry is that for many cloud platforms, you may not have that option.
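To make "transactional and integrity rules" and "ad-hoc querying" a little more concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for a full RDBMS. The `customer` and `invoice` tables, their columns and the `demo` function are all made up for illustration. The point is only that the database itself, not the application, rejects the bad row and rolls back the whole transaction.

```python
import sqlite3


def demo():
    # In-memory SQLite stands in for a real RDBMS; the schema is hypothetical.
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        """CREATE TABLE invoice (
               id INTEGER PRIMARY KEY,
               customer_id INTEGER NOT NULL REFERENCES customer(id),
               amount_cents INTEGER NOT NULL CHECK (amount_cents > 0)
           )"""
    )
    conn.commit()

    # A valid transaction: committed when the with-block exits normally.
    with conn:
        conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
        conn.execute("INSERT INTO invoice VALUES (1, 1, 5000)")
        conn.execute("INSERT INTO invoice VALUES (2, 1, 2500)")

    # An invalid transaction: the second insert points at a customer that
    # does not exist, so the database raises and the with-block rolls back
    # both inserts, including the first one that was fine on its own.
    rejected = False
    try:
        with conn:
            conn.execute("INSERT INTO invoice VALUES (3, 1, 100)")
            conn.execute("INSERT INTO invoice VALUES (4, 99, 100)")
    except sqlite3.IntegrityError:
        rejected = True

    # An ad-hoc query nobody planned for when the schema was designed.
    totals = conn.execute(
        """SELECT c.name, SUM(i.amount_cents)
           FROM customer c JOIN invoice i ON i.customer_id = c.id
           GROUP BY c.name"""
    ).fetchall()
    conn.close()
    return rejected, totals


if __name__ == "__main__":
    print(demo())
```

With a key-value style datastore, checks like these would typically have to live in application code, which is exactly the trade-off I am describing above.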
For the platform as a service (PaaS) sort of cloud, where you may not even have filesystem access, setting up your own database server will not be an option. Even where the vendor provides a database service, your choice remains limited to that one service. For example, if you use Azure you can only use SQL Server. The infrastructure as a service (IaaS) sort of cloud will generally give you that freedom, but it may have limitations. Generally, IaaS vendors like Amazon offer:
- a compute service: a virtual machine with local filesystem storage. Small, fast storage.
- a storage service: an API to read/write data on the cloud. Big, slow storage.
At this point, I think the reader of this blog may want to slap me on the head and point out that yes, Amazon tells a perfectly good database story. In fact, it must have at least half a dozen stories to tell. You can use their MySQL service, put up an Oracle AMI or stand up your own database server: there are lots of choices both from Amazon and third parties. There are choices partly because Amazon's EC2 can use Elastic Block Store (EBS). EBS provides block storage volumes that you attach to an instance and format with a filesystem, and that storage is both big and fast, especially if you set up several EBS volumes in a RAID configuration. I noticed that my former colleague Chip recently blogged about his success deploying database-backed applications to Amazon's cloud.
OK, so you have lots of database choices if you go with the Amazon cloud. But only Amazon. I have been talking to another cloud IaaS vendor that has no database story, because it is stuck with the fast/small vs slow/big dichotomy. With other vendors, it seems, you either have just one choice of RDBMS or you simply cannot get a production-ready RDBMS in the cloud. Did I miss another option or two? You tell me.