|Amazon SimpleDB Developer Guide|
|author||Prabhakar Chaganti, Rich Helms|
|summary||"Getting started" guide to using Amazon's SimpleDB cloud database|
Those who prefer traditional relational databases could try eg Amazon RDS. This book only covers SimpleDB, a NoSQL or non-relational database. As is well known, NoSQL databases grew in popularity with the growth of large distributed systems and cloud computing, and their proponents tout their scalability and speed.
For anyone wanting a quick primer on NoSQL databases, this book includes a chapter on NoSQL which isn't limited to SimpleDB. It outlines some key conceptual differences between NoSQL and relational database management systems, with pros and cons, using the analogy of "a spreadsheet with some XML characteristics", and illustrating with some concrete examples. That chapter's been made available as a free sample chapter (SimpleDB versus RDBMS), so you can get a flavour of the book.
The contents list for this book is online, I won't recite it here. As well as an overview of SimpleDB, its terminology and advantages, the book goes through signing up with AWS and SimpleDB, and the account access keys. That chapter is also online, as a tutorial.
You may ask, how does this book differ from Amazon's free SimpleDB documentation, which includes a developer guide and a "getting started" guide? Amazon's own "getting started" is certainly helpful, and it's worth downloading and trying their web app scratchpad. But Amazon's detailed developer guide concentrates on REST and SOAP requests, which most people wouldn't want to deal with direct at that low level.
This book's focus is on using the SimpleDB web services API through certain specific languages and libraries — namely Java (JDK6 — using the typical 1.6 library plus several dependencies), Python (2.5 — you need boto), and PHP (with curl). It recommends the SDBtool Firefox extension (SDBizo), which is excellent for checking the results of running the code.
I've tried the book's Java and Python examples, on Windows. Not PHP, as I've not got round to learning PHP yet, though I skimmed the PHP explanations. Similarly, I've not had time to try it all over again on Linux. Generally, the book's coverage seems fuller and better for PHP than for Java or Python. Perhaps it was originally written for PHP, and the rest was bolted on — the stuff for Java more hurriedly than for Python?
The downloadable code samples, as mentioned, are PHP only. They really should have provided downloadable code for all 3 languages, plus some fake MP3 files (see later). If you get the e-book (available in PDF and epub), you can copy and paste the Java or Python code. But that's a tad tedious, especially when the code runs onto a new page, and there are stray end of lines etc that you have to delete manually. Furthermore, the Python code provided is for the interpreter in interactive mode (not for .py files, except a couple towards the end). So, for the Python, you also have to copy/paste each line one at a time. But that still beats having to re-type pages of code in full.
In other words, if you want this book and you're only interested in PHP, you can get away with just buying the hard copy and downloading the code from the Packt site. But if you prefer Python or Java, to save your fingers and blood pressure you should buy just the e-version, or get both paper and e books together. I really hope Packt will in future provide downloadable code samples for all the languages covered.
I have more issues with the sample code given in this book. The typical imports should have been spelled out in the example Java code. Eclipse offers more than one possible import in some cases. It was "try everything till it works", at least until I found this tutorial. I've included the initial required typical imports (though not the standard java.util etc ones) in my own list of points, which I'll say more about at the end of this review. Surely it wouldn't have been difficult to include just those few lines of imports, which could have saved readers a lot of time trying to work out the correct imports. There are also errors in the Python code, and on one page the code that should have been included is missing altogether.
Now, more on the book proper. After the overview described above, this book walks you through the basic SimpleDB operations: how to create a SimpleDB "domain" (equivalent to a worksheet in a spreadsheet), list domains, create/retrieve items (like spreadsheet rows), and delete domains.
Items have attributes (spreadsheet column headings), as key:value pairs — the key is the attribute name, the value is its value, eg address:1 Acacia Avenue. An attribute can have more than one value, eg the same item can have both address:1 Acacia Avenue and address:2 Broadway. The book also lists the SimpleDB constraints on domains, items and attributes — maximum number or size, etc — but it's best to check the AWS site for the latest info.
Code examples are given for each of the 3 languages mentioned. The examples are similar, but don't always cover the same ground. If they'd done that, where possible, it would have been more helpful to those of us trying examples in more than one language. One advantage of a book with associated website is that electronic updates can be published, and it would have been great if that had been done for this book. For instance, the book gave conditional put/delete code examples only for PHP. At the date of this review, boto now supports those features, but sample supplemental Python code for that still hadn't been made available.
SimpleDB stores attribute values as UTF-8 strings. This means that comparisons for sorting or searching are done lexicographically (character by character, left to right, numbers take precedence over uppercase over lowercase), and to handle numbers or dates you have to encode and decode them yourself. So, the book has a chapter explaining lexicographical comparison, data types, and how to encode and decode data to enable proper sorting and comparison of numbers, dates, Boolean values and XML-restricted characters. In the case of numbers this means zero padding and offsets, and there's example code for decoding and encoding numbers. Unlike with PHP and Python, oddly the Java code given was for the body of the typical method that carries out the encoding etc. This could have been omitted, and they should have given example code illustrating the method's usage instead. Similarly for the date formats code.
The SimpleDB query syntax is generally covered well, in a chapter which takes readers through first creating a sample database of song metadata to run queries against. It's not too painful copy/pasting the Java code (3+ pages), but with Python in interactive mode I drew the line at creating every song item and attributes using individual statements, even with pasting, so I just tried adding a couple of random ones to test that the code worked. I say again, full downloadable code please...!
That chapter then gives helpful examples of queries against the sample database and their results, including for more complex combined queries ("and", "or" type queries, "not" etc), and querying for multiple-value attributes. It also provides code examples for sorting and counting query results. But the Java code for retrieving an item's attributes wouldn't run, and I couldn't find the method used (getItemsAttributes()) detailed in the typical documentation; perhaps the book is out of date here?
The book starts going beyond the basics from Chapter 7 onwards, with a chapter on Amazon's S3 storage service — another well known component of Amazon Web Services, where "objects" (files) may be stored in "buckets" (directories), with "keys" used to retrieve objects.
For S3, the book uses JetS3t for Java. However, the Java code given for uploading files to S3 didn't demonstrate any integration with SimpleDB at all — the files were just uploaded with their filenames as the S3 keys, and the code didn't seem to deal with the creation of your own custom S3 keys for uploaded objects. In contrast, the Python code generated the S3 keys for the files from hashes previously produced and stored in the SimpleDB database, as well as dealing with their uploading. In addition, for me the Java code for downloading files from S3 just wouldn't run, and also it wasn't clear where the files were supposed to be downloaded to locally, unlike with the Python example. Inexplicably, there was no info on how to delete objects from S3 buckets, or indeed how to delete buckets. So, while the S3 chapter is of help, it could definitely do with being expanded, especially the Java sections.
Next, money money money. AWS charges are based on usage, so the chapter on tuning and usage costs has some practical value in explaining how SimpleDB is charged for, the "BoxUsage" value returned by requests to SimpleDB, using BoxUsage to optimize queries and compute costs, and how to get BoxUsage values back with your queries using Java, Python etc. There are code examples that, when run, illustrate the different BoxUsage values you get when you use different operators or expressions in queries (eg, using LIKE costs more).
However, partitioning your data into multiple domains is covered in only a few paragraphs, with no code given. I'd have liked to see more info on that, and some sample code for the partitioning process.
To further save money, you can use a cache to store data locally, trying your local cache first; and, only if the data is not there, would your app go out to SimpleDB and incur costs for querying it. This book accordingly has a chapter on how to install and use the popular open source caching system memcached to cache your query results locally. (CacheLite for PHP is also covered.) Again, the Java sections caused me some frustration. The Java test code showed that the memcached server was running properly on my machine, but the Java code for using the cache just didn't work; it ran, but continued to query SimpleDB direct. The Python code, however, worked perfectly — except that, if you're using memcached in Windows, you'll need to use port 11211 instead of what's shown in the book. (I didn't try it in Linux.)
Finally, the book deals with running parallel operations against SimpleDB, using its BatchPutAttributes. The section on updating SimpleDB in Python by making serial consecutive calls to SimpleDB is completely missing the code for the script, but the book does then cover inserting multiple items concurrently into SimpleDB using a threadpool in Java. It also gives sample Python code for alternative ways of parallelising requests: using Python's built-in threading module, threading and queues combined, then threading using the open source workerpool module.
To conclude, in substance the book has a fair amount of useful information on the basics of getting started with SimpleDB, particularly for Python (and probably PHP). But not providing downloadable code samples in Java and Python, or "fake" MP3 files to try S3 uploading/downloading, is a minus.
Some errors, inconsistencies and missing information from the department of "I-wish-they'd-included-this-even-if-they-thought-it-was-basic-as-it's-too-easily-missed-if-it's-not-spelled-out", mean that the book is not really "complete", and not as suitable as it should be for relative beginners — especially for Java and (in whatever language) Windows. It wouldn't take much extra work to get it up to scratch on that front. Perhaps the next edition, or better still an online update/supplement?
For the more experienced, the book doesn't take readers to as advanced a stage as it could have, in my view. In particular, it would have been good to have more info and example code on partitioning data between different domains, and also how to migrate data from an existing database to SimpleDB — their code for "importing" the sample database literally just adds each item and attribute individually.
Fix the errors, add the missing info for beginners, provide downloads of code in all relevant languages and "fake files", and I'd have given it a 7. Provide working sample Java code with more explanation, plus proper integration with S3, an 8. Add fuller info on partitioning, migration, and perhaps even integration with yet more AWS services, a 9.
All opinions are personal to me: half geek, half lawyer, mostly harmless. I'm researching legal issues in cloud computing.
You can purchase Amazon SimpleDB Developer Guide from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.