
Book Review: Amazon SimpleDB Developer Guide

Posted by samzenpus
from the read-all-about-it dept.
KuanH writes "Amazon SimpleDB Developer Guide is billed as a complete guide to using Amazon's SimpleDB database API. It's most detailed for PHP. It's helpful for Python. But the Java code and explanations aren't up to the standard of the others. It includes a primer on using Amazon S3 with SimpleDB: files stored on S3, file metadata stored in SimpleDB — again, less good for Java. It also covers tuning to reduce usage costs, caching using memcached, and ways to batch-update and make serial or parallel requests to SimpleDB. However, it's missing some information that beginners might need, and it's perhaps not quite advanced enough for the more experienced. Downloadable example code is available only for PHP." Keep reading for the rest of Kuan's review.
Amazon SimpleDB Developer Guide
author Prabhakar Chaganti, Rich Helms
pages 252
publisher Packt Publishing
rating 6
reviewer Kuan Hon
ISBN 1847197345
summary "Getting started" guide to using Amazon's SimpleDB cloud database
Say "cloud" to get the attention of CIOs seeking to cut costs in these recessionary times. One well known "database in the cloud" option is Amazon Web Services' SimpleDB, which Amazon describes as "a highly available, flexible, and scalable non-relational data store that offloads the work of database administration."

Those who prefer traditional relational databases could try, for example, Amazon RDS. This book only covers SimpleDB, a NoSQL or non-relational database. As is well known, NoSQL databases grew in popularity with the growth of large distributed systems and cloud computing, and their proponents tout their scalability and speed.

For anyone wanting a quick primer on NoSQL databases, this book includes a chapter on NoSQL which isn't limited to SimpleDB. It outlines some key conceptual differences between NoSQL and relational database management systems, with pros and cons, using the analogy of "a spreadsheet with some XML characteristics", and illustrating with some concrete examples. That chapter's been made available as a free sample chapter (SimpleDB versus RDBMS), so you can get a flavour of the book.

The contents list for this book is online, so I won't recite it here. As well as an overview of SimpleDB, its terminology and advantages, the book goes through signing up with AWS and SimpleDB, and the account access keys. That chapter is also online, as a tutorial.

You may ask, how does this book differ from Amazon's free SimpleDB documentation, which includes a developer guide and a "getting started" guide? Amazon's own "getting started" is certainly helpful, and it's worth downloading and trying their web app scratchpad. But Amazon's detailed developer guide concentrates on REST and SOAP requests, which most people wouldn't want to deal with direct at that low level.

This book's focus is on using the SimpleDB web services API through certain specific languages and libraries: namely Java (JDK 6, using the Typica 1.6 library plus several dependencies), Python (2.5; you need boto), and PHP (with curl). It recommends the SDBtool Firefox extension (SDBizo), which is excellent for checking the results of running the code.

I've tried the book's Java and Python examples, on Windows. Not PHP, as I've not got round to learning PHP yet, though I skimmed the PHP explanations. Similarly, I've not had time to try it all over again on Linux. Generally, the book's coverage seems fuller and better for PHP than for Java or Python. Perhaps it was originally written for PHP, and the rest was bolted on — the stuff for Java more hurriedly than for Python?

The downloadable code samples, as mentioned, are PHP only. They really should have provided downloadable code for all 3 languages, plus some fake MP3 files (see later). If you get the e-book (available in PDF and epub), you can copy and paste the Java or Python code. But that's a tad tedious, especially when the code runs onto a new page, and there are stray end of lines etc that you have to delete manually. Furthermore, the Python code provided is for the interpreter in interactive mode (not for .py files, except a couple towards the end). So, for the Python, you also have to copy/paste each line one at a time. But that still beats having to re-type pages of code in full.

In other words, if you want this book and you're only interested in PHP, you can get away with just buying the hard copy and downloading the code from the Packt site. But if you prefer Python or Java, to save your fingers and blood pressure you should buy just the e-version, or get both paper and e books together. I really hope Packt will in future provide downloadable code samples for all the languages covered.

I have more issues with the sample code given in this book. The Typica imports should have been spelled out in the example Java code, because Eclipse offers more than one possible import in some cases. It was "try everything till it works", at least until I found this tutorial. I've included the initial required Typica imports (though not the standard java.util etc ones) in my own list of points, which I'll say more about at the end of this review. Surely it wouldn't have been difficult to include just those few lines of imports, which could have saved readers a lot of time trying to work out the correct ones. There are also errors in the Python code, and on one page the code that should have been included is missing altogether.

Now, more on the book proper. After the overview described above, this book walks you through the basic SimpleDB operations: how to create a SimpleDB "domain" (equivalent to a worksheet in a spreadsheet), list domains, create/retrieve items (like spreadsheet rows), and delete domains.

Items have attributes (spreadsheet column headings), as key:value pairs — the key is the attribute name, the value is its value, eg address:1 Acacia Avenue. An attribute can have more than one value, eg the same item can have both address:1 Acacia Avenue and address:2 Broadway. The book also lists the SimpleDB constraints on domains, items and attributes — maximum number or size, etc — but it's best to check the AWS site for the latest info.
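
As a rough illustration of the data model just described, here is a minimal Python 3 sketch that mimics a single domain in memory. It is not a SimpleDB client and is not taken from the book; the `domain` structure, the `put_attribute` helper, and the item name `person1` are all hypothetical names chosen for this example.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for one SimpleDB domain:
# item name -> attribute name -> set of values.
# This is NOT a client library, just a picture of the data model.
domain = defaultdict(lambda: defaultdict(set))


def put_attribute(item_name, attr_name, value):
    """Add a value to an attribute, keeping any values already stored."""
    domain[item_name][attr_name].add(value)


# One item ("row") whose "address" attribute holds two values at once.
put_attribute("person1", "address", "1 Acacia Avenue")
put_attribute("person1", "address", "2 Broadway")
put_attribute("person1", "name", "Alice")  # invented sample value
```

The point of the nested sets is that adding a second address does not overwrite the first, which is the main way an item differs from a row in a relational table.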

Code examples are given for each of the 3 languages mentioned. The examples are similar, but don't always cover the same ground. If they'd done that, where possible, it would have been more helpful to those of us trying examples in more than one language. One advantage of a book with associated website is that electronic updates can be published, and it would have been great if that had been done for this book. For instance, the book gave conditional put/delete code examples only for PHP. At the date of this review, boto now supports those features, but sample supplemental Python code for that still hadn't been made available.

SimpleDB stores attribute values as UTF-8 strings. This means that comparisons for sorting or searching are done lexicographically (character by character, left to right, numbers take precedence over uppercase over lowercase), and to handle numbers or dates you have to encode and decode them yourself. So, the book has a chapter explaining lexicographical comparison, data types, and how to encode and decode data to enable proper sorting and comparison of numbers, dates, Boolean values and XML-restricted characters. In the case of numbers this means zero padding and offsets, and there's example code for decoding and encoding numbers. Oddly, unlike the PHP and Python examples, the Java code given was the body of the Typica method that carries out the encoding. This could have been omitted in favour of example code illustrating the method's usage. The same goes for the date formats code.
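
To make the zero padding and offset idea concrete, here is a minimal Python 3 sketch. It is my own illustration, not the book's code; the function names, the offset of 1000 and the width of 6 digits are assumptions picked for the example, and in practice you would size them to the range of your data.

```python
def encode_number(value, offset=1000, width=6):
    """Shift an integer by `offset` so it is non-negative, then zero-pad it.

    Strings produced this way sort lexicographically in numeric order,
    as long as every value fits in the chosen offset and width.
    """
    shifted = value + offset
    if shifted < 0:
        raise ValueError("value is below the range covered by the offset")
    text = str(shifted)
    if len(text) > width:
        raise ValueError("value does not fit in the chosen width")
    return text.zfill(width)


def decode_number(text, offset=1000):
    """Reverse encode_number."""
    return int(text) - offset


# Plain string comparison puts "10" before "9", which is the problem
# being solved: sorted(["9", "10"]) gives ["10", "9"].
values = [10, -5, 9, 200]
encoded = sorted(encode_number(v) for v in values)
decoded = [decode_number(e) for e in encoded]  # numeric order restored
```

The offset handles negative numbers (which would otherwise need a sign character that sorts badly), and the padding makes every string the same length so that left-to-right comparison agrees with numeric comparison.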

The SimpleDB query syntax is generally covered well, in a chapter which takes readers through first creating a sample database of song metadata to run queries against. It's not too painful copy/pasting the Java code (3+ pages), but with Python in interactive mode I drew the line at creating every song item and attributes using individual statements, even with pasting, so I just tried adding a couple of random ones to test that the code worked. I say again, full downloadable code please...!

That chapter then gives helpful examples of queries against the sample database and their results, including for more complex combined queries ("and", "or" type queries, "not" etc), and querying for multiple-value attributes. It also provides code examples for sorting and counting query results. But the Java code for retrieving an item's attributes wouldn't run, and I couldn't find the method used (getItemsAttributes()) detailed in the Typica documentation; perhaps the book is out of date here?
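
For readers who have not seen SimpleDB's select syntax, the following Python 3 sketch builds a combined query string. It is a hedged illustration rather than the book's code: the helper names are hypothetical, the domain `songs` and the attributes `Genre` and `Year` are invented for the example, and the quoting rules (single-quoted values with embedded quotes doubled, backtick-quoted names) are as I understand them from Amazon's documentation, so check them there before relying on this.

```python
def quote_value(value):
    """Single-quote a value, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_name(name):
    """Backtick-quote a domain or attribute name, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_select(domain, conditions, joiner="and"):
    """Build a select expression from (attribute, operator, value) tuples.

    `joiner` is "and" or "or"; all values are passed as strings because
    SimpleDB compares attribute values as strings.
    """
    predicates = [
        f"{quote_name(attr)} {op} {quote_value(val)}"
        for attr, op, val in conditions
    ]
    where = f" {joiner} ".join(predicates)
    return f"select * from {quote_name(domain)} where {where}"


query = build_select(
    "songs",
    [("Genre", "=", "Rock 'n' Roll"), ("Year", ">", "1985")],
)
```

The resulting string could then be handed to whichever client library you use; the sketch deliberately stops short of making a request.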

The book starts going beyond the basics from Chapter 7 onwards, with a chapter on Amazon's S3 storage service — another well known component of Amazon Web Services, where "objects" (files) may be stored in "buckets" (directories), with "keys" used to retrieve objects.

For S3, the book uses JetS3t for Java. However, the Java code given for uploading files to S3 didn't demonstrate any integration with SimpleDB at all — the files were just uploaded with their filenames as the S3 keys, and the code didn't seem to deal with the creation of your own custom S3 keys for uploaded objects. In contrast, the Python code generated the S3 keys for the files from hashes previously produced and stored in the SimpleDB database, as well as dealing with their uploading. In addition, for me the Java code for downloading files from S3 just wouldn't run, and also it wasn't clear where the files were supposed to be downloaded to locally, unlike with the Python example. Inexplicably, there was no info on how to delete objects from S3 buckets, or indeed how to delete buckets. So, while the S3 chapter is of help, it could definitely do with being expanded, especially the Java sections.
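
The idea of deriving S3 keys from hashes stored alongside the metadata can be sketched in a few lines of Python 3. This is an assumption-laden illustration, not the book's code: the review does not say which hash the book uses, so SHA-256 is my choice here, and `s3_key_for`, the `.mp3` suffix and the `FileKey` attribute name are hypothetical.

```python
import hashlib


def s3_key_for(data, extension=".mp3"):
    """Derive a custom S3 key from the file's bytes instead of its filename.

    SHA-256 is an assumption made for this sketch; any stable hash would do.
    """
    return hashlib.sha256(data).hexdigest() + extension


# Stand-in bytes for a "fake" MP3 file.
fake_song = b"not really audio, just placeholder bytes"
key = s3_key_for(fake_song)

# The key would be stored as an attribute of the song's item, so the
# metadata store can later be used to locate the object on S3.
song_attributes = {"FileKey": key}
```

Because the key depends only on the content, uploading the same file twice yields the same key, and two different files with the same filename no longer collide.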

Next, money money money. AWS charges are based on usage, so the chapter on tuning and usage costs has some practical value in explaining how SimpleDB is charged for, the "BoxUsage" value returned by requests to SimpleDB, using BoxUsage to optimize queries and compute costs, and how to get BoxUsage values back with your queries using Java, Python etc. There are code examples that, when run, illustrate the different BoxUsage values you get when you use different operators or expressions in queries (eg, using LIKE costs more).
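
A small Python 3 sketch of the cost arithmetic: add up the BoxUsage values returned with each request and multiply by a price per machine hour. The function name is hypothetical, the sample BoxUsage figures are invented purely for illustration, and the rate is left as a parameter because the actual price should be taken from the AWS site.

```python
def estimate_cost(box_usages, rate_per_machine_hour):
    """Return (total machine hours, estimated charge) for a list of BoxUsage values.

    BoxUsage values are assumed here to be expressed in machine hours.
    """
    total_hours = sum(box_usages)
    return total_hours, total_hours * rate_per_machine_hour


# Invented sample values, as if collected from three query responses.
sample_usages = [0.0000228616, 0.0000320033, 0.0000411449]
hours, charge = estimate_cost(sample_usages, rate_per_machine_hour=1.0)
```

Logging BoxUsage per query in this way makes it easy to compare two formulations of the same query and keep the cheaper one.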

However, partitioning your data into multiple domains is covered in only a few paragraphs, with no code given. I'd have liked to see more info on that, and some sample code for the partitioning process.
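
Since the book gives no code for partitioning, here is one possible approach as a minimal Python 3 sketch: pick the domain for each item by hashing its name. This is my own assumption about how one might do it, not the book's or Amazon's method; `domain_for`, the base name `songs` and the count of four partitions are all hypothetical.

```python
import zlib


def domain_for(item_name, base="songs", partitions=4):
    """Map an item name to one of `partitions` domains, e.g. songs_0 .. songs_3.

    CRC32 is used because it is stable across runs and machines, unlike
    Python's built-in hash() for strings.
    """
    bucket = zlib.crc32(item_name.encode("utf-8")) % partitions
    return f"{base}_{bucket}"


# Every reader and writer must use the same function and partition count,
# otherwise items will be looked up in the wrong domain.
target = domain_for("song-0001")
```

The trade-off is that a query which is not restricted to one item name has to be sent to every partition and the results merged by the application.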

To further save money, you can use a cache to store data locally, trying your local cache first; and, only if the data is not there, would your app go out to SimpleDB and incur costs for querying it. This book accordingly has a chapter on how to install and use the popular open source caching system memcached to cache your query results locally. (CacheLite for PHP is also covered.) Again, the Java sections caused me some frustration. The Java test code showed that the memcached server was running properly on my machine, but the Java code for using the cache just didn't work; it ran, but continued to query SimpleDB direct. The Python code, however, worked perfectly — except that, if you're using memcached in Windows, you'll need to use port 11211 instead of what's shown in the book. (I didn't try it in Linux.)
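
The "try the cache first, then fall back to SimpleDB" pattern looks roughly like this in Python 3. This is a sketch under stated assumptions, not the book's code: `DictCache` is a hypothetical in-memory stand-in exposing `get` and `set`, which is the shape of interface a memcached client object would be expected to offer, and `run_query` stands for whatever function actually sends the query.

```python
import hashlib


class DictCache:
    """Hypothetical stand-in for a memcached client with get/set methods."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None signals a cache miss

    def set(self, key, value):
        self._data[key] = value


def cache_key(query):
    """Hash the query text so the key is short and contains no spaces."""
    return "sdb:" + hashlib.sha1(query.encode("utf-8")).hexdigest()


def cached_select(cache, query, run_query):
    """Return cached results if present; otherwise run the query and cache it."""
    key = cache_key(query)
    result = cache.get(key)
    if result is None:
        result = run_query(query)  # only this path incurs SimpleDB charges
        cache.set(key, result)
    return result
```

A symptom like the one described for the Java code (everything runs, but SimpleDB is still queried every time) would correspond here to the `cache.set` call never happening or the key differing between the write and the read.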

Finally, the book deals with running parallel operations against SimpleDB, using its BatchPutAttributes. The section on updating SimpleDB in Python by making serial consecutive calls to SimpleDB is completely missing the code for the script, but the book does then cover inserting multiple items concurrently into SimpleDB using a threadpool in Java. It also gives sample Python code for alternative ways of parallelising requests: using Python's built-in threading module, threading and queues combined, then threading using the open source workerpool module.
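
Batching and parallel requests can be combined, as in the Python 3 sketch below. Note that it swaps in the standard library's `concurrent.futures` thread pool rather than the threading, queue or workerpool approaches the book covers, and that `batch_put` is a hypothetical callable standing in for a real BatchPutAttributes request. The batch size of 25 reflects my understanding of the per-call item limit; check the AWS documentation for the current figure.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size=25):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def put_all(items, batch_put, workers=4):
    """Send batches concurrently; returns one result per batch, in order.

    `batch_put` is a placeholder for a function that performs one
    batch request with the items it is given.
    """
    batches = chunked(items)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(batch_put, batches))


# Demonstration with a stand-in that just reports each batch's size.
batch_sizes = put_all(list(range(60)), batch_put=len)
```

Threads suit this workload because each request spends most of its time waiting on the network, so several can be in flight at once.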

To conclude, in substance the book has a fair amount of useful information on the basics of getting started with SimpleDB, particularly for Python (and probably PHP). But not providing downloadable code samples in Java and Python, or "fake" MP3 files to try S3 uploading/downloading, is a minus.

Some errors, inconsistencies and missing information from the department of "I-wish-they'd-included-this-even-if-they-thought-it-was-basic-as-it's-too-easily-missed-if-it's-not-spelled-out", mean that the book is not really "complete", and not as suitable as it should be for relative beginners — especially for Java and (in whatever language) Windows. It wouldn't take much extra work to get it up to scratch on that front. Perhaps the next edition, or better still an online update/supplement?

For the more experienced, the book doesn't take readers to as advanced a stage as it could have, in my view. In particular, it would have been good to have more info and example code on partitioning data between different domains, and also how to migrate data from an existing database to SimpleDB — their code for "importing" the sample database literally just adds each item and attribute individually.

Fix the errors, add the missing info for beginners, provide downloads of code in all relevant languages and "fake files", and I'd have given it a 7. Provide working sample Java code with more explanation, plus proper integration with S3, an 8. Add fuller info on partitioning, migration, and perhaps even integration with yet more AWS services, a 9.

All opinions are personal to me: half geek, half lawyer, mostly harmless. I'm researching legal issues in cloud computing.

You can purchase Amazon SimpleDB Developer Guide from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.


This discussion has been archived. No new comments can be posted.

  • What Do I Do When The Cloud Disappears?

    • by x*yy*x (2058140)
      Yeah, lets start this stupid argument about clouds again. Slashdot used to have knowledgeable persons commenting on stuff, not some idiots making remarks about things they know nothing about. Sigh.
      • Yes, I know nothing about cloud-based services, having only designed and authored four of them. It's called a joke, ass.

        • You should stick to designing and authoring cloud-based services.

        • by x*yy*x (2058140)
          It's just the fact that nowadays certain Slashdot stories always have the same bullshit on them.

          Cloud services ->
          "Useless"
          "It's just a bunch of dedicated servers!"
          "Stupid buzzwords. And fuck you, CEO, for telling me where you want your company to host our stuff"

          Piracy ->
          "Piracy is looting on the seas, idiots!"
          "Information wants to be free!!"
          "This might have DRM and by principle, even if it causes no trouble, I feel obligated to steal the game"
          "But copying is not stealing!!"

          3D glasse
          • by vlm (69642)

            Ebooks ->

            "If I look at a LCD display for more than 5 minutes, my eyeballs explode, unless its /. or LCD TV showing a nine hour football game or a LCD owned by my corporate overlord. But read a book, and poof the eyes are gone"
            "I only read in the bathtub; while I'm taking a bath; why yes, people who are not women and take baths instead of showers do exist, or at least, Descartes said I bathe therefore I am"
            "I only read at the beach and I don't want to scratch up a reader"
            "It'll be obsolete some day, and

          • Dude you missed the best ones:

            SSD ->
            "I heard these drives die really quickly because of a limited number of writes"

            Anything remotely to do with the FCC, Radio waves, television, or pretty much anything with an antenna ->
            Commodore_64_love or sockpuppet thereof: here's all the OTA TV channels I get!

          • by Nimloth (704789)
            Somehow you managed to miss all the Slashdot groupthink clichés. Not that you don't have a point, but you missed the good ones. Your post is moot and boring, just like mine. Make room for more groupthink.
      • Yeah, lets start this stupid argument about clouds again. Slashdot used to have knowledgeable persons commenting on stuff, not some idiots making remarks about things they know nothing about. Sigh.

        He may have meant it as a joke, but the truth is no laughing matter. I use AWS extensively, so I know what I am talking about.

        Amazon provides no way to do the following in SimpleDB:

        1. Export a consistent copy of all of your data.
        2. Import all of your data.
        3. Backup your data.
        4. Snapshot your data.
        5. Point in time recovery of your data (i.e. I want to keep all of the data except the result of my update statement that lacked a where clause).

        It's a shame, really, because SimpleDB is a great datastore for many use cases. But

  • The cloud is a big thing these days. Cloud this, cloud that, it's almost like we're all in a fog trying to wrap our minds around how to make use of it.

    But as with many new technologies, its own hype precedes it. Sure, you can put everything you own data-wise into a cloud, but should you is the question we need to ask ourselves. And you know, it's not even that the answer is "no" as you might be expecting from the way that question was phrased.

    As the name suggests, clouds are mainly good for carrying very

    • by x*yy*x (2058140)
      Why can't games be hosted with the cloud services? I know at least Zynga partly uses Amazon's hosting. Minecraft and a lot of other games use Amazon's CDN for delivering updates and game files.

      If I were running a web based online game, or something that interacts with other players even if the client is local, I would seriously consider cloud hosting. Microsoft's Azure actually has great deal of things integrated, so if you were making a game client you could use the azure hosting directly. Yeah, you're d
      • by x*yy*x (2058140)
        To give a better example what I mean by integrated programming. Normally you'd be making a client-server model and everything that comes with - maybe use of different programming language for server, input handling and data exchanging with error processing and data validation, all the backend stuff threaded stuff so that the server can handle lots of clients and is capable of handling lag etc..

        Now, for example, Azure allows you to code those parts directly in your project. If you were making your game wit
      • by vlm (69642)

        Why can't games be hosted with the cloud services?

        http://www.lacunaexpanse.com/ [lacunaexpanse.com]

        Lacuna Expanse (which is free, and is pretty cool) uses CloudFront. Works pretty darn well. Every time I shop at Amazon and it's glacially slow (always?) I wonder if it's because Amazon makes more profit off hosting... It's a hosting company that also sells books and stuff.

    • The cloud is a big thing these days. Cloud this, cloud that, it's almost like we're all in a fog trying to wrap our minds around how to make use of it.

      And, of course, Amazon managed to punch a hole in that last week with a 4-day outage to part of their "cloud."

      The latest being how they're weaseling out of their 99.95% service level agreement [theregister.co.uk].

      So, here are a new set of nouns/adjectives/phrases to describe "Cloud" and "Cloud Computing":
      Worthless
      Downtime
      Read the fine print
      Outage
      Data under someone else's control

  • The "cloud" is here, and those in higher up positions want "us" (devs) to use it just because. But of course, it should be used only when it actually benefits the situation.

    So I guess that leads me to the next question: what are the REAL situations which would actually benefit from using the cloud? I don't really know of the "real" cloud use cases, so maybe someone can help me out here..

    • Easy... You use "the cloud" for speed: When you need to turn an idea into a product in a *very* short amount of time and you have no other infrastructure to leverage. A lot of businesses have IT staff and have already sunk costs into the needed infrastructure. The sales pitch is that "the cloud" is better than owning infrastructure. And that's where the real arguments start.
      • by al0ha (1262684)
        >> "the cloud" is better than owning infrastructure

        Or as in my case, paying for it. I set up a Micro instance which suits me perfectly and now my hosting is FREE for a year, and in fact far better than the $11/month VPS service I was using.

        Have yet to determine estimated costs once the instance is no longer free, however I'd guess far less than even $11/month - and for far greater resource allocation in terms of available memory and CPU.
        • That looks more like a business decision to me: paying one hosting provider vs another. I'd also question how much data is owned by your service versus your customers. If this is a "hobby", as in you're not doing anything that you might leverage with the data, then it doesn't matter one way or the other. You simply choose the least expensive hosting operation that gives you the most services. If you do need to leverage the data, I'd at least build a periodic off-siting method to own the data. That way, when
    • Here's a case: a friend of mine develops corporate intranets. He has a large client that he has been working with for nearly 10 years now and their intranet is truly huge and mission critical. Accordingly, they had their own collocated rack at a local ISP and a small IT staff to maintain it (on top of the dev team that works on the intranet itself). Last year, some key IT people announced that they would be leaving and the business took a little financial hit.

      So they had the idea of moving the system wholesa

    • Here's an example. I just released Bullhive [bullhive.com]. It uses cloud-based computers to provide on-demand CPU cycles for financial modeling. As a startup, we COULD have gone out and spent all our lunch money on a huge rack of computers, and spent a bunch more time configuring and managing them instead of developing our core product. Now, that's all taken care of. If/when we grow, we can look at the economics of running our own servers for base load vs. using cloud computing.

      So, in our case, the main advantage
      • Makes a lot of sense to me. You answered essentially the same as I answered below: "The cloud" makes sense when you have no other infrastructure to leverage (or do not want to buy any). With growth, it becomes an interesting question: When do you move away from "the cloud"? I'd say it depends on your business model. If what you are selling is CPU-cycles and you only own the billing data, while you might never move your product out of the cloud, you just might own your billing platform and build your own gat
  • Isn't that an Oxymoron?
    • by vlm (69642)

      Isn't that an Oxymoron?

      Naw, "non-packt publishing slashdot book review" now that's an oxymoron.

      Although there was a review of Knuth's last book about 30 books ago, and there was a non packt book just a week or two ago.

  • ... why I remember when we were content to call it VSAM ;)

    Meanwhile, color me surprised (that would be a pale shade of mauve, for those interested) -- a review of a Packt book that isn't an 8, 9, or 10 and actually has something critical to say. Maybe this one was by a real person?

    Or maybe they're smartening up and realizing lukewarm press still gets the product name out there, while at the same time deflecting "shill" accusations.
