Book Review: Getting Started with LevelDB

January 30, 2014 - ios leveldb

Getting Started with LevelDB

I was asked to review the new LevelDB book from Packt (disclaimer: I know the author and was provided a free electronic copy). I’ve done a reasonable amount of experimental hacking using LevelDB on iOS in the past, so I was keen to see whether there was much I’d learn from a ‘Getting Started’ book.

Firstly, a bit about my background in this area. Thanks to my abiding hatred of all things Core Data (remind me to do a blog post on that later), I’d looked into LevelDB a little while ago as a possible alternative iOS persistence technology. I’ve used NoSQL (document) databases on web projects in the past, and I was looking for something that gives the performance, simplicity and syncability of something like CouchDB or RavenDB on iOS (before you comment, yes, I know about TouchDB and CouchBase Mobile). I found LevelDB to be a little too low-level to offer a complete persistence solution out of the box, and partially managed to avoid the temptation to go off and build one (another blog post). I knew Andy had a background in ISAM and embedded storage engines, not to mention that horrid C++ language, so I was hopeful he’d have some better approaches to real-world usage.

I also should mention there was a mild controversy over the book when it was released - the title implies generic LevelDB content, whereas the book focuses on practical implementation in OS X and iOS applications, which some people felt was misleading. I didn’t have a problem with it, but then I’m only really interested in using it in OS X & iOS applications.

The ‘building and installing’ chapters are quite detailed, and explicitly show things going wrong, provide an explanation why, and show how to rectify it. I found them a little bit tedious, but people inexperienced with building open-source C/C++ projects would probably need the extra hand-holding (to be honest, I ignored the makefile & built LevelDB in Xcode - kudos to the developers for not making the build too exotic).

Andy then goes on to examine the C++ API and explain the query semantics and some of the core idioms and public data structures. This lays important conceptual foundations, but given I was already familiar with these I skimmed over most of it. It’s worth noting the code in these chapters is all defiantly C++ (e.g. cout rather than NSLog etc) - I can understand why this was done, given the disappointing and inconsistent state of Objective-C API options (more on that in a moment), but it doesn’t make it any less jarring.

Finally we hit the Objective-C code and some of the OS X/iOS samples. I had a bit of a chuckle at the line:

“Some people also have a strong aversion to C++ and will avoid anything that lacks an Objective-C interface.”

(I suspect he’s probably talking about me). Complicating this task is the fact there’s no official or standard Objective-C API for LevelDB, so the book spends a bit of time covering the three most popular open source wrapper interfaces, and duplicates some of the sample code across the three. Eventually Andy settles on a customised version of APLevelDB for the remainder of the book (though I’m not sold on exposing the raw C++ DB reference). Ideally, however, we’d have a popular, stable, complete, well-maintained, idiomatic Cocoa LevelDB wrapper available instead. This could have made the book much simpler and better, and allowed it to spend more time and energy discussing other topics. Hopefully someone who’s not me will write this someday.

Next we get a walkthrough of implementing a basic sample app - the book describes an OS X application, but it's worth noting all the downloadable code samples include the iOS equivalent. There are also descriptions of some debugging tools, including dump, lev, and implementing a REPL on an iOS device via an embedded web server (quite a clever trick, though I’m not sure if I’ll ever use it). The sample app is then extended with more advanced functionality, and this is where (for me) things really started to get interesting. This included secondary indexes, key design considerations, custom comparators, record-splitting, and ‘schema support’ extensions to assist in maintaining keys. The last was quite a nifty idea - I can see the potential for a declarative (e.g. JSON file in the app bundle) index definition mechanism. Less code is always better™.
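For anyone who hasn't met the secondary index idea before, here's a rough sketch of how I think about it. This is my own illustration, not code from the book: `OrderedStore` is a toy in-memory stand-in for a sorted key-value store (it is not the LevelDB API), and the `person:` / `idx:city:` key layout and the helper names are made up for the example.

```python
import bisect


class OrderedStore:
    """Toy in-memory stand-in for a sorted key-value store (NOT the LevelDB API)."""

    def __init__(self):
        self._keys = []    # keys kept in sorted order
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan_prefix(self, prefix):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so nothing later can match
            yield key, self._values[key]


def add_person(store, person_id, name, city):
    # Primary record, keyed by id (hypothetical key layout).
    store.put(f"person:{person_id}", f"{name}|{city}")
    # Secondary index entry: the key carries the indexed field plus the
    # primary id; the value is left empty because the key says it all.
    store.put(f"idx:city:{city}:{person_id}", "")


def people_in_city(store, city):
    prefix = f"idx:city:{city}:"
    ids = [key[len(prefix):] for key, _ in store.scan_prefix(prefix)]
    return [store.get(f"person:{pid}") for pid in ids]


store = OrderedStore()
add_person(store, "001", "Ann", "Perth")
add_person(store, "002", "Bob", "Sydney")
add_person(store, "003", "Cat", "Perth")
perth = people_in_city(store, "Perth")
```

The point is that the index is just more keys in the same sorted keyspace, so a lookup by city becomes a short range scan over one prefix followed by ordinary gets. A real implementation would also need to remove or rewrite the index entry whenever the indexed field changes.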

Chapter 9 was titled “A Document Database”, and I hoped it was something closer to what I've used in CouchDB. It wasn’t really - the sample was more geared towards a metadata database of external files. However, it does include a basic text indexing implementation, which is an area I’ve battled with before, so it’s given me enough to take this idea further. The links to text indexing algorithms and open source implementations are also invaluable - additional resources are extensively referenced throughout the book; it's one of the things that’s been done very well.
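To give a flavour of what basic text indexing over a key-value store can look like, here's a minimal sketch. It is not the book's implementation: the `term:<word>:<doc_id>` key layout and the function names are my own assumptions, and a plain sorted list of keys stands in for the database.

```python
import re


def index_keys(doc_id, text):
    """Return one 'term:<word>:<doc_id>' key per distinct word in text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return sorted(f"term:{word}:{doc_id}" for word in words)


def search(keys, word):
    """Return the ids of documents containing word.

    The linear filter here stands in for what would be a seek to the
    prefix followed by a short forward scan in a sorted store.
    """
    prefix = f"term:{word.lower()}:"
    return [key[len(prefix):] for key in keys if key.startswith(prefix)]


keys = sorted(
    index_keys("doc1", "LevelDB stores sorted keys")
    + index_keys("doc2", "Sorted keys make range scans cheap")
)
hits = search(keys, "sorted")
```

Each word in a document becomes its own key, so finding the documents for a term is once again a prefix lookup. Stemming, stop words and ranking are all left out here.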

Chapter 10, “Tuning and Key Policies”, is gold, and probably worth the sticker price of the book on its own if you’re serious about implementing a LevelDB solution. It starts out describing LevelDB under the covers - memtables, SSTs, the eponymous levels, snapshots and Bloom filters. The various performance settings are discussed and recommendations made, then Andy gets into a discussion of how to structure data and keys to optimise performance, much of which revolves around understanding when & how often your data is read and updated. He also discusses some optimisations made by Basho in the Riak codebase.

Lastly, the appendix covers using LevelDB from three scripting languages (Ruby, Python & JavaScript via Node.js), which was worthy of inclusion given I’ve already come across the need to script data in & out of a database.

In summary, I think the book would be invaluable to anyone looking at using LevelDB from Objective-C, and most of it would still be very useful to developers on other platforms. It’s certainly given me a lot to mull over, and rekindled some of my excitement about LevelDB on mobile devices.