Playing with CouchDB View Collations

I've been playing with CouchDB quite a lot lately. While it's still far from feature-complete, I think it's a cool project. The concept of object databases is generally new to me, since I grew up on the web, where for most people "database" == "relational database".

Since CouchDB-related documentation is still scarce, I decided to share a cool thing I managed to pull off today with CouchDB's view API.

For the purpose of playing with CouchDB, I decided to write a PHP-implemented CouchDB interface (yes, I know there are existing ones), which I currently dub "Sopha", and a Wiki using CouchDB and Sopha as its backend (if I ever get those to a stage where they actually work, I will probably release them as open source).

One of the design goals of my Wiki was that previous revisions of each page are kept and can be accessed. While CouchDB keeps track of object revisions, access to old revisions is not guaranteed, as those eventually get deleted - so I had to implement my own revision tracking of pages. The design I went for was simple: each revision lives in its own document in the DB. All revisions of a page share the same "title" field but carry an incrementing revision number, starting from 1 (for new pages) and going up.
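To make the layout concrete, here is a minimal sketch of how successive revisions of one page might be built as separate documents. The "body" field and the nextRevision helper are hypothetical names used only for illustration; "doctype", "title" and "revision" are the fields the view function relies on.

```javascript
// Hypothetical helper: build the document for the next revision of a page.
// `previous` is the latest existing revision document, or null for a new page.
function nextRevision(previous, title, body) {
  return {
    doctype: "WikiPage",
    title: title,
    revision: previous ? previous.revision + 1 : 1, // new pages start at 1
    body: body // assumed field name for the page text
  };
}

const first = nextRevision(null, "My Page", "Hello");
const second = nextRevision(first, "My Page", "Hello, world");
console.log(first.revision, second.revision);
```

Each call yields a complete, independent document, so saving a new revision never touches the older ones.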

This design poses one major problem: how do you easily access the single most recent revision?

I started by creating 3 different view functions for accessing wiki pages: one to access the most recent revision of a page by its title, one to access a specific revision of a page by title and revision number, and another one to fetch all the revisions of a specific page by its title.

This was a mistake, and it resulted from my years of thinking relationally (keys, indexes, SQL, etc.). As I worked my way through, I started to understand the power and magic of CouchDB's views and view collations. With proper use of collation, it turns out it only takes one view function to rule them all:

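CODE:
  // View (map) function: one row per wiki page revision,
  // keyed by [title, revision] with the whole document as the value.
  function(doc) {
    if (doc.doctype == "WikiPage") {
      map([doc.title, doc.revision], doc);
    }
  }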
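To see which rows this produces without a running server, the function can be exercised locally. This is only a sketch: the map() defined here is a local stand-in for the callback CouchDB hands to the view function, and the sample documents are made up.

```javascript
// Collect the rows a view function produces, using a local stand-in for
// the map(key, value) callback that CouchDB provides to the function.
const rows = [];
function map(key, value) {
  rows.push({ key: key, value: value });
}

// Same logic as the view function, given a name so it can be called here.
function titlerev(doc) {
  if (doc.doctype == "WikiPage") {
    map([doc.title, doc.revision], doc);
  }
}

// Made-up sample documents.
const docs = [
  { doctype: "WikiPage", title: "My Page", revision: 1 },
  { doctype: "WikiPage", title: "My Page", revision: 2 },
  { doctype: "Comment", title: "My Page" } // skipped: not a WikiPage
];
docs.forEach(titlerev);
console.log(rows.map(function (r) { return JSON.stringify(r.key); }));
```

Every WikiPage document contributes exactly one row, and the [title, revision] key is what all three queries below rely on.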

Using this view function, I could do the following ("titlerev" is the name of my view function, URLs are decoded for readability):

Access the specific revision 123 of a specific page titled "My Page":
GET /database/_view/wikipage/titlerev?key=["My Page", 123]

Access all the revisions of "My Page":
GET /database/_view/wikipage/titlerev?startkey=["My Page"]&endkey=["My Page", "Z"]
This works because CouchDB orders the results by key and has strictly defined ordering rules: a shorter array sorts before a longer one that starts with the same elements, and numbers sort before strings. So ["My Page"] will always come before ["My Page", 1], and ["My Page", "Z"] will always come after ["My Page", n] as long as n is a number.
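The ordering argument can be checked with a small comparator. This is a deliberately simplified sketch, not CouchDB's actual implementation: it only handles numbers, strings and arrays, and it compares strings with plain JavaScript ordering, whereas the real collation covers more types and has its own string rules. The names typeRank and collate are made up for this example.

```javascript
// Simplified collation: numbers < strings < arrays.
// Other types are deliberately left out of this sketch.
function typeRank(v) {
  if (typeof v === "number") return 0;
  if (typeof v === "string") return 1;
  if (Array.isArray(v)) return 2;
  throw new Error("type not covered by this sketch");
}

function collate(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb; // different types: order by type
  if (ra === 2) {
    // Arrays: compare element by element.
    const n = Math.min(a.length, b.length);
    for (let i = 0; i < n; i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length; // a shorter prefix sorts first
  }
  return a < b ? -1 : a > b ? 1 : 0; // two numbers or two strings
}

const keys = [["My Page", "Z"], ["My Page", 2], ["My Page"], ["My Page", 1]];
keys.sort(collate);
console.log(JSON.stringify(keys));
```

After sorting, ["My Page"] comes first and ["My Page", "Z"] comes last, with the numbered revisions in between, which is why those two keys work as range boundaries.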

Access the latest revision of "My Page":
GET /database/_view/wikipage/titlerev?startkey=["My Page", "Z"]&endkey=["My Page", 0]&descending=true&count=1
This works as well, because records are selected in reverse order from ["My Page", "Z"] down to ["My Page", 0] but only the first one is returned, so it will always be the one with the highest revision number.
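Since the URLs above are shown decoded, a real request needs the JSON keys encoded first. Below is a minimal sketch of building the "latest revision" URL; viewUrl and latestRevisionUrl are hypothetical helper names, and the path and parameter names are simply the ones used in this post.

```javascript
// Build a view query URL: each parameter value is JSON-encoded,
// then percent-encoded so it is safe inside a query string.
function viewUrl(params) {
  const query = Object.keys(params).map(function (name) {
    return name + "=" + encodeURIComponent(JSON.stringify(params[name]));
  }).join("&");
  return "/database/_view/wikipage/titlerev?" + query;
}

// Hypothetical helper: URL for the most recent revision of a page.
function latestRevisionUrl(title) {
  return viewUrl({
    startkey: [title, "Z"], // sorts after every numeric revision
    endkey: [title, 0],
    descending: true,       // walk the keys from highest to lowest
    count: 1                // keep only the first row
  });
}

console.log(latestRevisionUrl("My Page"));
```

Running the keys through JSON.stringify keeps the quoting right for titles containing spaces or quotes, and encodeURIComponent takes care of the brackets and commas.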

How cool is that?

BTW, there is a good article that also explains CouchDB view collations on the blog of cmlenz, one of the contributors to CouchDB.