1. Implementing column pagination in Cassandra

    Implementing pagination in a database like MySQL is trivial, you just add LIMIT with the offset and page size and you’re done. Doing it in Cassandra isn’t that much harder, but it took me a little bit of thinking getting it right so I thought I’d make it googleable.

    Note: I’ve had this post as a draft for a while, never getting around to finishing it and properly testing the code, and then yesterday I saw Pagination with Cassandra and what we can learn from it retweeted by a few people. It covers the same ground, but if I’m not mistaken it lacks one thing: the implementation can’t tell if a page is the first when going backwards. Because of this I though that perhaps this post still adds something, and it got me to finally put in the effort to finish it.

    What I wanted to page over was a row with up to hundreds of columns, showing a list of 20 on each page, with a “next” and “previous” link below. You know the pattern.

    Loading the first page is of course trivial, just set the max number of columns you want to retreive. This is how you would do it using Eurydice, a JRuby driver that I’ve built on top of Pelops:

    page_items = column_family.get(row_key, :max_column_count => page_size)
    

    If the call is successful the page_size variable will contain a hash of column names to column values.

    The code above doesn’t get us very far though, but the code for loading the second page isn’t so difficult to figure out either. However, before going that far there’s one thing we need to do first: making sure there even is a second page.

    One way of doing that could be to count the number of columns in the row, and making sure that that number is higher than the offset + page size. There is a simpler way: just load one more item than you page size, if you get it you know there is another page, and if you only get a number equal or less than your page size you know this is the last page:

    page_items = column_family.get(row_key, :max_column_count => page_size + 1)
    if page_items.size > page_size
      next_offset = page_items.keys.last
      page_items.delete(next_offset)
    end
    

    Remember to pop off that extra column as I do in the if block. There’s only two hard problems in computer science: naming, concurrency and off-by-one errors.

    Loading the second page is where things start to differ between how you would do it in a relational database and how you have to do it in Cassandra. There is no way to give it an numerical offset where to start reading. Instead you have to tell it from which column to start reading.

    Turns out that loading one extra column every time serves a double purpose. It tells us that there is a next page, but the key of that last column is also the offset of the next page. Use that as parameter to the “next” link (I’m calling the parameter offset here, and I assume your request parameters are in a hash called params):

    options = {:max_column_count => page_size + 1}
    options[:from_column] = params[:offset] if params[:offset]
    page_items = column_family.get(row_key, options)
    if page_items.size > page_size
      next_offset = page_items.keys.last
      page_items.delete(next_offset)
    end
    

    If you want to avoid handling the special case of the first page where :from_column isn’t needed, you can make the offset variable default to \0 (NULL). It sorts before everything.

    offset = params[:offset] || "\0"
    options = {:from_column => offset, :max_column_count => page_size + 1}
    page_items = column_family.get(row_key, options)
    if page_items.size > page_size
      next_offset = page_items.keys.last
      page_items.delete(next_offset)
    end
    

    So far it’s pretty easy. With the code above you can start on the first page, clicking “next” again and again until you get to the last page where the “next” link can be hidden or disabled because the next_offset variable will be nil.

    Now comes the step I had to think a bit about before I got the hang of. How do we go backwards?

    We can’t go backwards with the same code, because we don’t know the offset of the previous page. There’s no way to calculate that from the contents of the current page.

    There is however another option you can pass to get: :reversed => true. It will do the same query, but go backwards. This means you can load the previous page by starting at the first item of the current page and go backwards. The result you get will also be in reverse order, so you need to reverse it again to get the it the right way around:

    offset = params[:offset] || "\0"
    direction = params[:direction]
    options = {:from_column => offset, :max_column_count => page_size + 1}
    options[:reversed] = direction == 'back'
    page_items = column_family.get(row_key, options)
    if direction == 'back'
      page_items = Hash[page_items.keys.reverse.map { |k| [k, page_items[k]] }]
    end
    previous_offset = page_items.keys.first
    if page_items.size > page_size
      next_offset = page_items.keys.last
      page_items.delete(next_offset)
    end
    

    The code above contains a one-liner that reverses the page_items hash — Ruby (1.9+) hashes are ordered, but lack a method to reverse. Depending on your use case you may want to write it differently.

    I also added the previous_offset variable, which can be used as the offset-parameter in the “previous” link. At this point it will always be set, even when on the first page, and that is obviously wrong. There is one more piece needed to wrap everything up:

    When we load the previous page page_size + 1 makes sure we get the full previous page plus the first item on the current, but it doesn’t tell us if there are more items beyond the previous page. To do that we need to apply the same trick again: load yet another item:

    offset = params[:offset] || "\0"
    direction = params[:direction]
    options = {:from_column => offset, :max_column_count => page_size + 2}
    options[:reversed] = direction == 'back'
    page_items = column_family.get(row_key, options)
    if direction == 'back'
      page_items = Hash[page_items.keys.reverse.map { |k| [k, page_items[k]] }]
      if page_items.size == page_size + 2
        page_items.delete(page_items.keys.first)
        previous_offset = page_items.keys.first
      end
    else
      if page_items.size == page_size + 2
        page_items.delete(page_items.keys.last)
      end
      previous_offset = page_items.keys.first if params[:offset].nil?
    end
    if page_items.size > page_size || previous_offset.nil?
      next_offset = page_items.keys.last
      page_items.delete(next_offset)
    end
    

    Instead of page_size + 1 we load page_size + 2, and then, depending on whether we’re going forwards or backwards we have to handle the extra item differently. When going forwards we just pop off both the extra items, but when going backwards one of the extra items will be at the start and one at the end. The one at the start will only be present if there is a previous page, and we can use this fact to conditionally set the previous_offset variable. When set there is a previous page and you can show a “previous” link, just as the next_offset tells you that there is a next page.

    The row may have been modified after the paging started, and this can cause some problems when going back to the first page. To handle this gracefully you need to make sure that if you’re on the first page (you can tell by previous_offset being nil) the last item must always be popped off. There is an edge case here where you can end up with an empty page if things are removed from the first page when you’re on the second, and then go backward, and I haven’t come up with a workaround, you will get an empty page with just a “next” link.

    Using the code above you can implement paging in Cassandra. You can have links for showing the next and previous pages, and disable these when there isn’t a next or previous page. What you can’t do, and what would be very much harder, is to add a feature to jump to a specific page. Without numeric offsets that would require lots and lots of code, unfortunately.

    Here’s an illustration of how the method works when the page size is 3:

    Illustration of how the pagination works