1. A new Ruby driver for Cassandra

    For the last few months I’ve been working on a driver for Cassandra’s new native protocol, and today I released version 1.0.

    Before 1.2 all Cassandra drivers used Thrift, but in 1.2 a new binary protocol was introduced. The Ruby Thrift libraries have never worked well, especially not in JRuby, and because of that using Cassandra with your Ruby application has never been a great experience. Hopefully that will change now.

    cql-rb is pure Ruby, works with both MRI and JRuby and has no other dependencies. It has support for most of the features of CQL3 and Cassandra 1.2, including prepared statements and authentication. It’s built around a non-blocking IO reactor to make full use of the binary protocol’s support for request pipelining, and maximize performance.

    For version 1.0 the focus has been to support the features needed for most applications, to be stable enough for production and to provide good performance. Now that 1.0 is out I will try to implement some of the more advanced features that exist in the DataStax Java driver, for example automatic peer discovery, reconnection on node failures, support for tracing requests and compression.

    Finally, here’s a snippet to demonstrate basic usage. There are many more examples in the README and in the API docs.

    require 'cql'
    
    client = Cql::Client.connect
    client.use('system')
    result = client.execute('SELECT * FROM peers')
    result.each do |row|
      p row
    end
    

  2. I gave a presentation about the architecture behind Burt at Spotify HQ a couple of weeks ago, here’s the video.

  3. Learning to Build Distributed Systems the Hard Way

    Presentation held at DeNormalised London 2012.

    The presentation was recorded, here’s the video.

  4. Concurrency and Distributed Systems using JRuby

    Presentation held at JRubyConf EU in Berlin, August 2012.

  5. The Ruby standard library is a disgrace

    I had a really bad experience with the Ruby standard library today. I sat down with a colleague to look at the performance of a small piece of code that happened to be in a hot spot of our code base. We set up a simple benchmark, saw that the performance was less than ideal, reasoned a bit about the code and decided that perhaps it would be better to replace the use of SortedSet with just #sort! for this particular case. So just to be sure we ran perftools.rb to see what was going on. This is what we saw:

    [Image: perftools.rb output, captioned “SortedSet isn't”]

    Right. Perhaps #sort! wouldn’t be such a good idea after all. And what is SortedSet doing? Why is it even calling #sort!? How is it a sorted set if it needs to call #sort!?

    Turns out that SortedSet isn’t.

    The first thing the class does is to try to load rbtree. Yes, the standard library class SortedSet is implemented in terms of the gem rbtree. If it can’t load rbtree it creates (via module_eval and a huge string containing the code, of course) a completely retarded implementation instead.

    What. The. Fuck.
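
    To make the complaint concrete, here is a rough sketch of what a "sorted" set that leans on #sort! looks like. It is an illustration only, not the standard library's code: the class name LazySortedSet and the sort_calls counter are made up for this example. The set keeps its members in a Hash, caches a sorted array of the keys, and throws that cache away on every insertion, so a loop that alternates between adding and reading pays for a full sort each time.

```ruby
# Hypothetical illustration of a "sorted set" that sorts lazily.
# This is NOT the standard library's source; names are invented for the example.
class LazySortedSet
  include Enumerable

  # Counts how many full sorts have been performed (for demonstration only).
  attr_reader :sort_calls

  def initialize(items = [])
    @hash = {}        # members are stored as hash keys
    @sorted = nil     # cached sorted array of keys, nil when stale
    @sort_calls = 0
    items.each { |item| add(item) }
  end

  # Adding a new member invalidates the cached sorted array.
  def add(item)
    unless @hash.key?(item)
      @hash[item] = true
      @sorted = nil
    end
    self
  end
  alias_method :<<, :add

  # Iteration goes through the cached array, sorting first if it is stale.
  def each(&block)
    return to_enum(:each) unless block
    sorted_keys.each(&block)
    self
  end

  private

  def sorted_keys
    @sorted ||= begin
      @sort_calls += 1
      @hash.keys.sort!
    end
  end
end

set = LazySortedSet.new([3, 1, 2])
set.to_a   # first read: one full sort
set.to_a   # cache is still valid: no extra sort
set << 0   # insertion drops the cache
set.to_a   # next read: another full sort
puts "full sorts so far: #{set.sort_calls}"
```

    A tree-backed set keeps its elements ordered as they are inserted, which is the behaviour the name promises.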

    The Ruby standard library has gotten some (rightful!) flak over the years, but for me this really takes the biscuit. The Ruby standard library is a disgrace. The Ruby community should have thrown it out a long time ago. It’s full of awful code; some of the worst Ruby code I have ever seen is in the standard library. Many of the APIs are horrible, and much of it seems to have been written to be as slow as possible.

    I think it puts the Ruby community in a bad light. We should be ashamed of it.

    If you’ve somehow escaped noticing the issues with the standard library, here’s a small gallery of horrors.

    Net::HTTP

    An API so byzantine that it makes SOAP look simple. Have you ever written code that used Net::HTTP without looking at the documentation first?

    That someone wrote open-uri to hide the ugliness for most use cases is one of the few redeeming qualities of the standard library.

    There’s tons of good HTTP libraries available as gems, thankfully.

    URI

    Until recently the code that parsed query strings had an exponential backtrack bug caused by this regex

    /\A(?:%\h\h|[^%#=;&]+)*=(?:%\h\h|[^%#=;&]+)*(?:[;&](?:%\h\h|[^%#=;&]+)*=(?:%\h\h|[^%#=;&]+)*)*\z/o
    

    now it’s changed to

    /\A(?:[^%#=;&]*(?:%\h\h[^%#=;&]*)*)=(?:[^%#=;&]*(?:%\h\h[^%#=;&]*)*)(?:[;&](?:[^%#=;&]*(?:%\h\h[^%#=;&]*)*)=(?:[^%#=;&]*(?:%\h\h[^%#=;&]*)*))*\z/o
    

    so that’s good, I guess.
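
    For the curious, the difference between the two patterns is in how a run of plain characters can be matched. In the old one, `(?:%\h\h|[^%#=;&]+)*` nests a `+` inside a `*`, so a run of n plain characters can be split between iterations in 2^(n-1) ways, and a backtracking engine may try all of them before giving up. In the new one, two plain runs must be separated by a percent-escape, so each run can only be matched one way. The sketch below checks that both patterns accept and reject the same small inputs. The constant names and the sample strings are my own, and the inputs are deliberately short so the old pattern stays harmless.

```ruby
# Both patterns are copied from above (minus the /o flag, which only matters
# for interpolated regexes). The constant names are made up for this example.
OLD_QUERY = /\A(?:%\h\h|[^%#=;&]+)*=(?:%\h\h|[^%#=;&]+)*(?:[;&](?:%\h\h|[^%#=;&]+)*=(?:%\h\h|[^%#=;&]+)*)*\z/
NEW_QUERY = /\A(?:[^%#=;&]*(?:%\h\h[^%#=;&]*)*)=(?:[^%#=;&]*(?:%\h\h[^%#=;&]*)*)(?:[;&](?:[^%#=;&]*(?:%\h\h[^%#=;&]*)*)=(?:[^%#=;&]*(?:%\h\h[^%#=;&]*)*))*\z/

# Short, hand-picked inputs: some valid query strings and some invalid ones.
SAMPLES = [
  'a=1',              # single pair
  'a=1&b=2',          # two pairs separated by &
  'a%20b=c%2Fd;e=f',  # percent-escapes and a ; separator
  '=',                # empty key and empty value
  'a',                # no = at all
  'a=1&b',            # second pair has no =
  'a=%zz',            # % not followed by two hex digits
  'a=b=c'             # stray = inside a value
].freeze

# Returns [input, old_matches, new_matches] for every sample.
def compare_patterns(samples)
  samples.map { |s| [s, OLD_QUERY.match?(s), NEW_QUERY.match?(s)] }
end

compare_patterns(SAMPLES).each do |input, old_result, new_result|
  puts format('%-20s old=%-5s new=%-5s', input.inspect, old_result, new_result)
end
```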

    Date

    Don’t get me started on Date. It’s been updated in 1.9.3 it seems, but prior to that the performance was just atrocious, and the code was even worse. date/format.rb was completely indecipherable before, but now it’s been ported (or rather, one-way encrypted) to C.

    The bottom of the barrel

    Then there’s the rest. The things that have ended up in the standard library for historical reasons. For some of it I guess it felt like the right thing to include at the time, but other things are just weird. Why is there a prime number finder in the standard library? Why two option parsing libraries? Why tempfile, but tmpdir?

    There have been suggestions of moving it out of the distribution and into a default gem or gems. Hopefully we’ll see that in Ruby 2.0. It wouldn’t make the code any better but it would at least be a small step in the right direction.

    I think it’s a pity that good libraries like Rake were moved into the standard library; it feels like the opposite of what we want. The slow release cycle makes it completely pointless trying to contribute to making it better. It’s so much easier to write a gem, or send a pull request to an existing gem, and push it to RubyGems. The standard library is where gems go to die.

    To contribute, and so that you can’t say I’m not doing my part to make things better, I’ve created a gem called Sanity that does one very simple thing: it removes the standard library from the load path. Just require "sanity" and you will no longer be exposed to the insanity of Net::HTTP, Date, URI and friends.
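
    If you wonder what removing the standard library from the load path could look like, here is a minimal sketch. It is not the source of the Sanity gem: the helper name load_path_without_stdlib is invented, and it assumes that the standard library lives in the directories RbConfig reports as rubylibdir and archdir. It returns a trimmed copy of the load path rather than changing anything, so it is safe to run.

```ruby
require 'rbconfig'

# Hypothetical helper, not the actual Sanity gem: returns a copy of the load
# path with the standard library directories filtered out.
# Assumption: the stdlib lives under RbConfig's 'rubylibdir' and 'archdir'.
def load_path_without_stdlib(load_path = $LOAD_PATH, config = RbConfig::CONFIG)
  stdlib_dirs = config.values_at('rubylibdir', 'archdir').compact
  load_path.reject do |dir|
    path = dir.to_s
    # Drop the stdlib directories themselves and anything nested under them.
    stdlib_dirs.any? { |std| path == std || path.start_with?("#{std}/") }
  end
end

trimmed = load_path_without_stdlib
removed = $LOAD_PATH.size - trimmed.size
puts "#{removed} standard library directories would be removed"
```

    Actually applying it would presumably be a matter of `$LOAD_PATH.replace(load_path_without_stdlib)`, after which requiring anything from the standard library that isn't already loaded should fail.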

  6. “The incompetency is staggering”

    http://twitter.com/#!/crucially/status/201008727212560384

    I couldn’t agree more. The post from Server Density does not go into the details of the Memcached deployment, but this description makes me think that they, indeed, are doing it wrong:

    This eliminates the need for Memcached itself running on a separate cluster.

    Separate cluster? It does indeed sound like Memcached-over-internet.

    But why were they using Memcached in the first place? The answer is both funny and tragic:

    the performance impact of the global lock in MongoDB 1.8 was such that we couldn’t insert our monitoring postback data directly into MongoDB – it had to be inserted into Memcached first then throttled into MongoDB via a few processor daemons

    So, besides using MongoDB as a message queue they stick Memcached in front of it, using that too as a message queue, because MongoDB is too slow? I think this calls for:

    [Image: double facepalm]

  7. A Guide to the Post Relational Revolution

    Presentation held at Scandinavian Developer Conference, April 17th, 2012.

  8. Monitoring service amateur hour

    You would be excused for thinking that people who do monitoring software for a living would know how to do a datacenter migration, but then maybe the Boxed Ice guys running Server Density are just amateurs.

    Twitter conversation https://twitter.com/#!/serverdensity/status/190436930071183360

    Oh, right, I totally forgot, they are.