large distributed systems

Posted by Ian Holsman Tue, 24 Jun 2008 19:30:00 GMT

So I have recently been paying a lot of attention to systems with huge amounts of data in them.

be it Relegence that deals with lots of incoming news stories and figuring out what they are about in real time or the data layer that is dealing with click streams and recommendation engines.

One of the interesting questions is how we make this data available to the publishing systems, as the data sizes mean we can’t find the entire table onto a single machine.

So in my research I have seen 4-5 ways to do horizontal partitioning (or is it vertical.. i always get confused with the names).

  • consistent hashing. easy to understand. easy to implement, until you need to add a group of machines.
  • A central database holding the location of the records. An oldie but a goodie. you can easily add machines, and reconfigure the distribution to ‘move’ records across machines to compensate. But It has a central directory which means you need to worry about scalaing the directory (a smaller problem to be sure)
  • A Distributed Hash Table approach like the one discussed at onscale
  • Crush a pseudo-random distribution thing similar to consistent hashing, that can handle adding new machines and load issues.
  • Amazon’s Dynamo which has a “a gossip based distributed failure detection and membership protocol.” looking deeper it some mixture of consistent hashing and a dht/chord.
  • Hypertable/HBase. which leave the partitioning (and replication) to the distributed file store they sit on (hadoop). which isn’t a bad idea as alot of work has gone into that. but I still have my doubts on how it will handle a OLTP load (and I should just bit the bullet and run a performance test to remove my doubts)

you also need to deal with replication/load balancing issues. Specifically you need to handle the case of failure (and failure of racks / data centers).

from what I can see you have 2 different choices to make.

  • Full Consistency vs Eventual Consistency
  • What level do to replicate at

most of the people (including engineers to be honest) can’t seem to get their head around this eventual consistency thing. They are used to a update changing the value there and then.

replication on the other hand is usually handled by letting mysql do it. personally i prefer a finer grain approach where you can have some records with more replicas than others to cater for some things being more equal than others. (brittney spears gets more hits than johnny cash for example), and dynamic loads where the system automagically adds more replicas based on activity

whatever replication we choose it has to be battled hardened.. you don’t want to call up the CEO and explain why 10% of his data has just disappeared into the ether, or why you will be down for 6 hours recovering from a tape.

the other choice is what the data store should be.

  • a relational DB like mysql.
  • a Key/Value pair, with limited semantics on how to retrieve/store the information

personally I like the key/value pair as it makes life simpler.. but I know a lot of people who like mysql

so.. this is main thing on my mind at the moment. your thoughts are welcome naturally

Posted in  | Tags , , , ,  | 4 comments