primary keys
Posted by Ian Holsman
Brad Fitzpatrick has a nice discussion about what people should be using for their ‘id’ keys in a distributed environment, and puts forward a couple of points about why it isn’t such a good idea for his application/architecture, which has a ‘few’ central machines handing out IDs.
This is a central problem with sequences that I’ve seen in most DB platforms. That and their 32-bit size. (I think MySQL is 32-bit; I’m sure the geeks who read this will correct me.)
The actual question (if I am reading this correctly) is not about UUIDs vs. local sequences, but how do I move a ‘tree’ of records from one cluster to another and not have an ID clash. If each cluster were independent, it wouldn’t be a concern: you would have a central USER->cluster mapping algorithm/server, which might be UUID or 64-bit based, and then all future requests could be directed to the correct cluster, where locally generated sequences would be fine.
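Here is a rough sketch of that independent-cluster setup, assuming an in-memory directory stands in for the central USER->cluster mapping server. The names `Cluster`, `Directory`, `assign` and `cluster_for` are made up for illustration and are not from Brad's post.

```python
import itertools


class Cluster:
    """Hypothetical cluster with its own locally generated sequence."""

    def __init__(self, name):
        self.name = name
        self._seq = itertools.count(1)  # local sequence, starts at 1
        self.rows = {}                  # local_id -> (user_id, body)

    def insert(self, user_id, body):
        local_id = next(self._seq)
        self.rows[local_id] = (user_id, body)
        return local_id


class Directory:
    """Hypothetical central USER->cluster mapping (the only shared piece)."""

    def __init__(self, clusters):
        self._clusters = clusters  # cluster name -> Cluster
        self._home = {}            # user_id -> cluster name

    def assign(self, user_id, cluster_name):
        self._home[user_id] = cluster_name

    def cluster_for(self, user_id):
        return self._clusters[self._home[user_id]]


clusters = {"a": Cluster("a"), "b": Cluster("b")}
directory = Directory(clusters)
directory.assign("ian", "a")
directory.assign("brad", "b")

# Every request for a user is routed to that user's home cluster.
ian_post = directory.cluster_for("ian").insert("ian", "first post")
brad_post = directory.cluster_for("brad").insert("brad", "hello")
```

Both inserts come back with local ID 1. That is harmless as long as a row is only ever looked up inside its own cluster, and it is exactly the clash you would hit if you tried to move one user's records into the other cluster.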
Another approach would be to increase the size of the ID to 64 bits, with the top n bits (say 8) designating the cluster which created the sequence. When a new cluster/allocator comes online it would register itself and be assigned the next cluster ID#. Then all sequences generated by this cluster allocator would just start at X (the cluster ID# sitting in the top bits). This would give you 56 bits of uniqueness per cluster, but it would still require some kind of central co-ordination at cluster creation time.
This would increase the key size from 4 bytes to 8.
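A minimal sketch of the 64-bit layout, assuming n = 8 cluster bits and 56 sequence bits as in the example above. `ClusterRegistry` and `ClusterAllocator` are hypothetical names, and the in-memory counter stands in for whatever central co-ordination you would really use at cluster creation time.

```python
CLUSTER_BITS = 8                    # top n bits name the cluster (assumed n = 8)
SEQ_BITS = 64 - CLUSTER_BITS        # 56 bits of per-cluster sequence
SEQ_MASK = (1 << SEQ_BITS) - 1


def make_id(cluster_id, local_seq):
    """Pack a cluster ID and a local sequence number into one 64-bit key."""
    if not 0 <= cluster_id < (1 << CLUSTER_BITS):
        raise ValueError("cluster_id does not fit in the top bits")
    if not 0 <= local_seq <= SEQ_MASK:
        raise ValueError("local sequence exhausted")
    return (cluster_id << SEQ_BITS) | local_seq


def split_id(key):
    """Recover (cluster_id, local_seq) from a packed key."""
    return key >> SEQ_BITS, key & SEQ_MASK


class ClusterRegistry:
    """Hypothetical central coordinator, only consulted when a cluster is created."""

    def __init__(self):
        self._next_cluster_id = 0

    def register(self):
        cluster_id = self._next_cluster_id
        if cluster_id >= (1 << CLUSTER_BITS):
            raise RuntimeError("out of cluster IDs")
        self._next_cluster_id += 1
        return cluster_id


class ClusterAllocator:
    """Hands out IDs locally once the cluster knows its own cluster ID."""

    def __init__(self, cluster_id):
        self.cluster_id = cluster_id
        self._seq = 0

    def next_id(self):
        self._seq += 1
        return make_id(self.cluster_id, self._seq)


registry = ClusterRegistry()
first = ClusterAllocator(registry.register())   # cluster 0
second = ClusterAllocator(registry.register())  # cluster 1
id_a = first.next_id()
id_b = second.next_id()
```

Both allocators are on local sequence number 1, but the packed keys differ because the cluster ID sits in the top bits, so records can move between clusters without clashing. One thing to watch: if the key lives in a signed 64-bit column, cluster IDs of 128 and up set the sign bit.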
But personally, I would see how I can avoid this whole issue completely, and try to make each cluster fully independent and NOT have to merge/move users between clusters. With the price of today’s hardware at about $5k/box, the issue of excess capacity and non-utilization is less important, I think.