GUID generation algorithm 4 fills the GUID with 122 random bits. The odds of two GUIDs colliding are therefore one in 2¹²², which is a phenomenally small number. When you are dealing with rates this low, you have to adjust your frame of reference.
Some people don’t like algorithm 4 because the algorithm is not designed to ensure uniqueness. There is a chance, admittedly small, that a collision will occur. On the other hand, algorithm 1, for example, uses time-space coordinates to ensure zero chance of duplication, unless someone screws up the generation process.
Hang on a second. “Unless someone screws up the generation process.” Ah, so it’s not zero after all, because there is of course a tiny chance that something will get screwed up in the generation process.
According to this research paper (pdf), memory chips suffer 25,000 to 70,000 errors per billion hours per megabit. Let’s do some back-of-the-envelope calculations.
25,000 to 70,000 errors
10⁹ hours × 2²⁰ bits |
≈ | 2¹⁵ errors
2³⁰ hours × 2²⁰ bits |
= | 1 error
2³⁵ hours × bits |
Now let’s see how this error rate converts to “corrupted GUIDs”.
1 error
2³⁵ hours × bits |
× | 2⁷ bits
1 GUID |
× | 1 hour
60 × 60 × 10⁹ nanoseconds |
Now, 60 × 60 = 3600 is approximately 4096 = 2¹², and 10⁹ is approximately 2³⁰.
≈ | 2⁷ errors
2³⁵ × 2¹² × 2³⁰ GUID × nanosecond |
= | 1 errors
2⁷⁰ GUID × nanosecond |
The odds that a GUID will be corrupted in memory in the one nanosecond between the time it is created and the time the value is copied somewhere safe is therefore around one in 2⁷⁰.
This is 2⁵² ≈ 10¹⁵ = a quadrillion* times more likely than a collision between two GUIDs generated by algorithm 4.
And that’s just mechanical failures in the RAM chips. There are other ways that the generation process can get screwed up. For example, there could be a failure in the CPU itself, maybe due to overheating or overclocking. Or a bit error on the hard drive. Or a programming error.
When you start talking about numbers as small as 2⁻¹²², you have to start looking more closely at all the things that you thought were zero.
* Or “a thousand billion” if you use the European scale.
Bonus chatter: Note that I’m not saying that a collision caused by a random bit flip is more likely than a collision caused by a v4 GUID happening to match some other v4 GUID. I’m saying that it’s wrong to say that the v1 GUID generator will never generate a collision, because there’s a nonzero probability that the v1 GUID generator will deviate from the algorithm. Sure, that probably may be minuscule, but when you’re dealing with numbers like 2⁻¹²², these minuscule probabilities become significant.
0 comments