When you start talking about numbers as small as 2⁻¹²², you have to start looking more closely at the things you thought were zero

Raymond Chen

Raymond

GUID generation algorithm 4 fills the GUID with 122 random bits. The odds of two GUIDs colliding are therefore one in 2¹²², which is a phenomenally small number. When you are dealing with rates this low, you have to adjust your frame of reference.

Some people don’t like algorithm 4 because the algorithm is not designed to ensure uniqueness. There is a chance, admittedly small, that a collision will occur. On the other hand, algorithm 1, for example, uses time-space coordinates to ensure zero chance of duplication, unless someone screws up the generation process.

Hang on a second. “Unless someone screws up the generation process.” Ah, so it’s not zero after all, because there is of course a tiny chance that something will get screwed up in the generation process.

According to this research paper (pdf), memory chips suffer 25,000 to 70,000 errors per billion hours per megabit. Let’s do some back-of-the-envelope calculations.

25,000 to 70,000 errors


10⁹ hours × 2²⁰ bits
 ≈  2¹⁵ errors


2³⁰ hours × 2²⁰ bits
 =  1 error


2³⁵ hours × bits

Now let’s see how this error rate converts to “corrupted GUIDs”.

1 error


2³⁵ hours × bits
 ×  2⁷ bits


1 GUID
 ×  1 hour


60 × 60 × 10⁹ nanoseconds

Now, 60 × 60 = 3600 is approximately 4096 = 2¹², and 10⁹ is approximately 2³⁰.

≈  2⁷ errors


2³⁵ × 2¹² × 2³⁰ GUID × nanosecond
 =  1 errors


2⁷⁰ GUID × nanosecond

The odds that a GUID will be corrupted in memory in the one nanosecond between the time it is created and the time the value is copied somewhere safe is therefore around one in 2⁷⁰.

This is 2⁵² ≈ 10¹⁵ = a quadrillion* times more likely than a collision between two GUIDs generated by algorithm 4.

And that’s just mechanical failures in the RAM chips. There are other ways that the generation process can get screwed up. For example, there could be a failure in the CPU itself, maybe due to overheating or overclocking. Or a bit error on the hard drive. Or a programming error.

When you start talking about numbers as small as 2⁻¹²², you have to start looking more closely at all the things that you thought were zero.

* Or “a thousand billion” if you use the European scale.

Bonus chatter: Note that I’m not saying that a collision caused by a random bit flip is more likely than a collision caused by a v4 GUID happening to match some other v4 GUID. I’m saying that it’s wrong to say that the v1 GUID generator will never generate a collision, because there’s a nonzero probability that the v1 GUID generator will deviate from the algorithm. Sure, that probably may be minuscule, but when you’re dealing with numbers like 2⁻¹²², these minuscule probabilities become significant.

Raymond Chen
Raymond Chen

Follow Raymond   

0 comments

Comments are closed.