Some time ago, I surveyed a number of GUID-sorting algorithms. At the time, I noted, “Thankfully, it never occurred to anyone to try to sort GUID components as signed integers!”
How wrong I was.
For the purpose of sorting, Java treats each GUID as a pair of signed 64-bit integers in big-endian format.
This means that the following list of GUIDs is sorted according to Java:
{80000000-0000-0000-8000-000000000000} |
{80FFFFFF-FFFF-FFFF-7FFF-FFFFFFFFFFFF} |
{FFFFFFFF-FFFF-FFFF-7FFF-FFFFFFFFFFFF} |
{00FFFFFF-FFFF-FFFF-7FFF-FFFFFFFFFFFF} |
{7F00FFFF-FFFF-FFFF-7FFF-FFFFFFFFFFFF} |
{7FFF00FF-FFFF-FFFF-7FFF-FFFFFFFFFFFF} |
{7FFFFF00-FFFF-FFFF-7FFF-FFFFFFFFFFFF} |
{7FFFFFFF-00FF-FFFF-7FFF-FFFFFFFFFFFF} |
{7FFFFFFF-FF00-FFFF-7FFF-FFFFFFFFFFFF} |
{7FFFFFFF-FFFF-00FF-7FFF-FFFFFFFFFFFF} |
{7FFFFFFF-FFFF-FF00-7FFF-FFFFFFFFFFFF} |
{7FFFFFFF-FFFF-FFFF-80FF-FFFFFFFFFFFF} |
{7FFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF} |
{7FFFFFFF-FFFF-FFFF-00FF-FFFFFFFFFFFF} |
{7FFFFFFF-FFFF-FFFF-7F00-FFFFFFFFFFFF} |
{7FFFFFFF-FFFF-FFFF-7FFF-00FFFFFFFFFF} |
{7FFFFFFF-FFFF-FFFF-7FFF-FF00FFFFFFFF} |
{7FFFFFFF-FFFF-FFFF-7FFF-FFFF00FFFFFF} |
{7FFFFFFF-FFFF-FFFF-7FFF-FFFFFF00FFFF} |
{7FFFFFFF-FFFF-FFFF-7FFF-FFFFFFFF00FF} |
{7FFFFFFF-FFFF-FFFF-7FFF-FFFFFFFFFF00} |
{7FFFFFFF-FFFF-FFFF-7FFF-FFFFFFFFFFFF} |
The most significant bit of each 64-bit portion is a sign bit. This means that the smallest possible GUID is
{80000000-0000-0000-8000-000000000000}
and the largest possible GUID is
{7FFFFFFF-FFFF-FFFF-7FFF-FFFFFFFFFFFF}
In the highlighted columns (corresponding to bits 0 and 64), the sort order is 89ABCDEF01234567. In the other columns, the sort order is 0123456789ABCDEF.
Java chose a computationally efficient comparison operator. Mst of the time, that’s what you want.
If you need to present alphabetically-sorted GUID to the user, roll your own comparison.
As for signedness, I prefer Java’s choices of trying hard to use signed numbers everywhere than the mess and bug-prone choice of C++. Making many API be unsigned while having weird promotion rules causes to many subtle problems.
Moral of the story is the UUID spec should have defined sort order. If sort order is undefined behaviour, Java sort order is as good as anything IMO.
Also UUID specers was short sighted in creating a non sequential order. If anyone deserves a kick in the nuts I’d say the UUID specers for sure:-)
Why should they define a sort order? Sorting UUIDs is a meaningless operation. Should the HTML DOM specify a sort order for colors? Should the IEEE define a sort order for MAC addresses?
As a data type, universal sort order seems natural but I guess the specers did not intend UUID to become a data type (just a bunch of bytes).
If sorting UUIDs is a meaningless operation (yet needed for certain use cases, such as those you mention in the earlier article), then what's so terrible about the way Java does it? It defines a total order: job done. Unless you're trying to debug your own ordered map implementation, what need do you have for UUIDs to be ordered in an unsurprising way?
Not that I'm in favour of surprising sort orders. But I'm struggling...
For some reason I would rather seeing people sort UUID as string, even when I know it’s much more expensive operation.
Java doesn’t believe in unsigned numbers, so this is something we should have expected, in retrospect.
The most egregious outcome of this design choice, IMO, is that Java considers bytes as signed. If they’d really wanted an 1-byte signed integral type, couldn’t they have created one called tiny or something and reserved bytes as a special case for when dealing with binary blobs? It really makes my eyelid twitch.
Oh, the humanity!
I didn’t actually believe this and had to look it up.
http://hg.openjdk.java.net/jdk/jdk/file/1def54255e93/src/java.base/share/classes/java/util/UUID.java#l445
Raymond, I’m sorry I doubted you.
Java, I’m sorry I ever trusted you.
I fail to see the problem. I assume this is just a “it’s hip to bash java” reaction?
As long as the algorithm guarantees a total order it really doesn’t matter how exactly it works. Assuming Java uses two longs or a byte array to represent a guid internally picking this sort order should be pretty efficient to implement.