September 13th, 2019

Another way to sort GUIDs: Java

Some time ago, I surveyed a number of GUID-sorting algorithms. At the time, I noted, “Thankfully, it never occurred to anyone to try to sort GUID components as signed integers!”

How wrong I was.

For the purpose of sorting, Java treats each GUID as a pair of signed 64-bit integers in big-endian format.

This means that the following list of GUIDs is sorted according to Java:

{80000000-0000-0000-8000-000000000000}
{80FFFFFF-FFFF-FFFF-7FFF-FFFFFFFFFFFF}
{FFFFFFFF-FFFF-FFFF-7FFF-FFFFFFFFFFFF}
{00FFFFFF-FFFF-FFFF-7FFF-FFFFFFFFFFFF}
{7F00FFFF-FFFF-FFFF-7FFF-FFFFFFFFFFFF}
{7FFF00FF-FFFF-FFFF-7FFF-FFFFFFFFFFFF}
{7FFFFF00-FFFF-FFFF-7FFF-FFFFFFFFFFFF}
{7FFFFFFF-00FF-FFFF-7FFF-FFFFFFFFFFFF}
{7FFFFFFF-FF00-FFFF-7FFF-FFFFFFFFFFFF}
{7FFFFFFF-FFFF-00FF-7FFF-FFFFFFFFFFFF}
{7FFFFFFF-FFFF-FF00-7FFF-FFFFFFFFFFFF}
{7FFFFFFF-FFFF-FFFF-80FF-FFFFFFFFFFFF}
{7FFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF}
{7FFFFFFF-FFFF-FFFF-00FF-FFFFFFFFFFFF}
{7FFFFFFF-FFFF-FFFF-7F00-FFFFFFFFFFFF}
{7FFFFFFF-FFFF-FFFF-7FFF-00FFFFFFFFFF}
{7FFFFFFF-FFFF-FFFF-7FFF-FF00FFFFFFFF}
{7FFFFFFF-FFFF-FFFF-7FFF-FFFF00FFFFFF}
{7FFFFFFF-FFFF-FFFF-7FFF-FFFFFF00FFFF}
{7FFFFFFF-FFFF-FFFF-7FFF-FFFFFFFF00FF}
{7FFFFFFF-FFFF-FFFF-7FFF-FFFFFFFFFF00}
{7FFFFFFF-FFFF-FFFF-7FFF-FFFFFFFFFFFF}

The most significant bit of each 64-bit portion is a sign bit. This means that the smallest possible GUID is

  {80000000-0000-0000-8000-000000000000}

and the largest possible GUID is

  {7FFFFFFF-FFFF-FFFF-7FFF-FFFFFFFFFFFF}

In the highlighted columns (corresponding to bits 0 and 64), the sort order is 89ABCDEF01234567. In the other columns, the sort order is 0123456789ABCDEF.

Topics
Code

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

11 comments

Discussion is closed. Login to edit/delete existing comments.

  • Pierre Baillargeon

    Java chose a computationally efficient comparison operator. Mst of the time, that’s what you want.

    If you need to present alphabetically-sorted GUID to the user, roll your own comparison.

    As for signedness, I prefer Java’s choices of trying hard to use signed numbers everywhere than the mess and bug-prone choice of C++. Making many API be unsigned while having weird promotion rules causes to many subtle problems.

  • Gunnar Dalsnes

    Moral of the story is the UUID spec should have defined sort order. If sort order is undefined behaviour, Java sort order is as good as anything IMO.
    Also UUID specers was short sighted in creating a non sequential order. If anyone deserves a kick in the nuts I’d say the UUID specers for sure:-)

    • Raymond ChenMicrosoft employee Author

      Why should they define a sort order? Sorting UUIDs is a meaningless operation. Should the HTML DOM specify a sort order for colors? Should the IEEE define a sort order for MAC addresses?

      • Gunnar Dalsnes

        As a data type, universal sort order seems natural but I guess the specers did not intend UUID to become a data type (just a bunch of bytes).

      • Sacha Roscoe

        If sorting UUIDs is a meaningless operation (yet needed for certain use cases, such as those you mention in the earlier article), then what's so terrible about the way Java does it? It defines a total order: job done. Unless you're trying to debug your own ordered map implementation, what need do you have for UUIDs to be ordered in an unsurprising way?
        Not that I'm in favour of surprising sort orders. But I'm struggling...

        Read more
      • cheong00

        For some reason I would rather seeing people sort UUID as string, even when I know it’s much more expensive operation.

  • Neil Rashbrook

    Java doesn’t believe in unsigned numbers, so this is something we should have expected, in retrospect.

    • Ken Keenan

      The most egregious outcome of this design choice, IMO, is that Java considers bytes as signed. If they’d really wanted an 1-byte signed integral type, couldn’t they have created one called tiny or something and reserved bytes as a special case for when dealing with binary blobs? It really makes my eyelid twitch.

  • Alexey Badalov

    Oh, the humanity!

    • Daniel Sturm

      I fail to see the problem. I assume this is just a “it’s hip to bash java” reaction?

      As long as the algorithm guarantees a total order it really doesn’t matter how exactly it works. Assuming Java uses two longs or a byte array to represent a guid internally picking this sort order should be pretty efficient to implement.