Another way to sort GUIDs: Java

Raymond Chen

Raymond

Some time ago, I surveyed a number of GUID-sorting algorithms. At the time, I noted, “Thankfully, it never occurred to anyone to try to sort GUID components as signed integers!”

How wrong I was.

For the purpose of sorting, Java treats each GUID as a pair of signed 64-bit integers in big-endian format.

This means that the following list of GUIDs is sorted according to Java:

{80000000-0000-0000-8000-000000000000}
{80FFFFFF-FFFF-FFFF-7FFF-FFFFFFFFFFFF}
{FFFFFFFF-FFFF-FFFF-7FFF-FFFFFFFFFFFF}
{00FFFFFF-FFFF-FFFF-7FFF-FFFFFFFFFFFF}
{7F00FFFF-FFFF-FFFF-7FFF-FFFFFFFFFFFF}
{7FFF00FF-FFFF-FFFF-7FFF-FFFFFFFFFFFF}
{7FFFFF00-FFFF-FFFF-7FFF-FFFFFFFFFFFF}
{7FFFFFFF-00FF-FFFF-7FFF-FFFFFFFFFFFF}
{7FFFFFFF-FF00-FFFF-7FFF-FFFFFFFFFFFF}
{7FFFFFFF-FFFF-00FF-7FFF-FFFFFFFFFFFF}
{7FFFFFFF-FFFF-FF00-7FFF-FFFFFFFFFFFF}
{7FFFFFFF-FFFF-FFFF-80FF-FFFFFFFFFFFF}
{7FFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF}
{7FFFFFFF-FFFF-FFFF-00FF-FFFFFFFFFFFF}
{7FFFFFFF-FFFF-FFFF-7F00-FFFFFFFFFFFF}
{7FFFFFFF-FFFF-FFFF-7FFF-00FFFFFFFFFF}
{7FFFFFFF-FFFF-FFFF-7FFF-FF00FFFFFFFF}
{7FFFFFFF-FFFF-FFFF-7FFF-FFFF00FFFFFF}
{7FFFFFFF-FFFF-FFFF-7FFF-FFFFFF00FFFF}
{7FFFFFFF-FFFF-FFFF-7FFF-FFFFFFFF00FF}
{7FFFFFFF-FFFF-FFFF-7FFF-FFFFFFFFFF00}
{7FFFFFFF-FFFF-FFFF-7FFF-FFFFFFFFFFFF}

The most significant bit of each 64-bit portion is a sign bit. This means that the smallest possible GUID is

  {80000000-0000-0000-8000-000000000000}

and the largest possible GUID is

  {7FFFFFFF-FFFF-FFFF-7FFF-FFFFFFFFFFFF}

In the highlighted red columns (corresponding to bits 0 and 64), the sort order is 89ABCDEF01234567. In the other columns, the sort order is 0123456789ABCDEF.

Raymond Chen
Raymond Chen

Follow Raymond   

11 comments

Comments are closed.

    • Avatar
      Danstur

      I fail to see the problem. I assume this is just a “it’s hip to bash java” reaction?

      As long as the algorithm guarantees a total order it really doesn’t matter how exactly it works. Assuming Java uses two longs or a byte array to represent a guid internally picking this sort order should be pretty efficient to implement.

  • Avatar
    Neil Rashbrook

    Java doesn’t believe in unsigned numbers, so this is something we should have expected, in retrospect.

    • Avatar
      Ken Keenan

      The most egregious outcome of this design choice, IMO, is that Java considers bytes as signed. If they’d really wanted an 1-byte signed integral type, couldn’t they have created one called tiny or something and reserved bytes as a special case for when dealing with binary blobs? It really makes my eyelid twitch.

  • Gunnar Dalsnes
    Gunnar Dalsnes

    Moral of the story is the UUID spec should have defined sort order. If sort order is undefined behaviour, Java sort order is as good as anything IMO.
    Also UUID specers was short sighted in creating a non sequential order. If anyone deserves a kick in the nuts I’d say the UUID specers for sure:-)

    • Raymond Chen
      Raymond Chen

      Why should they define a sort order? Sorting UUIDs is a meaningless operation. Should the HTML DOM specify a sort order for colors? Should the IEEE define a sort order for MAC addresses?

      • Avatar
        Sacha Roscoe

        If sorting UUIDs is a meaningless operation (yet needed for certain use cases, such as those you mention in the earlier article), then what’s so terrible about the way Java does it? It defines a total order: job done. Unless you’re trying to debug your own ordered map implementation, what need do you have for UUIDs to be ordered in an unsurprising way?
        Not that I’m in favour of surprising sort orders. But I’m struggling to see why the average developer should care; it seems like one of those things that should be treated as opaque, because if you ever need to know the details, you’re probably doing something very wrong.

        • Avatar
          cheong00

          For some reason I would rather seeing people sort UUID as string, even when I know it’s much more expensive operation.

      • Gunnar Dalsnes
        Gunnar Dalsnes

        As a data type, universal sort order seems natural but I guess the specers did not intend UUID to become a data type (just a bunch of bytes).

  • Avatar
    Pierre Baillargeon

    Java chose a computationally efficient comparison operator. Mst of the time, that’s what you want.

    If you need to present alphabetically-sorted GUID to the user, roll your own comparison.

    As for signedness, I prefer Java’s choices of trying hard to use signed numbers everywhere than the mess and bug-prone choice of C++. Making many API be unsigned while having weird promotion rules causes to many subtle problems.