How many ways are there to sort GUIDs? How much time do you have?

Raymond Chen

Suppose you want to sort GUIDs, say because they are a key in an ordered map. How many ways are there to order them?

Before we can even talk about how to order GUIDs, we need to figure out how we’re going to represent them. You can take the view that a GUID is just an array of 16 bytes.¹

00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF

But the GUID structure itself groups them into four fields, one of which is an array.

typedef struct _GUID {
  DWORD Data1;
  WORD  Data2;
  WORD  Data3;
  BYTE  Data4[8];
} GUID;

This groups the bytes as follows:

33221100 5544 7766 88 99 AA BB CC DD EE FF
Data1 Data2 Data3 Data4

The bytes in Data1, Data2, and Data3 are flipped because Windows is little-endian.

And of course in Windows, it is common to represent GUIDs in their stringified form.

{ 33221100 - 5544 - 7766 - 88 99 - AA BB CC DD EE FF }
  Data1 - Data2 - Data3 - Data4

Notice that the first three integer-sized groups are flipped, but the fourth one isn’t. I’m always scared of that fourth group. (I’m not tempted to flip the last group because it’s six bytes long, which is not the natural size of any integer type on Windows.)

Since the difference between the structured form and the string form is only in the placement of punctuation marks, and not in the byte ordering, I’ll limit myself to byte-array representation and string representation.

Okay, now that we know how to represent GUIDs, we can start sorting them.

If you treat the GUID as an array of 16 bytes, then you can sort them with memcmp, which is a lexicographical sorting by bytes. The comparisons are made as unsigned values. (Thankfully, it never occurred to anyone to try to sort GUID components as signed integers!)³ This means that the following list of GUIDs is sorted according to memcmp:

Byte array String
00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF {FFFFFF00-FFFF-FFFF-FFFF-FFFFFFFFFFFF}
FF 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF {FFFF00FF-FFFF-FFFF-FFFF-FFFFFFFFFFFF}
FF FF 00 FF FF FF FF FF FF FF FF FF FF FF FF FF {FF00FFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF}
FF FF FF 00 FF FF FF FF FF FF FF FF FF FF FF FF {00FFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF}
FF FF FF FF 00 FF FF FF FF FF FF FF FF FF FF FF {FFFFFFFF-FF00-FFFF-FFFF-FFFFFFFFFFFF}
FF FF FF FF FF 00 FF FF FF FF FF FF FF FF FF FF {FFFFFFFF-00FF-FFFF-FFFF-FFFFFFFFFFFF}
FF FF FF FF FF FF 00 FF FF FF FF FF FF FF FF FF {FFFFFFFF-FFFF-FF00-FFFF-FFFFFFFFFFFF}
FF FF FF FF FF FF FF 00 FF FF FF FF FF FF FF FF {FFFFFFFF-FFFF-00FF-FFFF-FFFFFFFFFFFF}
FF FF FF FF FF FF FF FF 00 FF FF FF FF FF FF FF {FFFFFFFF-FFFF-FFFF-00FF-FFFFFFFFFFFF}
FF FF FF FF FF FF FF FF FF 00 FF FF FF FF FF FF {FFFFFFFF-FFFF-FFFF-FF00-FFFFFFFFFFFF}
FF FF FF FF FF FF FF FF FF FF 00 FF FF FF FF FF {FFFFFFFF-FFFF-FFFF-FFFF-00FFFFFFFFFF}
FF FF FF FF FF FF FF FF FF FF FF 00 FF FF FF FF {FFFFFFFF-FFFF-FFFF-FFFF-FF00FFFFFFFF}
FF FF FF FF FF FF FF FF FF FF FF FF 00 FF FF FF {FFFFFFFF-FFFF-FFFF-FFFF-FFFF00FFFFFF}
FF FF FF FF FF FF FF FF FF FF FF FF FF 00 FF FF {FFFFFFFF-FFFF-FFFF-FFFF-FFFFFF00FFFF}
FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 FF {FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFF00FF}
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 {FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFF00}

The .NET Framework System.Guid.CompareTo method compares the structure members lexicographically, and the bytes of the array are also sorted lexicographically. The sorted array for System.Guid looks like this:

Byte array String
FF FF FF 00 FF FF FF FF FF FF FF FF FF FF FF FF {00FFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF}
FF FF 00 FF FF FF FF FF FF FF FF FF FF FF FF FF {FF00FFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF}
FF 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF {FFFF00FF-FFFF-FFFF-FFFF-FFFFFFFFFFFF}
00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF {FFFFFF00-FFFF-FFFF-FFFF-FFFFFFFFFFFF}
FF FF FF FF FF 00 FF FF FF FF FF FF FF FF FF FF {FFFFFFFF-00FF-FFFF-FFFF-FFFFFFFFFFFF}
FF FF FF FF 00 FF FF FF FF FF FF FF FF FF FF FF {FFFFFFFF-FF00-FFFF-FFFF-FFFFFFFFFFFF}
FF FF FF FF FF FF FF 00 FF FF FF FF FF FF FF FF {FFFFFFFF-FFFF-00FF-FFFF-FFFFFFFFFFFF}
FF FF FF FF FF FF 00 FF FF FF FF FF FF FF FF FF {FFFFFFFF-FFFF-FF00-FFFF-FFFFFFFFFFFF}
FF FF FF FF FF FF FF FF 00 FF FF FF FF FF FF FF {FFFFFFFF-FFFF-FFFF-00FF-FFFFFFFFFFFF}
FF FF FF FF FF FF FF FF FF 00 FF FF FF FF FF FF {FFFFFFFF-FFFF-FFFF-FF00-FFFFFFFFFFFF}
FF FF FF FF FF FF FF FF FF FF 00 FF FF FF FF FF {FFFFFFFF-FFFF-FFFF-FFFF-00FFFFFFFFFF}
FF FF FF FF FF FF FF FF FF FF FF 00 FF FF FF FF {FFFFFFFF-FFFF-FFFF-FFFF-FF00FFFFFFFF}
FF FF FF FF FF FF FF FF FF FF FF FF 00 FF FF FF {FFFFFFFF-FFFF-FFFF-FFFF-FFFF00FFFFFF}
FF FF FF FF FF FF FF FF FF FF FF FF FF 00 FF FF {FFFFFFFF-FFFF-FFFF-FFFF-FFFFFF00FFFF}
FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 FF {FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFF00FF}
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 {FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFF00}

Next is string sorting. You will use a case-insensitive sort if you want to preserve your sanity. A memberwise lexicographical sort of the structure is equivalent to sorting the strings because, as we noted earlier, the stringification is the same as the structure version, just with additional punctuation. So the above list is also sorted according to case-insensitive stringification. Hooray, two sorting algorithms agree on something!

Next up is System.Data.SqlTypes.SqlGuid. Yes, SQL has its own GUID, because it’s SQL. Not only does it have its own GUID, it has its own GUID sorting algorithm. Because it’s SQL. And it doesn’t call it a GUID, but rather calls it uniqueidentifier. Again, because it’s SQL.

SQL sorts GUIDs by breaking the stringified version into groups, sorting groups right to left, and sorting bytewise within each group. I want to know what they were thinking when they came up with this.² You end up with this sorted array for System.Data.SqlTypes.SqlGuid:

Byte array String
FF FF FF FF FF FF FF FF FF FF 00 FF FF FF FF FF {FFFFFFFF-FFFF-FFFF-FFFF-00FFFFFFFFFF}
FF FF FF FF FF FF FF FF FF FF FF 00 FF FF FF FF {FFFFFFFF-FFFF-FFFF-FFFF-FF00FFFFFFFF}
FF FF FF FF FF FF FF FF FF FF FF FF 00 FF FF FF {FFFFFFFF-FFFF-FFFF-FFFF-FFFF00FFFFFF}
FF FF FF FF FF FF FF FF FF FF FF FF FF 00 FF FF {FFFFFFFF-FFFF-FFFF-FFFF-FFFFFF00FFFF}
FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 FF {FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFF00FF}
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 {FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFF00}
FF FF FF FF FF FF FF FF 00 FF FF FF FF FF FF FF {FFFFFFFF-FFFF-FFFF-00FF-FFFFFFFFFFFF}
FF FF FF FF FF FF FF FF FF 00 FF FF FF FF FF FF {FFFFFFFF-FFFF-FFFF-FF00-FFFFFFFFFFFF}
FF FF FF FF FF FF 00 FF FF FF FF FF FF FF FF FF {FFFFFFFF-FFFF-FF00-FFFF-FFFFFFFFFFFF}
FF FF FF FF FF FF FF 00 FF FF FF FF FF FF FF FF {FFFFFFFF-FFFF-00FF-FFFF-FFFFFFFFFFFF}
FF FF FF FF 00 FF FF FF FF FF FF FF FF FF FF FF {FFFFFFFF-FF00-FFFF-FFFF-FFFFFFFFFFFF}
FF FF FF FF FF 00 FF FF FF FF FF FF FF FF FF FF {FFFFFFFF-00FF-FFFF-FFFF-FFFFFFFFFFFF}
00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF {FFFFFF00-FFFF-FFFF-FFFF-FFFFFFFFFFFF}
FF 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF {FFFF00FF-FFFF-FFFF-FFFF-FFFFFFFFFFFF}
FF FF 00 FF FF FF FF FF FF FF FF FF FF FF FF FF {FF00FFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF}
FF FF FF 00 FF FF FF FF FF FF FF FF FF FF FF FF {00FFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF}

The last GUID sorting algorithm I could find is used by the Platform::Guid value class. Its operator< treats the GUID as if it were four 32-bit unsigned integers, and sorts them lexicographically. This sort order was designed for performance.

Byte array String
FF FF FF 00 FF FF FF FF FF FF FF FF FF FF FF FF {00FFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF}
FF FF 00 FF FF FF FF FF FF FF FF FF FF FF FF FF {FF00FFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF}
FF 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF {FFFF00FF-FFFF-FFFF-FFFF-FFFFFFFFFFFF}
00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF {FFFFFF00-FFFF-FFFF-FFFF-FFFFFFFFFFFF}
FF FF FF FF FF FF FF 00 FF FF FF FF FF FF FF FF {FFFFFFFF-FFFF-00FF-FFFF-FFFFFFFFFFFF}
FF FF FF FF FF FF 00 FF FF FF FF FF FF FF FF FF {FFFFFFFF-FFFF-FF00-FFFF-FFFFFFFFFFFF}
FF FF FF FF FF 00 FF FF FF FF FF FF FF FF FF FF {FFFFFFFF-00FF-FFFF-FFFF-FFFFFFFFFFFF}
FF FF FF FF 00 FF FF FF FF FF FF FF FF FF FF FF {FFFFFFFF-FF00-FFFF-FFFF-FFFFFFFFFFFF}
FF FF FF FF FF FF FF FF FF FF FF 00 FF FF FF FF {FFFFFFFF-FFFF-FFFF-FFFF-FF00FFFFFFFF}
FF FF FF FF FF FF FF FF FF FF 00 FF FF FF FF FF {FFFFFFFF-FFFF-FFFF-FFFF-00FFFFFFFFFF}
FF FF FF FF FF FF FF FF FF 00 FF FF FF FF FF FF {FFFFFFFF-FFFF-FFFF-FF00-FFFFFFFFFFFF}
FF FF FF FF FF FF FF FF 00 FF FF FF FF FF FF FF {FFFFFFFF-FFFF-FFFF-00FF-FFFFFFFFFFFF}
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 {FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFF00}
FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 FF {FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFF00FF}
FF FF FF FF FF FF FF FF FF FF FF FF FF 00 FF FF {FFFFFFFF-FFFF-FFFF-FFFF-FFFFFF00FFFF}
FF FF FF FF FF FF FF FF FF FF FF FF 00 FF FF FF {FFFFFFFF-FFFF-FFFF-FFFF-FFFF00FFFFFF}

Okay, let’s try to summarize all these results. I’m going to number the bytes of the GUID in the order they are compared, where 00 is the byte compared first (most significant), and FF is the byte compared last (least significant).

Algorithm Byte array String
memcmp 00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF {33221100-5544-7766-8899-AABBCCDDEEFF}
System.Guid 33 22 11 00 55 44 77 66 88 99 AA BB CC DD EE FF {00112233-4455-6677-8899-AABBCCDDEEFF}
string
SqlGuid CC DD EE FF AA BB 88 99 66 77 00 11 22 33 44 55 {FFEEDDCC-BBAA-9988-6677-001122334455}
Platform::Guid 33 22 11 00 77 66 55 44 BB AA 99 88 FF EE DD CC {00112233-6677-4455-BBAA-9988FFEEDDCC}

If you find another GUID sorting algorithm in common use, let me know. Or maybe I’m better off not knowing.

Bonus chatter: The result of sorting GUIDs is not generally meaningful, but some algorithms and data structures require keys to be sortable. For example, binary search and std::map require that the key space be totally-ordered.

¹ Although that isn’t quite right because GUIDs must be 4-byte aligned, and bytes don’t come with that restriction.

² Maybe they wanted to group together GUIDs from the same system? In type 1 GUIDs, the final six bytes identify the machine that generated the GUID.

³ Update: Turns out I was wrong. There exist people who really are that crazy.

13 comments

Discussion is closed. Login to edit/delete existing comments.

  • Henrik Arlinghaus 0

    I wonder if Platform::Guid is implemented branchless using SSE. So far I always used memcmp, but may try to code a faster SSE implementation.

  • Peter Cooper Jr. 0

    For SQL GUIDs and Footnote 2, I think that sorting it that way means that new sequential Type-1 GUIDs generated by the same server usually (or always?) ends up at the end of the sort? If so, that would mean that if uniqueIdentifier is used as a key, and insertions usually happen on one database server, you usually end up with new inserted records being added to the end of the data structure. I can imagine that would lead to enough improved performance to be worth doing that way. Just a wild guess, though.

    • Dave Bacher 0

      I can tell you what they were thinking, but I can’t tell you if the type-1 is coinidental or intentional.
      Pretend that I buy four servers from the same batch, an stage them.  If I do a left to right sort – my index is going to start with the MAC address (which is almost identical) followed by the most constant portion of the clock.  Followed by the most constant part of my counter, followed by the least constant portion of my counter.
      And so once I get to the MAC address bucket, I’ve only discarded 3/4 of the data.
      If I start from the right instead, each digit eliminates 15/16 of the data – and I’ve likely completed by the time I get to the clock value.  I’m sorting the most dynamic part of the time first, so again – I’m eliminating 15/16 at a time there, until I get to the constant part.  I’m then sorting the most dynamic part of the MAC adress first – and so I most likely never make it to the vendor prefix.
      So its about the indexing speed – if I’m pulling sorted records, I want to use the index for an in-order traversal – and so my sort order needs to match the index order, and the index order is selected for speed.

      • David Walker 0

        MAC addresses are not used any more in generating GUIDs.  That is, type 4 GUIDS (which are returned by SQL and .NET funcitons) don’t use the computer’s MAC address at all.

    • Kalle Niemitalo 0

      Newsequentialid (Histrory/Benefits and Implementation) says SQL Server already had its UNIQUEIDENTIFER comparison scheme before they added NEWSEQUENTIALID() in SQL Server 2005 to improve INSERT performance.
      However, MSDN Library for Visual Studio 2008 says that on Windows NT 4.0, UuidCreate used the MAC address (so it presumably generated Type-1 GUIDs) and UuidCreateSequential did not exist. I suspect that the SQL Server function NEWID() may have generated Type-1 GUIDs on NT4 and earlier. In which case it is strange that SQL Server’s UNIQUEIDENTIFER comparison scheme did not already sort time stamps optimally and Microsoft had to implement “some byte scrambling” in NEWSEQUENTIALID().

    • David Walker 0

      Microsoft SQL has a Newid() function, plus a Newsequentialid() function that definitely will end up sorting “higher” than any other sequential ID that has been generated by the same server since its last reboot.  If you use SQL’s standard sorting, that is.  So yes, you are basically correct, except that newid() is easier to use than newsequentialid() — the latter can only be used in a Default expression for a column type.  It can’t be used anywhere you want (which I thin kis unfortunate).   Guids returned by Newid() will sort randomly.
      (Edited to make a correction at “since its last reboot”.)

      • Michael Ryan 0

        You can use a COMB GUID library to generate sequencial Ids in your application…  The sequence may be slightly out of order at the database level, which is the main reason, but much better performing in an index than UUIDv4 spec alone.

  • Dave Steckler 0

    As the dev who wrote the first implemention of UuidCreate for what eventually became Win32, I can add we were indeed incredibly concerned with using the mac address portion of the uuid (guid) for, well, everything. Grouping/sorting on individual network servers was a significant consideration as we pondered whether to adhere the UUID standard of the time, or break it and invent our own. For example, we had all sorts of problems preserving this consistency on machines with other-than-1 network cards. I recall extravagant XOR schemes tossed about conference rooms, and eventually discarded.Side note: Frankly, I hope my original slm(?) checkin is 100% lost to time. It was the first job I was given after switching from SDE-T to SDE. I can only imagine what the code looks like. Egad. Can you imagine giving such a critical task to a brand new, young dev straight out of test? Lol.

    • Wil Wilder Apaza Bustamante 0

      This comment has been deleted.

  • Matthew van Eerde (^_^)Microsoft employee 0

    If you make a bunch of files with GUIDs as their names, File Explorer will sort them using StrCmpLogicalW. This is designed to make “track 9.mp3” < “track 10.mp3” which makes perfect sense, but unfortunately when you apply it to GUIDs you get weird behavior like “{AA4982AA-…” < “{AA26724A-“.

  • yuri.v.khan@gmail.com 0

    From the title alone (with the “how much time”), I was expecting the answer to be (2^128)!. That’s the total number of ways the set of all GUIDs can be ordered, and if we agree to enumerate them with a constant speed of several billions per second, still takes lots of time.

  • Nico Sap 0

    I thought a Guid had ( in some cases) a datetime variable to be unique. I was actually hoping in how to use this to my advantage. I noticed sometimes the data is already ordered and I was thinking it could have been by the Guid ( just now).  I found a site that can extract it ( https://www.famkruithof.net/uuid/uuidgen?typeReq=-1 ), they do mention that v4 makes it impossible though. Any input on this?

    Ps. Some algorithms are also described in the UUID standard: https://www.ietf.org/rfc/rfc4122.txt

  • Michael Ryan 0

    The sorting for SQL Server coincides with the UUID v1 variant.  The default sorting is on the timestamp portion for performance reasons.  The “NEWSEQUENTIALID” is part UUIDv1 (sequential) and part UUIDv4 (random) … The reason is that index performance out of the box works better this way.  Prior to the addition, there was an article and library for a COMB guid generator that was effectively this format.  It has been repeated.The SQL Server variant and sorting probably makes the most sense overall.  As to serialization, there’s a few variants.  MongoDB’s JS library supports like 3-4 serialization options mostly because of the incompatible serializations that are in differing languages/platforms.

    https://codingsight.com/combguid-generation-of-sql-server-friendly-guid-values-in-net-applications/

Feedback usabilla icon