September 28th, 2022

Why does COM express GUIDs in a mix of big-endian and little-endian? Why can’t it just pick a side and stick with it?

Wikipedia claims that the COM textual representation of GUIDs is mixed-endian.

Is it, really?

No, it is little-endian all the way. But if you don’t understand how GUIDs are formed, it might look like some parts are big-endian.

The parts of a GUID as defined in the specification are

Field                      Type
time_low                   32-bit integer
time_mid                   16-bit integer
time_hi_and_version        16-bit integer
clock_seq_hi_and_reserved  8-bit integer
clock_seq_low              8-bit integer
node                       6-byte MAC address

The GUID structure breaks it down as

struct GUID
{
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
};

Let’s line up the two pieces against each other.

Field                      Type                Structure
time_low                   32-bit integer      Data1
time_mid                   16-bit integer      Data2
time_hi_and_version        16-bit integer      Data3
clock_seq_hi_and_reserved  8-bit integer       Data4[0]
clock_seq_low              8-bit integer       Data4[1]
node                       6-byte MAC address  Data4[2..7]
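
As a quick sanity check of that correspondence, here is a minimal sketch that verifies the byte offsets the table implies. The names ExampleGuid and example_guid_layout_ok are made up for illustration, and the check assumes a mainstream ABI on which the structure has no padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the GUID structure shown above. */
typedef struct ExampleGuid {
    uint32_t Data1;    /* time_low */
    uint16_t Data2;    /* time_mid */
    uint16_t Data3;    /* time_hi_and_version */
    uint8_t  Data4[8]; /* clock_seq_hi_and_reserved, clock_seq_low, node[6] */
} ExampleGuid;

/* Returns 1 if every field sits at the byte offset the table implies
   (assumption: no padding, which holds on common ABIs), otherwise 0. */
int example_guid_layout_ok(void)
{
    return sizeof(ExampleGuid) == 16
        && offsetof(ExampleGuid, Data1) == 0
        && offsetof(ExampleGuid, Data2) == 4
        && offsetof(ExampleGuid, Data3) == 6
        && offsetof(ExampleGuid, Data4) == 8;
}
```

The two clock bytes and the six node bytes are individual bytes, not a multi-byte integer, so byte order never comes into play for Data4.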

If you take a GUID whose sixteen bytes in memory are 00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF and print out each piece of the GUID structure, with hyphens between the parts, you get

33221100-5544-7766-88-99-AA-BB-CC-DD-EE-FF

Notice that everything is still little-endian. We didn’t have to do any byte flipping when printing:

printf("%08X-%04X-%04X-%02X-%02X-%02X-%02X-%02X-%02X-%02X-%02X",
    Data1, Data2, Data3,
    Data4[0], Data4[1], Data4[2], Data4[3],
    Data4[4], Data4[5], Data4[6], Data4[7]);
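
To see this end to end, the following sketch decodes sixteen bytes, reading each integer field least-significant byte first, and then formats the result with a dash after every piece. The names ExampleGuid, guid_from_le_bytes, and format_guid_dashed are hypothetical; only the field layout comes from the structure above. On a little-endian machine, a plain memcpy of the sixteen bytes into the structure should produce the same field values.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the GUID structure. */
typedef struct ExampleGuid {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
} ExampleGuid;

/* Decode 16 bytes, treating Data1..Data3 as little-endian integers.
   Data4 is a byte array, so it is copied in order. */
ExampleGuid guid_from_le_bytes(const uint8_t b[16])
{
    ExampleGuid g;
    g.Data1 = (uint32_t)b[0]
            | ((uint32_t)b[1] << 8)
            | ((uint32_t)b[2] << 16)
            | ((uint32_t)b[3] << 24);
    g.Data2 = (uint16_t)((unsigned)b[4] | ((unsigned)b[5] << 8));
    g.Data3 = (uint16_t)((unsigned)b[6] | ((unsigned)b[7] << 8));
    memcpy(g.Data4, b + 8, 8);
    return g;
}

/* Format with a dash after every integer field and every Data4 byte.
   Returns whatever snprintf returns. */
int format_guid_dashed(const ExampleGuid *g, char *out, size_t out_size)
{
    return snprintf(out, out_size,
        "%08lX-%04X-%04X-%02X-%02X-%02X-%02X-%02X-%02X-%02X-%02X",
        (unsigned long)g->Data1, (unsigned)g->Data2, (unsigned)g->Data3,
        (unsigned)g->Data4[0], (unsigned)g->Data4[1],
        (unsigned)g->Data4[2], (unsigned)g->Data4[3],
        (unsigned)g->Data4[4], (unsigned)g->Data4[5],
        (unsigned)g->Data4[6], (unsigned)g->Data4[7]);
}
```

Feeding in the bytes 00 11 22 ... FF and formatting into a 64-byte buffer gives 33221100-5544-7766-88-99-AA-BB-CC-DD-EE-FF.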

My guess is that the folks who designed the string format thought there were too many dashes, so they removed the byte dashes, except for the one that separates the clock bytes from the MAC address.

printf("%08X-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X",
    Data1, Data2, Data3,
    Data4[0], Data4[1], Data4[2], Data4[3],
    Data4[4], Data4[5], Data4[6], Data4[7]);

33221100-5544-7766-8899-AABBCCDDEEFF

The result is that the last two pieces of the stringified GUID look big-endian, but they’re not. They’re just little-endian with some dashes missing.
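
One way to see where the mixed-endian impression comes from is to put the conventional string next to a raw dump of the same sixteen bytes in little-endian storage order. The sketch below does that; ExampleGuid, format_guid, and format_le_bytes are hypothetical names used only for this illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the GUID structure. */
typedef struct ExampleGuid {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
} ExampleGuid;

/* Conventional 8-4-4-4-12 form: the three integers are printed as numbers,
   and Data4 is printed byte by byte in array order. */
int format_guid(const ExampleGuid *g, char *out, size_t out_size)
{
    return snprintf(out, out_size,
        "%08lX-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X",
        (unsigned long)g->Data1, (unsigned)g->Data2, (unsigned)g->Data3,
        (unsigned)g->Data4[0], (unsigned)g->Data4[1],
        (unsigned)g->Data4[2], (unsigned)g->Data4[3],
        (unsigned)g->Data4[4], (unsigned)g->Data4[5],
        (unsigned)g->Data4[6], (unsigned)g->Data4[7]);
}

/* Dump the 16 bytes as they would be stored little-endian, with dashes in
   the same positions. Needs room for 36 characters plus the terminator;
   returns the number of characters written, or -1 if the buffer is too small. */
int format_le_bytes(const ExampleGuid *g, char *out, size_t out_size)
{
    uint8_t b[16];
    char *p = out;
    int i;

    if (out_size < 37) return -1;

    /* Serialize the integer fields least-significant byte first. */
    b[0] = (uint8_t)(g->Data1 & 0xFF);
    b[1] = (uint8_t)((g->Data1 >> 8) & 0xFF);
    b[2] = (uint8_t)((g->Data1 >> 16) & 0xFF);
    b[3] = (uint8_t)((g->Data1 >> 24) & 0xFF);
    b[4] = (uint8_t)(g->Data2 & 0xFF);
    b[5] = (uint8_t)((g->Data2 >> 8) & 0xFF);
    b[6] = (uint8_t)(g->Data3 & 0xFF);
    b[7] = (uint8_t)((g->Data3 >> 8) & 0xFF);
    memcpy(b + 8, g->Data4, 8);

    for (i = 0; i < 16; i++) {
        if (i == 4 || i == 6 || i == 8 || i == 10) *p++ = '-';
        p += sprintf(p, "%02X", (unsigned)b[i]);
    }
    return (int)(p - out);
}
```

For the example GUID, format_guid produces 33221100-5544-7766-8899-AABBCCDDEEFF and format_le_bytes produces 00112233-4455-6677-8899-AABBCCDDEEFF. The first three groups differ only because integers are written most significant digit first, which is how numbers are always printed. The last two groups are identical because they were never integers, just bytes printed in order.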

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

2 comments

  • Khalfan Aziz

    My question is about a different aspect of COM, as shown in the BmpExt project in this old article:
    https://docs.microsoft.com/en-us/archive/msdn-magazine/2000/march/windows-2000-ui-innovations-enhance-your-user-s-experience-with-new-infotip-and-icon-overlay-shell-extensions
    The shell extension is informed of the full-path filename of the file the mouse is hovering over, but only for disk-based file systems and not for files on portable devices such as a camera or a smartphone.
    What must I do to have a comprehensible shell extension that is also informed of files on portable devices?

  • noexcept

    The members of the clock value are represented in big endian in the struct with first clock_seq_hi, then cloc[k?]_seq_low. So these two bytes come out big endian style.

    The first field in the printf format should probably be “%08x”?