Some time ago, people noticed that buried in the Windows Bluetooth drivers is the hard-coded name of the Microsoft Wireless Notebook Presenter Mouse 8000. What’s going on there? Does the Microsoft Wireless Notebook Presenter Mouse 8000 receive favorable treatment from the Microsoft Bluetooth drivers? Is this some sort of collusion?
No, it’s not that.
There is a lot of bad hardware out there, and there are a lot of compatibility hacks to deal with it. You have CD-ROM controller cards that report the same drive four times or USB devices that draw more than 500mW of power after promising they wouldn’t. More generally, you have devices whose descriptors are syntactically invalid or contain values that are outside the legal range or are simply nonsensical.
Most of the time, the code to compensate for these types of errors doesn’t betray its presence in the form of hard-coded strings. Instead, you have “else” branches that secretly repair or ignore corrupted values.
Unfortunately, the type of mistake that the Microsoft Wireless Notebook Presenter Mouse 8000 made is one that is easily exposed via strings, because they messed up their string!
The device local name string is specified to be encoded in UTF-8. However, the Microsoft Wireless Notebook Presenter Mouse 8000 reports its name as Microsoft⟪AE⟫ Wireless Notebook Presenter Mouse 8000, encoding the registered trademark symbol ® not as UTF-8 as required by the specification but in code page 1252. What’s even worse is that a bare ⟪AE⟫ is not a legal UTF-8 sequence, so the string wouldn’t even show up as corrupted; it would get rejected as invalid.
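To see the difference concretely, here is a small Python sketch (Python rather than driver code, purely for illustration) of how a strict UTF-8 decoder and a code page 1252 decoder treat the reported bytes. The helper name `try_utf8` is made up for this example.

```python
# The name as the device reports it: byte 0xAE sits where the ® should be.
raw_name = b"Microsoft\xae Wireless Notebook Presenter Mouse 8000"

def try_utf8(data: bytes):
    """Return the decoded string, or None if data is not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None

# 0xAE is a continuation byte (bit pattern 10xxxxxx) with no lead byte
# in front of it, so a strict UTF-8 decoder rejects the whole string.
assert try_utf8(raw_name) is None

# Read as code page 1252, the same byte is the registered trademark sign.
as_cp1252 = raw_name.decode("cp1252")
assert as_cp1252 == "Microsoft\u00ae Wireless Notebook Presenter Mouse 8000"

# What the device should have sent: ® in UTF-8 is the two bytes C2 AE.
assert "\u00ae".encode("utf-8") == b"\xc2\xae"
```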
Thanks, Legal Department, for sticking a ® in the descriptor and messing up the whole thing.
There is a special table inside the Bluetooth drivers of “Devices that report their names wrong (and the correct name to use)”. If the Bluetooth stack sees one of these devices, and it presents the wrong name, then the correct name is substituted.
That table currently has only one entry.
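A table like that might look roughly like the following Python sketch. This is a guess at the shape, not the driver's actual code: the names `NAME_FIXUPS` and `local_name` are invented, the lookup is keyed on the reported bytes alone, and the corrected name is assumed to be the same text with the ® properly encoded.

```python
# Hypothetical fix-up table: name bytes as reported -> name to use instead.
# Assumption: the corrected name keeps the ®, just encoded properly.
NAME_FIXUPS = {
    b"Microsoft\xae Wireless Notebook Presenter Mouse 8000":
        "Microsoft\u00ae Wireless Notebook Presenter Mouse 8000",
}

def local_name(raw: bytes):
    """Return the display name for a reported name, or None if unusable."""
    fixed = NAME_FIXUPS.get(raw)
    if fixed is not None:
        return fixed                  # known offender: substitute the correct name
    try:
        return raw.decode("utf-8")    # well-behaved device: use the name as sent
    except UnicodeDecodeError:
        return None                   # invalid and not in the table: reject
```

Any other device that sends invalid UTF-8 still gets rejected; only names listed in the table are repaired.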
Edit: OK, apparently the nesting of comments got broken again. This is in reply to Dave Gzorple.
The question of hiding the offender excites me! One cryptographically satisfactory way to preserve the privacy of the offender is to hash the offending device ID using two hash functions. In the table, store the first hash and the corrected name encrypted under the second hash. When talking to devices, use the first hash to identify the offender, and once it is identified, you have the second hash from the device’s reported ID, which enables decryption of the corrected name. (Disclaimer that this idea is not new.)
Under appropriate modeling, it...
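A rough Python sketch of the two-hash idea, with several assumptions spelled out: the "device ID" is taken to be the reported name bytes, the two hash functions are SHA-256 with different prefixes, and the "encryption" is a toy XOR against a SHAKE-256 keystream, standing in for a real cipher. All function names are invented for illustration.

```python
import hashlib

def _h(label: bytes, reported: bytes) -> bytes:
    # Two "different" hash functions, obtained by prefixing SHA-256 input.
    return hashlib.sha256(label + b"\x00" + reported).digest()

def _xor_stream(key: bytes, data: bytes) -> bytes:
    # Toy cipher (assumption): XOR with a SHAKE-256 keystream derived from key.
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

def make_entry(reported: bytes, corrected: str):
    """Build (lookup hash, corrected name encrypted under the second hash)."""
    key = _h(b"key", reported)
    return _h(b"lookup", reported), _xor_stream(key, corrected.encode("utf-8"))

def resolve(table: dict, reported: bytes):
    """Return the corrected name if reported matches an entry, else None."""
    blob = table.get(_h(b"lookup", reported))
    if blob is None:
        return None
    # The decryption key only exists once the offending ID has been seen.
    return _xor_stream(_h(b"key", reported), blob).decode("utf-8")

bad = b"Microsoft\xae Wireless Notebook Presenter Mouse 8000"
tag, blob = make_entry(bad, "Microsoft\u00ae Wireless Notebook Presenter Mouse 8000")
table = {tag: blob}
```

Neither the offending ID nor the corrected name appears in `table` in readable form; both can be recovered only by someone who already has the offending ID.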
Yeah, there's something odd going on, I had to reply to my own comment rather than Raymond's one.
Anyway, this isn't what you want because if you hide it too well you end up creating a repeat of the AARD code fiasco. You want to detect the problem without identifying the guilty party but then fix it without appearing to be hiding something, thus my suggestion of hashing the search string to hide the guilty party but leaving the replacement string present but not readily findable via grep. It's a case of obscuring what's going on but not being...
Not only is there no practical engineering reason to do this, it creates runtime dependencies on cryptography, and the driver team now has to go through cryptography security review and plan for random interruptions by time-critical bugs when the cryptographic security landscape changes (such as a hash being declared insecure). Simply not worth the hassle unless there is a specific reason why it has to be done.
I mean, it’s *kind* of like favorable treatment. There must be other less well known devices that also report their name in a corrupted way. Seems like the more generic approach of rejecting any invalid UTF-8 character and continuing to parse the string (recall that UTF-8 is self-synchronizing) would be a reasonable approach that would also handle mistakes by other incompetent or careless companies, too.
I’m not really complaining that MS went the extra mile here, just that it seems like the kind of courtesy you extend to your own hardware division, not the entire Bluetooth hardware ecosystem.
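For what it's worth, the generic approach described above is a one-liner in most languages. A Python sketch (the function name `lenient_name` is made up here, and nothing suggests the Bluetooth stack actually works this way):

```python
def lenient_name(raw: bytes, keep_marker: bool = False) -> str:
    """Decode raw as UTF-8, skipping over invalid bytes instead of failing.

    With keep_marker=True each bad sequence becomes U+FFFD instead of vanishing.
    """
    return raw.decode("utf-8", errors="replace" if keep_marker else "ignore")

raw_name = b"Microsoft\xae Wireless Notebook Presenter Mouse 8000"

# The stray 0xAE byte is dropped and decoding resumes at the next valid byte.
assert lenient_name(raw_name) == "Microsoft Wireless Notebook Presenter Mouse 8000"

# Alternatively, mark the spot with the Unicode replacement character.
assert lenient_name(raw_name, keep_marker=True) == (
    "Microsoft\ufffd Wireless Notebook Presenter Mouse 8000"
)
```

The trade-off is that the name is silently altered (the ® is lost) rather than replaced with a known-good one.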
So, I used to work with the Windows Bluetooth team and helped with the Bluetooth Accessory Guidelines. Part of the goal of the guidelines is to point out where device makers commonly make mistakes. For all the different faults we have seen in different devices, that particular one hadn't been reported.
I can quickly think of two potential reasons why this isn't a bigger problem:
1. More devices do their Bluetooth configuration via a program that doesn't allow malformed UTF-8. Or similarly, computer IDEs are more likely to automatically save code files as UTF-8.
2. More people test their devices with...
What about the previous 7,999 iterations of the Microsoft® Wireless Notebook Presenter Mouse? Did they do OK? 😉
“Microsoft Wireless Notebook Presenter Mouse 8000”; it just rolls off the tongue. I wonder how many rounds of meetings and focus groups were involved to come up with that name…
How did Microsoft ship a device without testing it?
It’s easy to come up with how this could happen. Maybe Windows version N does not validate the string. Device ships with invalid string. Device works on Windows version N because there is no enforcement. Windows version N+1 adds string validation, say, to deal with a security issue. Now device is broken. Windows version N+1 needs to add compatibility workaround so that device continues to work on version N+1.
This is hilarious. Fun read.
The line “That table currently has only one entry.” sent me. So I looked up the device and apparently it came out in 2006. This is then very surprising because I thought such errors were mostly created in the 90s.
Microsoft has always made pretty decent hardware, it’s only when they start straying into software that the wheels come off.
In terms of not embarrassing vendors with buggy devices, a way of masking their identities would be to store a hash of the ID string, not the string itself. It’s possible (albeit rather unlikely) that eventually someone could reverse-engineer the driver and write a blog post about secret handling of some devices, but at least the names aren’t sitting there in plain view any more.
Sure, in which case obscure it with some noddy masking technique that breaks up the text, you just need enough to hide it from ‘strings’ and ‘grep’ and similar – you’re not really hiding it, just making sure there isn’t conspiracy-theory-feeding plain text visible.
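One possible "noddy" masking of that kind is a single-byte XOR, sketched here in Python. The constant and the function names are arbitrary choices for illustration; this hides the text from 'strings' and 'grep' and nothing more.

```python
MASK = 0x5A  # arbitrary nonzero constant (assumption for this sketch)

def mask(data: bytes) -> bytes:
    """XOR every byte with MASK. Applying it twice restores the input."""
    return bytes(b ^ MASK for b in data)

# In a real build these masked blobs would be precomputed, so that the
# plain text never appears in the shipped binary at all.
MASKED_BAD = mask(b"Microsoft\xae Wireless Notebook Presenter Mouse 8000")
MASKED_GOOD = mask("Microsoft\u00ae Wireless Notebook Presenter Mouse 8000".encode("utf-8"))

def fix_name(raw: bytes) -> bytes:
    """Return the corrected UTF-8 name for the known offender, else raw unchanged."""
    if mask(raw) == MASKED_BAD:
        return mask(MASKED_GOOD)   # unmask the replacement only when needed
    return raw
```

Both the search string and the replacement are masked here, which addresses the objection that the replacement would otherwise still be visible.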
This is an ongoing problem with other products as well, where you need to work around bugs from other vendors but don’t necessarily want to embarrass them by revealing whose bugs you’re working around. Usually it’s just a case of identifying a particular vendor so you only have to mask the input, not the output.
That would hide the malformed string, but the replacement string would still be visible. (Unless you want to encrypt that too.)