{"id":109915,"date":"2024-06-19T07:00:00","date_gmt":"2024-06-19T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=109915"},"modified":"2024-06-19T14:17:46","modified_gmt":"2024-06-19T21:17:46","slug":"20240619-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20240619-00\/?p=109915","title":{"rendered":"On the sadness of treating counted strings as null-terminated strings"},"content":{"rendered":"<p>There are a number of data types which represent a counted string. Some of them are in the C++ standard library, like <code>std::string<\/code> and <code>std::wstring<\/code>. Some of them are Windows-specific like <code>BSTR<\/code> or <code>HSTRING<\/code>. Be careful when treating these counted strings as null-terminated strings.<\/p>\n<p>Treating a counted string as a null-terminated string is a lossy operation, because any embedded nulls in the counted string are mistakenly interpreted as the end of the string.<\/p>\n<pre>std::string s = \"hello\\0world\"s;\r\n\r\n\/\/ This prints \"hello&lt;nul&gt;world\"\r\nstd::cout &lt;&lt; s &lt;&lt; std::endl;\r\n\r\n\/\/ Copy it through c_str\r\nstd::string t = s.c_str();\r\n\r\n\/\/ This prints \"hello\"\r\nstd::cout &lt;&lt; t &lt;&lt; std::endl;\r\n<\/pre>\n<p>The embedded null in the string <code>s<\/code> is treated as the string terminator when we interpret the <code>c_str()<\/code> as a null-terminated string, and the last part of the string is lost.<\/p>\n<p>Now, you wouldn&#8217;t be so silly as to copy a <code>std::string<\/code> that way, seeing as there is a copy constructor right there.<\/p>\n<pre>std::string t = s; \/\/ use the copy constructor\r\n<\/pre>\n<p>But when you&#8217;re converting between different counted string types, you may be tempted to use the null-terminated string as the intermediary.<\/p>\n<pre>\/\/ widget.GetName() returns a winrt::hstring,\r\n\/\/ but we want to manipulate it as a std::wstring\r\nstd::wstring 
name(widget.GetName().c_str());\r\n<\/pre>\n<p>Not only is there a performance penalty here, because the <code>std::<wbr \/>wstring<\/code> constructor has to go look for the terminating null character, but there is also a security vulnerability: If an attacker puts an embedded null in a string, they might be able to sneak past a security check or validation.<\/p>\n<pre>bool IsAllowedName(std::wstring const&amp; name)\r\n{\r\n    return name == L\"alice\" || name == L\"bob\";\r\n}\r\n\r\nvoid ProcessWidget(Widget const&amp; widget)\r\n{\r\n    if (!IsAllowedName(widget.Name().c_str())) {\r\n        throw winrt::hresult_access_denied();\r\n    }\r\n\r\n    \u27e6 continue processing \u27e7\r\n}\r\n<\/pre>\n<p>An attacker could bypass the access check by using a widget whose name is <code>\"alice\\0haha\"<\/code>, and it will be considered to have an allowed name, since the embedded null causes the <code>std::<wbr \/>wstring<\/code> passed to <code>Is\u00adAllowed\u00adName()<\/code> to consist only of the characters leading up to the null terminator.<\/p>\n<p>As another example, you might want to print a <code>BSTR<\/code>, which is also a counted string type, although the representation is that of a pointer to the first <code>wchar_t<\/code>. This means that you can often pretend that a <code>BSTR<\/code> is a null-terminated string, but the danger is that any embedded null will cause you to stop processing the string before you get to the end.<\/p>\n<pre>void PrintBstr(BSTR bstr)\r\n{\r\n    std::wcout &lt;&lt; bstr;\r\n}\r\n<\/pre>\n<p><b>Sidebar<\/b>: There&#8217;s another danger, namely that the <code>BSTR<\/code> might be <code>nullptr<\/code>, which represents a zero-length string. 
However, trying to <code>&lt;&lt;<\/code> a <code>(wchar_t*)nullptr<\/code> will crash because the <code>&lt;&lt;<\/code> operator will dereference the null pointer while searching for the null terminator.<\/p>\n<p>Okay, now that we&#8217;ve laid out the problem, we&#8217;ll look at solutions next time.<\/p>\n<p><b>Bonus chatter<\/b>: Without the <code>s<\/code> suffix, the literal decays to a <code>const char*<\/code>, so the string is truncated at the embedded null as it is constructed.<\/p>\n<pre>\/\/ This also prints \"hello\"\r\nstd::string u = \"hello\\0world\";\r\nstd::cout &lt;&lt; u &lt;&lt; std::endl;\r\n<\/pre>\n<p><b>Bonus bonus chatter<\/b>: For the specific case of converting a <code>winrt::hstring<\/code> to a <code>std::wstring<\/code>, you can construct the <code>std::wstring<\/code> directly from the <code>winrt::hstring<\/code>, which uses the <code>std::<wbr \/>wstring_view<\/code> conversion constructor!<\/p>\n<pre>winrt::hstring h;\r\nstd::wstring w(h); \/\/ just construct it from the hstring\r\n<\/pre>\n<p>In our example, we would construct the <code>std::wstring<\/code> from the <code>winrt::hstring<\/code> when calling <code>Is\u00adAllowed\u00adName<\/code>. The construction has to be spelled out because that constructor is <code>explicit<\/code>.<\/p>\n<pre>void ProcessWidget(Widget const&amp; widget)\r\n{\r\n    if (!IsAllowedName(std::wstring(widget.Name()))) {\r\n        throw winrt::hresult_access_denied();\r\n    }\r\n\r\n    \u27e6 continue processing \u27e7\r\n}\r\n<\/pre>\n<p><b>Bonus bonus bonus chatter<\/b>: What if you are forced to produce a pointer to a null-terminated string due to some interop requirement?<\/p>\n<p>In that case, you should fail the operation if the counted string contains an embedded null. For <code>HSTRING<\/code>, you can use the <code>Windows\u00adString\u00adHas\u00adEmbedded\u00adNull<\/code> function to check for an embedded null. The <code>Windows\u00adString\u00adHas\u00adEmbedded\u00adNull<\/code> function caches the result, so asking a second time uses the result calculated from the first time. 
Mind you, scanning a string for an embedded null is probably not that expensive, so the cache doesn&#8217;t buy you much, especially since you&#8217;re probably about to pass the string to another function that will consume it, so the contents of the string are going to be scanned by the consumer anyway.<\/p>\n<p>Arguably, the <code>c_str()<\/code> function should throw an exception if the counted string is not representable as a C-style null-terminated string. But what&#8217;s done is done. At best, we can make up a new method name, like <code>safe_c_str()<\/code>?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>You&#8217;re throwing away perfectly good data, there.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-109915","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>You&#8217;re throwing away perfectly good data, 
there.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/109915","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=109915"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/109915\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=109915"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=109915"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=109915"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}