{"id":35613,"date":"2005-05-18T08:56:42","date_gmt":"2005-05-18T08:56:42","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/2005\/05\/18\/loading-the-dictionary-part-5-avoiding-string-copying\/"},"modified":"2005-05-18T08:56:42","modified_gmt":"2005-05-18T08:56:42","slug":"loading-the-dictionary-part-5-avoiding-string-copying","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20050518-42\/?p=35613","title":{"rendered":"Loading the dictionary, part 5:  Avoiding string copying"},"content":{"rendered":"<p>\nLooking at the profile for our program so far,\n35% of the CPU time is spent copying strings around.\nLet&#8217;s see if we can improve that.\n<\/p>\n<p>\nThe best way to speed up copying strings is not to copy them in\nthe first place.  Using a <code>wstring<\/code> in our\n<code>DictionaryEntry<\/code> structure forces the <code>vector<\/code>\nclass to copy the string data, when all we really need to copy\nis the pointer and size information.  The actual characters can stay\nput.  C++ has a copy constructor but not a &#8220;move&#8221; constructor.\n<\/p>\n<p>\nLet&#8217;s use plain string pointers rather than <code>wstring<\/code>\nobjects.  The &#8220;copy constructor&#8221; for a string pointer is just to\ncopy the pointer&mdash;exactly what we want here.\n<\/p>\n<pre>\nstruct DictionaryEntry\n{\n bool Parse(<font COLOR=\"blue\">const WCHAR* begin, const WCHAR* end<\/font>);\n <font COLOR=\"blue\">void Destruct() {\n  delete[] m_pszTrad;\n  delete[] m_pszSimp;\n  delete[] m_pszPinyin;\n  delete[] m_pszEnglish;\n }\n LPWSTR m_pszTrad;\n LPWSTR m_pszSimp;\n LPWSTR m_pszPinyin;\n LPWSTR m_pszEnglish;<\/font>\n};\n<\/pre>\n<p>\nThe <code>DictionaryEntry<\/code> is no longer a structure of\n<code>wstring<\/code>s but rather is just a structure of\n<code>LPWSTR<\/code>s, which copy much faster.\nThe cost for this, however, is having to free all the strings\nmanually in the dictionary destructor (which we will see below).\n<\/p>\n<p>\nSince we aren&#8217;t using <code>wstring<\/code>s any more, we need\nto allocate the memory for the strings and copy them the old fashioned way.\n<\/p>\n<pre>\n<font COLOR=\"blue\">LPWSTR AllocString(const WCHAR* begin, const WCHAR* end)\n{\n int cch = end - begin + 1;\n LPWSTR psz = new WCHAR[cch];\n lstrcpynW(psz, begin, cch);\n return psz;\n}<\/font>\nbool DictionaryEntry::Parse(\n       <font COLOR=\"blue\">const WCHAR* begin, const WCHAR* end<\/font>)\n{\n <font COLOR=\"blue\">const WCHAR* pch = std::find(begin, end, L' ');\n if (pch &gt;= end) return false;\n m_pszTrad = AllocString(begin, pch);\n begin = std::find(pch, end, L'[') + 1;\n if (begin &gt;= end) return false;\n pch = std::find(begin, end, L']');\n if (pch &gt;= end) return false;\n m_pszPinyin = AllocString(begin, pch);\n begin = std::find(pch, end, L'\/') + 1;\n if (begin &gt;= end) return false;\n for (pch = end; *--pch != L'\/'; ) { }\n if (begin &gt;= pch) return false;\n m_pszEnglish = AllocString(begin, pch);<\/font>\n return true;\n}\n<\/pre>\n<p>\nThere isn&#8217;t a <code>std::rfind<\/code> function, so I coded\nup a backwards-search-for-slash loop inline.\n<strong>Exercise<\/strong>: Why don&#8217;t I have to check that\n<code>pch<\/code> hasn&#8217;t underflowed beyond the beginning of the\nstring?\n<\/p>\n<pre>\nclass Dictionary\n{\npublic:\n Dictionary();\n <font COLOR=\"blue\">~Dictionary();<\/font>\n int Length() { return v.size(); }\n const DictionaryEntry&amp; Item(int i) { return v[i]; }\nprivate:\n vector&lt;DictionaryEntry&gt; v;\n};\nDictionary::Dictionary()\n{\n ...\n   if (cchResult){\n    <font COLOR=\"blue\"><strike>\/\/ wstring line(buf, cchResult);<\/strike><\/font>\n    DictionaryEntry de;\n    if (de.Parse(<font COLOR=\"blue\">buf, buf + cchResult<\/font>)) {\n     ...\n}\n<font COLOR=\"blue\">Dictionary::~Dictionary()\n{\n for (vector&lt;DictionaryEntry&gt;::iterator i = v.begin();\n      i != v.end(); i++) {\n  i-&gt;Destruct();\n }\n}<\/font>\n<\/pre>\n<p>\nThe last bits of the change are to get rid of the\ntemporary <code>wstring<\/code> we parsed from, since we don&#8217;t\nneed it any more, and to free all the strings in the\n<code>Dictionary<\/code>&#8216;s destructor.\n<\/p>\n<p>\nThis new program clocks in at 120ms\n(or 290ms if you include the time it takes to destroy the\ndictionary).\nAgain, we&#8217;re twice as fast as the previous version, but still\nhaven&#8217;t quite crossed the 100ms barrier.\nAnd the time to destroy the dictionary isn&#8217;t starting to be a\nmore significant factor.\nBut I have another trick up my sleeve.\n<\/p>\n<p>\n[Raymond is currently on vacation; this message was pre-recorded.]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Looking at the profile for our program so far, 35% of the CPU time is spent copying strings around. Let&#8217;s see if we can improve that. The best way to speed up copying strings is not to copy them in the first place. Using a wstring in our DictionaryEntry structure forces the vector class to [&hellip;]<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-35613","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Looking at the profile for our program so far, 35% of the CPU time is spent copying strings around. Let&#8217;s see if we can improve that. The best way to speed up copying strings is not to copy them in the first place. Using a wstring in our DictionaryEntry structure forces the vector class to [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/35613","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=35613"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/35613\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=35613"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=35613"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=35613"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}