{"id":106061,"date":"2021-12-29T07:00:00","date_gmt":"2021-12-29T15:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=106061"},"modified":"2021-12-17T21:37:05","modified_gmt":"2021-12-18T05:37:05","slug":"20211229-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20211229-00\/?p=106061","title":{"rendered":"You can&#8217;t copy code with memcpy; code is more complicated than that"},"content":{"rendered":"<p>Back in the day, a customer reported that their program crashed on <a href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20150727-00\/?p=90821\"> Itanium<\/a>.<\/p>\n<p>Wait, come back!<\/p>\n<p>Itanium is where the customer recognized the problem, but it applies to all other architectures, so stick with me.<\/p>\n<p>Their code went roughly like this:<\/p>\n<pre>struct REMOTE_THREAD_INFO\r\n{\r\n    int data1;\r\n    int data2;\r\n    int data3;\r\n};\r\n\r\nstatic DWORD CALLBACK RemoteThreadProc(REMOTE_THREAD_INFO* info)\r\n{\r\n    try {\r\n        ... use the info to do something ...\r\n    } catch (...) {\r\n        ... ignore all exceptions ...\r\n    }\r\n    return 0;\r\n}\r\nstatic void EndOfRemoteThreadProc()\r\n{\r\n}\r\n\r\n\/\/ Error checking elided for expository purposes\r\nvoid DoSomethingCrazy()\r\n{\r\n    \/\/ Calculate the number of code bytes.\r\n    SIZE_T functionSize = (BYTE*)EndOfRemoteThreadProc - (BYTE*)RemoteThreadProc;\r\n\r\n    \/\/ Allocate memory in the remote process\r\n    SIZE_T allocSize = sizeof(REMOTE_THREAD_INFO) + functionSize;\r\n    REMOTE_THREAD_INFO* buffer = (REMOTE_THREAD_INFO*)\r\n      VirtualAllocEx(targetProcess, NULL, allocSize, MEM_COMMIT,\r\n        PAGE_EXECUTE_READWRITE);\r\n\r\n    \/\/ Write data to the remote process\r\n    REMOTE_THREAD_INFO localInfo = { ... 
};\r\n    WriteProcessMemory(targetProcess, buffer,\r\n                       &amp;localInfo, sizeof(localInfo));\r\n\r\n    \/\/ Write code to the remote process\r\n    WriteProcessMemory(targetProcess, buffer + 1,\r\n                       (void*)RemoteThreadProc, functionSize);\r\n\r\n    \/\/ Execute it!\r\n    CreateRemoteThread(targetProcess, NULL, 0,\r\n                       (LPTHREAD_START_ROUTINE)(buffer + 1),\r\n                       buffer);\r\n}\r\n<\/pre>\n<p>This code is such a bad idea, I&#8217;ve intentionally introduced errors so it won&#8217;t even compile.<\/p>\n<p>The idea is that they want to inject some code into a target process, so they use <code>Virtual\u00adAlloc\u00adEx<\/code> to allocate some memory in that process. The first part of the memory block contains some data that they want to pass. The second part of the memory block contains the code bytes that they want to execute, and they tell <code>Create\u00adRemote\u00adThread<\/code> to begin execution at those code bytes.<\/p>\n<p>I&#8217;m just going to say it right now: The entire idea that went into this code is fundamentally flawed.<\/p>\n<p>The customer reported that this code &#8220;worked just fine on 32-bit x86 and 64-bit x86&#8221;, but it doesn&#8217;t work on Itanium.<\/p>\n<p>Actually, I&#8217;m surprised that it worked even on x86!<\/p>\n<p>The design assumes that all of the code in <code>RemoteThreadProc<\/code> is position-independent. There is no requirement that generated code be position-independent. 
For example, one code generation option for <code>switch<\/code> statements is to use a jump table, and that jump table consists of absolute addresses on x86.<\/p>\n<p>In fact, it&#8217;s clear that the code <i>isn&#8217;t<\/i> position-independent, because it&#8217;s using C++ exception handling, and the Microsoft compiler&#8217;s implementation of exception handling involves a table that maps points of execution to <code>catch<\/code> statements, so that it knows which <code>catch<\/code> statement to use. And if they had used a filtered <code>catch<\/code>, then there would be additional tables for deciding whether the <code>catch<\/code> filter applies to the exception that was thrown.<\/p>\n<p>The design also assumes that the code contains no references to anything outside the function body itself. All of the jump tables and lookup tables used by the function need to be copied to the target process, and the code assumes that those tables are also between the labels <code>RemoteThreadProc<\/code> and <code>EndOfRemoteThreadProc<\/code>.<\/p>\n<p>Indeed, we know that there will be references to content outside the function body itself, because the C++ try\/catch block will call into functions in the C runtime support library.<\/p>\n<p>Both x86-64 and Itanium use unwind codes for exception handling, and there was no attempt to register those unwind codes in the target process.<\/p>\n<p>My guess is that they were lucky and no exceptions were thrown, or at least they were thrown infrequently enough that it eluded their testing.<\/p>\n<p>There is also no guarantee that <code>EndOfRemoteThreadProc<\/code> will be placed directly after <code>RemoteThreadProc<\/code> in memory. Indeed, there&#8217;s not even a guarantee that <code>EndOfRemoteThreadProc<\/code> will have an independent existence. 
The linker may perform <a href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20050322-00\/?p=36113\"> COMDAT folding<\/a>, which causes multiple identical functions to be combined into one. Even if you disable COMDAT folding, <a href=\"https:\/\/docs.microsoft.com\/en-us\/cpp\/build\/profile-guided-optimizations?view=msvc-160\"> Profile-Guided Optimization<\/a> will move the functions independently, and they are unlikely to end up in the same place.<\/p>\n<p>Indeed, there&#8217;s no requirement that the code bytes for the <code>RemoteThreadProc<\/code> function be contiguous at all! Profile-Guided Optimization will rearrange basic blocks, and the code for a single function may end up scattered across different parts of the program, depending on their usage patterns.<\/p>\n<p>Even without Profile-Guided Optimization, compile-time optimization may inline some or all of a function, so a single function might have multiple copies in memory, each of which has been optimized for its specific call site.<\/p>\n<p>There are also some Itanium-specific rules that ensure abject failure on Itanium.<\/p>\n<p>On Itanium, all instructions must be aligned on 16-byte boundaries, but the above code does not respect that. Also, on Itanium, function pointers point not to the first code byte, but to <a href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20150731-00\/?p=90771\"> a descriptor structure that contains a pair of pointers, one to the function&#8217;s <code>gp<\/code>, and the other to the first byte of code<\/a>. (This is <a href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20180816-00\/?p=99505\"> the same pattern used by PowerPC<\/a>.)<\/p>\n<p>I pointed out to the customer liaison that what the customer is trying to do is very suspicious and looks like a virus. The customer liaison explained that it&#8217;s quite the opposite: The customer is a major anti-virus software vendor! 
The customer has important functionality in their product that they have built based on this technique of remote code injection, and they cannot afford to give it up at this point.<\/p>\n<p>Okay, now I&#8217;m scared.<\/p>\n<p>A safer\u00b9 way to inject code into a process is to load the code as a library, via <code>Load\u00adLibrary<\/code>. This invokes the loader, which will do the work of applying fixups as necessary, allocating all the memory in the appropriate way, with the correct alignment, registering control flow guard and exception unwind tables, loading dependent libraries, and generally getting the execution environment set up properly to run the desired code.<\/p>\n<p>We never heard back from the customer.<\/p>\n<p>\u00b9 I didn&#8217;t say it was a <i>safe<\/i> way to inject code. Just that it was <i>safer<\/i>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There&#8217;s more to code than just instructions.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-106061","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>There&#8217;s more to code than just 
instructions.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/106061","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=106061"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/106061\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=106061"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=106061"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=106061"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}