{"id":107069,"date":"2022-08-30T07:00:00","date_gmt":"2022-08-30T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=107069"},"modified":"2022-08-29T06:42:27","modified_gmt":"2022-08-29T13:42:27","slug":"20220830-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20220830-00\/?p=107069","title":{"rendered":"The AArch64 processor (aka arm64), part 25: The ARM64EC ABI"},"content":{"rendered":"<p>I mentioned that Windows has a second ABI for AArch64 named ARM64EC. The &#8220;EC&#8221; stands for &#8220;Emulation Compatible&#8221;, and its purpose is to <a href=\"https:\/\/blogs.windows.com\/windowsdeveloper\/2021\/06\/28\/announcing-arm64ec-building-native-and-interoperable-apps-for-windows-11-on-arm\/\"> make it easier for ARM64 and x86-64 code to coexist within a single process<\/a>.<\/p>\n<p>The idea here is that you have a program written for x86-64, and you&#8217;re porting it to 64-bit ARM, but you can&#8217;t or don&#8217;t want to do a complete port. You might not be able to do a complete port because some of the libraries you&#8217;re using are available only for x86-64 and x86-32. And you may not want to do a complete port because the performance of the x86-64 emulator on 64-bit ARM systems is good enough for most of your usage scenarios, but there are a few performance-critical functions that you want to recompile as 64-bit ARM to avoid the emulation overhead. Or maybe your program has a plug-in model, and you want to be able to load plug-ins that were written for x86-64. Those plug-ins will run under emulation, but the rest of your program runs natively as AArch64.<\/p>\n<p>What you do is you port some of your program to 64-bit ARM and leave the rest in x86-64. 
The x86-64 parts run in the emulator, and the AArch64 parts run natively.<\/p>\n<p>The design of ARM64EC aligns the AArch64 conventions to match the x86-64 conventions, in order to minimize the mismatch at the architecture boundaries.<\/p>\n<p>One way to reduce the mismatch is to assign each AArch64 register a buddy x86-64 register. The AArch64 register uses its buddy&#8217;s slot in the <code>CONTEXT<\/code> structure, so that an x86-64 <code>CONTEXT<\/code> can be used to hold either an x86-64 context or an AArch64 context.<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse;\" border=\"1\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<th>AArch64<\/th>\n<th>x86-64<\/th>\n<th>Notes<\/th>\n<\/tr>\n<tr>\n<td><var>x0<\/var><\/td>\n<td><var>rcx<\/var><\/td>\n<td>Function parameter 1<\/td>\n<\/tr>\n<tr>\n<td><var>x1<\/var><\/td>\n<td><var>rdx<\/var><\/td>\n<td>Function parameter 2<\/td>\n<\/tr>\n<tr>\n<td><var>x2<\/var><\/td>\n<td><var>r8<\/var><\/td>\n<td>Function parameter 3<\/td>\n<\/tr>\n<tr>\n<td><var>x3<\/var><\/td>\n<td><var>r9<\/var><\/td>\n<td>Function parameter 4<\/td>\n<\/tr>\n<tr>\n<td><var>x4<\/var><\/td>\n<td><var>r10<\/var><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x5<\/var><\/td>\n<td><var>r11<\/var><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x6<\/var><\/td>\n<td><var>fp(1)<\/var><\/td>\n<td>Bottom 64 bits of fp(1)<\/td>\n<\/tr>\n<tr>\n<td><var>x7<\/var><\/td>\n<td><var>fp(2)<\/var><\/td>\n<td>Bottom 64 bits of fp(2)<\/td>\n<\/tr>\n<tr>\n<td><var>x8<\/var><\/td>\n<td><var>rax<\/var><\/td>\n<td>Return value<\/td>\n<\/tr>\n<tr>\n<td><var>x9<\/var><\/td>\n<td><var>fp(3)<\/var><\/td>\n<td>Bottom 64 bits of fp(3)<\/td>\n<\/tr>\n<tr>\n<td><var>x10<\/var><\/td>\n<td><var>fp(4)<\/var><\/td>\n<td>Bottom 64 bits of fp(4)<\/td>\n<\/tr>\n<tr>\n<td><var>x11<\/var><\/td>\n<td><var>fp(5)<\/var><\/td>\n<td>Bottom 64 bits of fp(5)<\/td>\n<\/tr>\n<tr>\n<td><var>x12<\/var><\/td>\n<td><var>fp(6)<\/var><\/td>\n<td>Bottom 64 bits of 
fp(6)<\/td>\n<\/tr>\n<tr>\n<td><var>x13<\/var><\/td>\n<td>&nbsp;<\/td>\n<td>Off-limits<\/td>\n<\/tr>\n<tr>\n<td><var>x14<\/var><\/td>\n<td>&nbsp;<\/td>\n<td>Off-limits<\/td>\n<\/tr>\n<tr>\n<td><var>x15<\/var><\/td>\n<td><var>fp(7)<\/var><\/td>\n<td>Bottom 64 bits of fp(7)<\/td>\n<\/tr>\n<tr>\n<td><var>x16<\/var><\/td>\n<td><var>fp(0..3)<\/var><\/td>\n<td>High 16 bits of fp(0) to fp(3)<\/td>\n<\/tr>\n<tr>\n<td><var>x17<\/var><\/td>\n<td><var>fp(4..7)<\/var><\/td>\n<td>High 16 bits of fp(4) to fp(7)<\/td>\n<\/tr>\n<tr>\n<td><var>x18<\/var><\/td>\n<td>&nbsp;<\/td>\n<td>TEB<\/td>\n<\/tr>\n<tr>\n<td><var>x19<\/var><\/td>\n<td><var>r12<\/var><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x20<\/var><\/td>\n<td><var>r13<\/var><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x21<\/var><\/td>\n<td><var>r14<\/var><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x22<\/var><\/td>\n<td><var>r15<\/var><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x23<\/var><\/td>\n<td>&nbsp;<\/td>\n<td>Off-limits<\/td>\n<\/tr>\n<tr>\n<td><var>x24<\/var><\/td>\n<td>&nbsp;<\/td>\n<td>Off-limits<\/td>\n<\/tr>\n<tr>\n<td><var>x25<\/var><\/td>\n<td><var>rsi<\/var><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x26<\/var><\/td>\n<td><var>rdi<\/var><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x27<\/var><\/td>\n<td><var>rbx<\/var><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x28<\/var><\/td>\n<td>&nbsp;<\/td>\n<td>Off-limits<\/td>\n<\/tr>\n<tr>\n<td><var>fp<\/var><\/td>\n<td><var>rbp<\/var><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>lr<\/var><\/td>\n<td><var>fp(0)<\/var><\/td>\n<td>Bottom 64 bits of 
fp(0)<\/td>\n<\/tr>\n<tr>\n<td><var>sp<\/var><\/td>\n<td><var>rsp<\/var><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>pc<\/var><\/td>\n<td><var>rip<\/var><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>Flags<\/td>\n<td>Flags<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>v0<\/var>..<var>v15<\/var><\/td>\n<td><var>xmm0<\/var>..<var>xmm15<\/var><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>v16<\/var>..<var>v31<\/var><\/td>\n<td>&nbsp;<\/td>\n<td>Off-limits<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>There are some sneaky tricks happening here.<\/p>\n<p>The classic 8087 floating-point registers are 80-bit values, so they end up split into chunks. The lower-order 64 bits of each register map to its buddy AArch64 register, and the upper 16 bits are gathered in groups of four to form a 64-bit value that gets stored in a helper register: <var>x16<\/var> for <var>fp(0)<\/var> through <var>fp(3)<\/var>, and <var>x17<\/var> for <var>fp(4)<\/var> through <var>fp(7)<\/var>.<\/p>\n<p>The AArch64 integer register mappings are chosen so that they have the same register preservation policies as their x86-64 buddies. For example, <var>x19<\/var> is a preserved register in the classic AArch64 calling convention, and its buddy <var>r12<\/var> is a preserved register in the x86-64 calling convention.<\/p>\n<p>There are a few extra AArch64 registers that do not have an x86-64 buddy. These registers are off-limits to ARM64EC code. Do not use them, because their values are not preserved across context switches or asynchronous exceptions. (There&#8217;s nowhere to save them!)<\/p>\n<p>Notice that the classic AArch64 calling convention uses <var>x0<\/var> to hold both the first integer parameter and the return value, whereas x86-64 uses different registers for those two purposes (<var>rcx<\/var> and <var>rax<\/var>). This means that the match is imperfect, and we&#8217;ll have to do some extra work later to get the return values to line up.<\/p>\n<p>Okay, so that aligns the registers. The next rule is that ARM64EC follows x86-64 data alignment rules. 
This makes structures binary-compatible between the two.<\/p>\n<p>A third rule is that when an ARM64EC function calls an x86-64 function or vice versa, the call goes through a &#8220;thunk&#8221; that manages the last bit of mismatch between the two architectures. For example, when an x86-64 function returns to its ARM64EC caller, the exit thunk moves <var>x8<\/var> (buddy to <var>rax<\/var>) into <var>x0<\/var>, which is where AArch64 code expects to find the return value.<\/p>\n<p>That&#8217;s the whirlwind tour of ARM64EC. There&#8217;s a lot more, but those are the parts you will notice when you&#8217;re debugging compiler-generated code. For even more details about ARM64EC, you can read <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/arm\/arm64ec-abi\"> Understanding Arm64EC ABI and assembly code<\/a> on docs.microsoft.com.<\/p>\n<p><b>Bonus chatter<\/b>: When you compile your code as ARM64EC, <a title=\"Getting to Know ARM64EC: #Defines and Intrinsic Functions\" href=\"https:\/\/techcommunity.microsoft.com\/t5\/windows-kernel-internals-blog\/getting-to-know-arm64ec-defines-and-intrinsic-functions\/ba-p\/2957235\"> the architecture preprocessor symbols will say that you are compiling for x86-64<\/a>. There&#8217;s a reason for this. 
See the linked article for the answer.<\/p>\n<p><b>Bonus reading<\/b>: <a title=\"Official Support for Arm64EC is Here \" href=\"https:\/\/devblogs.microsoft.com\/cppblog\/official-support-for-arm64ec-is-here\/\"> Arm64EC is now officially supported by the Microsoft Visual C++ compiler<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Aligning with the x86-64 calling convention.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-107069","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>Aligning with the x86-64 calling convention.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/107069","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=107069"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/107069\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=107069"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=107069"},{"taxonomy":"post_tag","embeddable":true,"hr
ef":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=107069"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}