{"id":42223,"date":"2003-10-08T07:00:00","date_gmt":"2003-10-08T14:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/2003\/10\/08\/why-is-address-space-allocation-granularity-64k\/"},"modified":"2025-10-02T13:26:09","modified_gmt":"2025-10-02T20:26:09","slug":"20031008-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20031008-00\/?p=42223","title":{"rendered":"Why is address space allocation granularity 64KB?"},"content":{"rendered":"<p>You may have wondered why VirtualAlloc allocates memory at 64KB boundaries even though page granularity is 4KB.<\/p>\n<p>You have RISC processors like the Alpha AXP to thank for that.<\/p>\n<p>RISC processors typically lack a &#8220;load 32-bit integer immediate&#8221; instruction. To load a 32-bit integer, you actually load two 16-bit integers and combine them. <b>Added<\/b>: For example, <a title=\"The ARM processor (Thumb-2), part 4: Single-instruction constants\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20210603-00\/?p=105276\"> ARM has the <code>movt<\/code> instruction for moving a 16-bit constant into the upper 16 bits of a 32-bit register<\/a>, leaving the lower 16 bits unchanged. <a title=\"\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20180809-00\/?p=99455\"> PowerPC uses <code>addis<\/code> to add a 16-bit constant to the upper 16 bits of a 32-bit register<\/a>. <a title=\"The MIPS R4000, part 4: Constants\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20180405-00\/?p=98445\"> MIPS uses <code>LUI<\/code> to load a 16-bit value into the upper 16 bits of a 32-bit register<\/a>, zeroing out the lower 16 bits.<\/p>\n<p>So if allocation granularity were finer than 64KB, a DLL that got relocated in memory would require two fixups per relocatable address: one to the upper 16 bits and one to the lower 16 bits. And things get worse if this changes a carry or borrow between the two halves. (For example, if an address shifts by 4KB from <code>0x1234F000<\/code> to <code>0x12350000<\/code>, this forces both the low and high parts of the address to change. Even though the amount of motion was far less than 64KB, it still had an impact on the high part due to the carry.)<\/p>\n<p>But wait, there&#8217;s more.<\/p>\n<p>The Alpha AXP actually combines two <i>signed<\/i> 16-bit integers to form a 32-bit integer. For example, to load the value <code>0x1234ABCD<\/code>, you would first use the <code>LDAH<\/code> instruction to load the value <code>0x1235<\/code> into the high word of the destination register. Then you would use the <code>LDA<\/code> instruction to add the signed value <code>-0x5433<\/code>. (Since <code>0x5433<\/code> = <code>0x10000<\/code> \u2212 <code>0xABCD<\/code>.) The result is then the desired value of <code>0x1234ABCD<\/code>.<\/p>\n<pre>LDAH t1, 0x1235(zero) \/\/ t1 = 0x12350000\r\nLDA  t1, -0x5433(t1)  \/\/ t1 = t1 - 0x5433 = 0x1234ABCD\r\n<\/pre>\n<p>So if a relocation caused an address to move between the &#8220;lower half&#8221; of a 64KB block and the &#8220;upper half&#8221;, additional fixing-up would have to be done to ensure that the arithmetic for the top half of the address was adjusted properly. Since compilers like to reorder instructions, that <code>LDAH<\/code> instruction could be far, far away, so the relocation record for the bottom half would have to have some way of finding the matching top half.<\/p>\n<p>What&#8217;s more, the compiler is clever and if it needs to compute addresses for two variables that are in the same 64KB region, it shares the <code>LDAH<\/code> instruction between them. If it were possible to relocate by a value that wasn&#8217;t a multiple of 64KB, then the compiler would no longer be able to perform this optimization since it&#8217;s possible that after the relocation, the two variables no longer belonged to the same 64KB block.<\/p>\n<p>Forcing memory allocations at 64KB granularity solves all these problems.<\/p>\n<p>If you have been paying really close attention, you&#8217;d have seen that this also explains why there is a 64KB &#8220;no man&#8217;s land&#8221; near the 2GB boundary. Consider the method for computing the value <code>0x7FFFABCD<\/code>: Since the lower 16 bits are in the upper half of the 64KB range, the value needs to be computed by subtraction rather than addition. The na\u00efve solution would be to use<\/p>\n<pre>LDAH t1, 0x8000(zero) \/\/ t1 = 0x80000000, right?\r\nLDA  t1, -0x5433(t1)  \/\/ t1 = t1 - 0x5433 = 0x7FFFABCD, right?\r\n<\/pre>\n<p>Except that this doesn&#8217;t work. The Alpha AXP is a 64-bit processor, and <code>0x8000<\/code> does not fit in a 16-bit signed integer, so you have to use <code>-0x8000<\/code>, a negative number. What actually happens is<\/p>\n<pre>LDAH t1, -0x8000(zero) \/\/ t1 = 0xFFFFFFFF`80000000\r\nLDA  t1, -0x5433(t1)   \/\/ t1 = t1 - 0x5433 = 0xFFFFFFFF`7FFFABCD\r\n<\/pre>\n<p>You need to add a third instruction to clear the high 32 bits. The clever trick for this is to add zero and tell the processor to treat the result as a 32-bit integer and sign-extend it to 64 bits.<\/p>\n<pre>ADDL t1, zero, t1    \/\/ t1 = t1 + 0, with L suffix\r\n\/\/ L suffix means sign extend result from 32 bits to 64\r\n                     \/\/ t1 = 0x00000000`7FFFABCD\r\n<\/pre>\n<p>If addresses within 64KB of the 2GB boundary were permitted, then every memory address computation would have to insert that third <code>ADDL<\/code> instruction just in case the address got relocated to the &#8220;danger zone&#8221; near the 2GB boundary.<\/p>\n<p>This was an awfully high price to pay to get access to that last 64KB of address space (a 50% performance penalty for all address computations to protect against a case that in practice would never happen), so roping off that area as permanently invalid was a more prudent choice.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>You have RISC processors like the Alpha AXP to thank for that.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-42223","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>You have RISC processors like the Alpha AXP to thank for that.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/42223","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=42223"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/42223\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=42223"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=42223"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=42223"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}