{"id":97095,"date":"2017-09-27T07:00:00","date_gmt":"2017-09-27T21:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/?p=97095"},"modified":"2019-03-13T01:17:34","modified_gmt":"2019-03-13T08:17:34","slug":"20170927-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20170927-00\/?p=97095","title":{"rendered":"How to check if a pointer is in a range of memory"},"content":{"rendered":"<p>Suppose you have a range of memory described by two variables, say, <\/p>\n<pre>\nbyte* regionStart;\nsize_t regionSize;\n<\/pre>\n<p>And suppose you want to check whether a pointer lies within that region. You might be tempted to write <\/p>\n<pre>\nif (p &gt;= regionStart &amp;&amp; p &lt; regionStart + regionSize)\n<\/pre>\n<p>but is this actually guaranteed according to the standard? <\/p>\n<p>The relevant portion of the C standard (6.5.8 Relational Operators)&sup1; says <\/p>\n<blockquote CLASS=\"q\"><p>If two pointers to object or incomplete types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined. <\/p><\/blockquote>\n<p>Now remember that the C language was defined to cover a large range of computer architectures, including many which would be considered museum relics today. 
It therefore takes a very conservative view of what is permitted, so that it remains possible to write C programs for those ancient systems. (Which weren&#8217;t quite so ancient at the time.) <\/p>\n<p>Bearing that in mind, it is still possible for an allocation to generate a pointer that satisfies the condition despite the pointer not pointing into the region. This will happen, for example, on an 80286 in protected mode, which is used by Windows 3.x in Standard mode and OS\/2 1.x. <\/p>\n<p>In this system, pointers are 32-bit values, split into two 16-bit parts, traditionally written as <code>XXXX:YYYY<\/code>. The first 16-bit part (<code>XXXX<\/code>) is the &#8220;selector&#8221;, which chooses a bank of 64<a HREF=\"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/20090611-00\/?p=17933\">KB<\/a>. The second 16-bit part (<code>YYYY<\/code>) is the &#8220;offset&#8221;, which chooses a byte within that 64KB bank. (It&#8217;s more complicated than this, but let&#8217;s just leave it at that for the purpose of this discussion.) <\/p>\n<p>Memory blocks larger than 64KB are broken up into 64KB chunks. To move from one chunk to the next, you add 8 to the selector. For example, the byte after <code>0101:FFFF<\/code> is <code>0109:0000<\/code>. <\/p>\n<p>But why do you add 8 to move to the next selector? Why not just increment the selector? Because the bottom three bits of the selector are used for other things. In particular, the bottom bit of the selector is used to choose the selector table. Let&#8217;s ignore bits 1 and 2 since they are not relevant to the discussion. Assume for convenience that they are always zero.&sup2; <\/p>\n<p>There are two tables which describe how selectors correspond to physical memory, the Global Descriptor Table (for memory shared across all processes) and the Local Descriptor Table (for memory private to a single process). 
Therefore, the selectors available for process private memory are <code>0001<\/code>, <code>0009<\/code>, <code>0011<\/code>, <code>0019<\/code>, <i>etc<\/i>. Meanwhile, the selectors available for global memory are <code>0008<\/code>, <code>0010<\/code>, <code>0018<\/code>, <code>0020<\/code>, <i>etc<\/i>. (Selector <code>0000<\/code> is reserved.) <\/p>\n<p>Okay, now we can set up our counter-example. Suppose <code>regionStart = 0101:0000<\/code> and <code>regionSize = 0x00020000<\/code>. This means that the guarded addresses are <code>0101:0000<\/code> through <code>0101:FFFF<\/code> and <code>0109:0000<\/code> through <code>0109:FFFF<\/code>. Furthermore, <code>regionStart + regionSize = 0111:0000<\/code>. <\/p>\n<p>Meanwhile, suppose there is some global memory that happens to be allocated at <code>0108:0000<\/code>. This is a global memory allocation because the selector is an even number. <\/p>\n<p>Observe that the global memory allocation is not part of the guarded region, but its pointer value does satisfy the numeric inequality <code>0101:0000<\/code> &le; <code>0108:0000<\/code> &lt; <code>0111:0000<\/code>. <\/p>\n<p><b>Bonus chatter<\/b>: Even on CPU architectures with a flat memory model, the test can fail. Modern compilers take advantage of undefined behavior and optimize accordingly. If they see a relational comparison between pointers, they are permitted to assume that the pointers point into the same aggregate or array (or one past the last element of that array), because any other type of relational comparison is undefined. Specifically, if <code>regionStart<\/code> points to the start of an array or aggregate, then the only pointers that can legally be relationally compared with <code>regionStart<\/code> are the ones of the form <code>regionStart<\/code>, <code>regionStart + 1<\/code>, <code>regionStart + 2<\/code>, &hellip;, <code>regionStart + regionSize<\/code>. 
For all of these pointers, the condition <code>p &gt;= regionStart<\/code> is true and can therefore be optimized out, reducing the test to <\/p>\n<pre>\nif (p &lt; regionStart + regionSize)\n<\/pre>\n<p>which will now be satisfied for pointers that are numerically less than <code>regionStart<\/code>. <\/p>\n<p>(You might run into this scenario if, as in the original question that inspired this answer, you allocated the region with <code>regionStart = malloc(n)<\/code>, or if your region is a &#8220;quick access&#8221; pool of preallocated items and you want to decide whether you need to <code>free<\/code> the pointer.) <\/p>\n<p><b>Moral of the story<\/b>: This code is not safe, not even on flat architectures. <\/p>\n<p><b>But all is not lost<\/b>: The pointer-to-integer conversion is implementation-defined, which means that your implementation must document how it works. If your implementation defines the pointer-to-integer conversion as producing the numeric value of the linear address of the object referenced by the pointer, and you know that you are on a flat architecture, then what you can do is compare <i>integers<\/i> rather than <i>pointers<\/i>. Integer comparisons are not constrained in the same way that pointer comparisons are. <\/p>\n<pre>\n    if ((uintptr_t)p &gt;= (uintptr_t)regionStart &amp;&amp;\n        (uintptr_t)p &lt; (uintptr_t)regionStart + (uintptr_t)regionSize)\n<\/pre>\n<p>&sup1; Note that comparisons for equality and inequality are not considered relational comparisons. <\/p>\n<p>&sup2; I know that in practice they aren&#8217;t. I&#8217;m assuming they are zero for convenience. <\/p>\n<p>(This article was adapted from <a HREF=\"https:\/\/stackoverflow.com\/questions\/39160613\/can-the-following-code-be-true-for-pointers-to-different-things\">my answer on Stack Overflow<\/a>.) 
<\/p>\n<p><b>Update<\/b>: Clarification that the &#8220;start of region&#8221; optimization is available only when <code>regionStart<\/code> points to the start of an array or aggregate. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Thanks to the C language standard, it&#8217;s trickier than it seems.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-97095","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Thanks to the C language standard, it&#8217;s trickier than it seems.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/97095","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=97095"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/97095\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=97095"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=97095"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp
\/v2\/tags?post=97095"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}