{"id":3243,"date":"2013-09-13T07:00:00","date_gmt":"2013-09-13T07:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/2013\/09\/13\/how-does-interlockedincrement-work-internally\/"},"modified":"2013-09-13T07:00:00","modified_gmt":"2013-09-13T07:00:00","slug":"how-does-interlockedincrement-work-internally","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20130913-00\/?p=3243","title":{"rendered":"How does InterlockedIncrement work internally?"},"content":{"rendered":"<p>\nThe\n<a HREF=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms684122.aspx\">\nInterlocked<\/a>\nfamily of functions perform atomic operations\non memory.\nHow do they do it?\n<\/p>\n<p>\nIt depends on the underlying CPU architecture.\nFor some CPUs, it&#8217;s easy:\nThe x86, for example, has direct support for many interlocked operations\nby means of the\n<a HREF=\"http:\/\/www.cs.ucla.edu\/~kohler\/class\/04f-aos\/ref\/i386\/LOCK.htm\">\n<code>LOCK<\/code> prefix<\/a>\n(with the bonus feature that <code>LOCK<\/code> is implied for the\n<code>XCHG<\/code> opcode.)\nThe ia64 and x64 also have\ndirect support for atomic load-modify-store operations.\n<\/p>\n<p>\nMost other architectures break the operation into two parts,\nknown as\n<a HREF=\"http:\/\/en.wikipedia.org\/wiki\/Load-Link\/Store-Conditional\">\nLoad-link\/store-conditional<\/a>.\nThe first part (load-link) reads a value from memory and instructions\nthe processor to monitor the memory address to see if any other\nprocessors modify that same memory.\nThe second part (store-conditional) stores a value to memory\nprovided that no other processors have written to the memory\nin the meantime.\nAn atomic load-modify-store operation is therefore performed by\nreading the value via load-link,\nperforming the desired computation,\nthen\nattempting a store-conditional.\nIf the store-conditional fails,\nthen start all over again.\n<\/p>\n<pre>\nLONG InterlockedIncrement(LONG volatile *value)\n{\n  LONG lOriginal, lNewValue;\n  do {\n    \/\/ Read the current value via load-link so we will know if\n    \/\/ somebody has modified it while we weren't looking.\n    lOriginal = load_link(value);\n    \/\/ Calculate the new value\n    lNewValue = lOriginal + 1;\n    \/\/ Store the value conditionally. This will fail if somebody\n    \/\/ has updated the value in the meantime.\n  } while (!store_conditional(value, lNewValue));\n  return lNewValue;\n}\n<\/pre>\n<p>\n(If this looks familiar, it should.\n<a HREF=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2004\/09\/15\/229915.aspx\">\nYou&#8217;ve seen this pattern before<\/a>.)\n<\/p>\n<p>\nNow, asking the CPU to monitor a memory address comes with its own\ngotchas.\nFor one thing, the CPU can monitor only one memory address at a time,\nand its memory is very short-term.\nIf your code gets pre-empted or if a hardware interrupt comes in\nafter your <code>load_link<\/code>, then your\n<code>store_conditional<\/code> will fail because the CPU got distracted\nby the shiny object known as <i>hardware interrupt<\/i> and totally\nforgot about that memory address it was supposed to be monitoring.\n(Even if it managed to remember it, it won&#8217;t remember it for long,\nbecause the hardware interrupt will almost certainly execute its own\n<code>load_link<\/code> instruction, thereby replacing the monitored\naddress with its own.)\n<\/p>\n<p>\nFurthermore, the CPU might be a little sloppy in its monitoring\nand monitor not the address itself but\n<a HREF=\"http:\/\/drdobbs.com\/go-parallel\/article\/showArticle.jhtml?articleID=217500206\">\nthe cache line<\/a>.\nIf somebody modifies a different memory location which happens to\nreside in the same cache line, the <code>store_conditional<\/code>\nmight fail even though you would expect it to succeed.\nThe ARM architecture allows a processor to be so sloppy that\nany write in the same block of 2048 bytes can cause a\n<code>store_conditional<\/code> to fail.\n<\/p>\n<p>\nWhat this means for you, the assembly-language coder who is\nimplementing an interlocked operation, is that you need to minimize\nthe number of instructions between the\n<code>load_link<\/code> and <code>store_conditional<\/code>.\nFor example,\n<code>InterlockedIncrement<\/code> merely adds&nbsp;1 to the value.\nThe more instructions you insert between the\n<code>load_link<\/code> and <code>store_conditional<\/code>,\nthe greater the chance that your <code>store_conditional<\/code> will fail\nand you will have to retry.\nAnd if you put too much code in between, your\n<code>store_conditional<\/code> will <i>never<\/i> succeed.\nAs an extreme example, if you put code that\ntakes five seconds to calculate the new value,\nyou will certainly receive several hardware interrupts during those\nfive seconds, and your\n<code>store_conditional<\/code> will always fail.\n<\/p>\n<p>\n<b>Bonus reading<\/b>:\n<a HREF=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2004\/05\/06\/127141.aspx\">\nWhy did InterlockedIncrement\/Decrement only return the sign of the result<\/a>?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Interlocked family of functions perform atomic operations on memory. How do they do it? It depends on the underlying CPU architecture. For some CPUs, it&#8217;s easy: The x86, for example, has direct support for many interlocked operations by means of the LOCK prefix (with the bonus feature that LOCK is implied for the XCHG [&hellip;]<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[26],"class_list":["post-3243","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-other"],"acf":[],"blog_post_summary":"<p>The Interlocked family of functions perform atomic operations on memory. How do they do it? It depends on the underlying CPU architecture. For some CPUs, it&#8217;s easy: The x86, for example, has direct support for many interlocked operations by means of the LOCK prefix (with the bonus feature that LOCK is implied for the XCHG [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/3243","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=3243"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/3243\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=3243"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=3243"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=3243"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}