{"id":10933,"date":"2011-04-14T07:00:00","date_gmt":"2011-04-14T07:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/2011\/04\/14\/lock-free-algorithms-the-opportunistic-cache\/"},"modified":"2011-04-14T07:00:00","modified_gmt":"2011-04-14T07:00:00","slug":"lock-free-algorithms-the-opportunistic-cache","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20110414-00\/?p=10933","title":{"rendered":"Lock-free algorithms: The opportunistic cache"},"content":{"rendered":"<p>\nSuppose profiling reveals that a specific calculation is responsible\nfor a significant portion of your CPU time,\nand instrumentation says that most of the time, it&#8217;s just being\nasked to calculate the same thing over and over.\nA simple one-level cache would do the trick here.\n<\/p>\n<pre>\nBOOL IsPrime(int n)\n{\n static int nLast = 1;\n static BOOL fLastIsPrime = <a HREF=\"http:\/\/mathforum.org\/dr.math\/faq\/faq.prime.num.html\">FALSE<\/a>;\n \/\/ if it's the same value as last time, then\n \/\/ use the answer we cached from last time\n if (n == nLast) return fLastIsPrime;\n \/\/ calculate and cache the new result\n nLast = n;\n fLastIsPrime = slow_IsPrime(n);\n return fLastIsPrime;\n}\n<\/pre>\n<p>\nOf course, this isn&#8217;t thread-safe, because if one thread\nis pre-empted inside the call to <code>slow_IsPrime<\/code>,\nthen another thread will see\nvalues for <code>nLast<\/code> and <code>fLast&shy;Is&shy;Prime<\/code>\nthat do not correspond to each other.\n<\/p>\n<p>\nOne solution would be to put a critical section around this code,\nbut this introduces an artificial bottleneck:\nIf the most recent cached result is <code>nLast = 5<\/code>,\n<code>fLast&shy;Is&shy;Prime = TRUE<\/code>,\nand if two threads both try to see whether 5 is prime,\nyou don&#8217;t want those two threads to serialize against each other.\n<\/p>\n<p>\nAnother solution is to use\n<a HREF=\"http:\/\/msdn.microsoft.com\/en-us\/library\/aa904937.aspx\">\nslim reader-writer locks<\/a> and acquire in shared mode when\nchecking the cache and in exclusive mode when updating the cache.\nBut let&#8217;s try a lock-free solution.\n<\/p>\n<p>\nWe&#8217;re going to combine two different techniques.\nFirst, we use the change counter technique\nwe saw last time when\n<a HREF=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2011\/04\/12\/10152296.aspx\">\nwe investigated try\/commit\/(try again)<\/a>,\nbut we also combine it with a lock that is manipulated\nwith a try\/commit\/abandon pattern.\n<\/p>\n<pre>\n#define IsLocked(l) ((l) &amp; 1)\nBOOL IsPrime(int n)\n{\n static int nLast = 1;\n static BOOL fLastIsPrime = FALSE;\n static LONG lCounter = 0;\n \/\/ see if we can get it from the cache\n LONG lCounterStart = InterlockedReadAcquire(&amp;lCounter, -1);\n if (!IsLocked(lCounterStart) &amp;&amp; n == nLast) {\n  BOOL fResult = fLastIsPrime;\n  if (InterlockedReadRelease(&amp;lCounter, -1) == lCounterStart)\n   return fResult;\n }\n \/\/ calculate it the slow way\n BOOL fIsPrime = slow_IsPrime(n);\n \/\/ update the cache if we can\n lCounterStart = lCounter;\n if (!IsLocked(lCounterStart) &amp;&amp;\n     InterlockedCompareExchangeAcquire(&amp;lCounter,\n              lCounterStart+1, lCounterStart) == lCounterStart) {\n  nLast = n;\n  fLastIsPrime = fIsPrime;\n  InterlockedCompareExchangeRelease(&amp;lCounter,\n              lCounterStart+2, lCounterStart+1);\n }\n return fIsPrime;\n}\n<\/pre>\n<p>\nThe <code>lCounter<\/code> consists of\na LOCK bit as the bottom bit\nand a change counter in the remaining bits.\n(Choosing the bottom bit as the LOCK bit makes the operations\nof clearing the lock and incrementing the counter very simple.)\n<\/p>\n<p>\nThere are two parts to this function, the part that reads the\ncache and the part that updates the cache.\n<\/p>\n<p>\nTo read the cache, we first read the counter with acquire semantics,\nso that the reads of <code>nLast<\/code> and\n<code>fLast&shy;Is&shy;Prime<\/code> will not take place until after we\nget the counter.\nIf the counter says that the cache is not locked, then we go\nahead and fetch the last value and the last result.\nIf the last value in the cache matches the value we&#8217;re calculating,\nthen we go ahead and use the last result.\nBut as a final check, we make sure that the counter hasn&#8217;t changed\nwhile we were busy looking at the protected variables.\nIf it has, then it means that we may have read inconsistent values\nand cannot trust the cached result.\n<\/p>\n<p>\nIf we have a cache miss or we couldn&#8217;t access the cache,\nwe go ahead and calculate the result the slow way.\n<\/p>\n<p>\nNext, we try to update the cache.\nThis time, instead of just looking to see whether the cache\nis locked, we try to lock it ourselves by setting the bottom bit.\n(If the lock fails, then we skip the cache update and just return\nthe value we calculated.)\nOnce the lock is taken, we update the protected variables,\nthen atomically\nrelease the lock and increment the counter.\n(This is where putting the lock in the bottom bit comes in handy:\nYou can increment the counter by adding 2 and not worry about\na carry out of the counter bits turning into an accidental lock bit.)\nWe use Release semantics so that the\nvalues of the protected values are committed to memory before\nlock bit clears.\n<\/p>\n<p>\nNote that in both halves of the function, if the cache is locked,\nwe just proceed as if there were no cache at all.\nThe theory here is that it&#8217;s better just to say\n&#8220;Oh, the heck with it, I&#8217;ll just do it myself&#8221;\nthan to line up and wait to access the cache.\nContinuing instead of waiting avoids problems like priority inversion,\nbut it also means that you get some spurious cache misses.\nFortunately, since it&#8217;s just a cache, an occasional spurious miss\nis not the end of the world.\n<\/p>\n<p>\nYou could do something similar with the\n<code>Try&shy;Enter&shy;Critical&shy;Section<\/code> function\n<a HREf=\"http:\/\/www.microsoft.com\/msj\/archive\/S413.aspx\">\nprovided you&#8217;re running Windows&nbsp;NT&nbsp;4.0 or higher<\/a>:\n<\/p>\n<pre>\nBOOL IsPrime(int n)\n{\n static int nLast = 1;\n static BOOL fLastIsPrime = FALSE;\n BOOL fHaveAnswer = FALSE;\n BOOL fIsPrime;\n \/\/ see if we can get it from the cache\n if (TryEnterCriticalSection(&amp;g_cs)) {\n  if (n == nLast) {\n   fHaveAnswer = TRUE;\n   fIsPrime = fLastIsPrime;\n  }\n  LeaveCriticalSection(&amp;g_cs);\n }\n if (fHaveAnswer) return fIsPrime;\n \/\/ calculate it the slow way\n fIsPrime = slow_IsPrime(n);\n \/\/ update the cache if we can\n if (TryEnterCriticalSection(&amp;g_cs)) {\n  nLast = n;\n  fLastIsPrime = fIsPrime;\n  LeaveCriticalSection(&amp;g_cs);\n }\n return fIsPrime;\n}\n<\/pre>\n<p>\nThis does have the disadvantage that multiple readers will lock\neach other out, so we can switch to a slim reader\/writer lock\n<a HREF=\"http:\/\/msdn.microsoft.com\/en-us\/library\/dd405529(v=vs.85).aspx\">\nprovided we&#8217;re running on Window&nbsp; 7 or higher<\/a>:\n<\/p>\n<pre>\nBOOL IsPrime(int n)\n{\n static int nLast = 1;\n static BOOL fLastIsPrime = FALSE;\n BOOL fHaveAnswer = FALSE;\n BOOL fIsPrime;\n \/\/ see if we can get it from the cache\n if (TryAcquireSRWLockShared(&amp;g_lock)) {\n  if (n == nLast) {\n   fHaveAnswer = TRUE;\n   fIsPrime = fLastIsPrime;\n  }\n  ReleaseSRWLockShared(&amp;g_lock);\n }\n if (fHaveAnswer) return fIsPrime;\n \/\/ calculate it the slow way\n fIsPrime = slow_IsPrime(n);\n \/\/ update the cache if we can\n if (TryAcquireSRWLockExclusive(&amp;g_lock)) {\n  nLast = n;\n  fLastIsPrime = fIsPrime;\n  LeaveSRWLockExclusive(&amp;g_lock);\n }\n return fIsPrime;\n}\n<\/pre>\n<p>\nThis still has the problem that readers can lock out a cache update.\nIf the function is hot (and if it weren&#8217;t, why would you switch\nto a lock-free algorithm?), and the usage pattern shifts (say,\ninstead of checking whether 13 is prime over and over, it starts\nchecking whether 17 is prime over and over),\neverybody will be so busy reading the cache to see if the cached\nvalue is 17 that nobody will get a chance to update the cache\nto actually <i>be<\/i> 17!\n<\/p>\n<p>\n<b>Exercise<\/b>:\nWhat constraints must be imposed on the protected variables\nfor this technique to be successful?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Suppose profiling reveals that a specific calculation is responsible for a significant portion of your CPU time, and instrumentation says that most of the time, it&#8217;s just being asked to calculate the same thing over and over. A simple one-level cache would do the trick here. BOOL IsPrime(int n) { static int nLast = 1; [&hellip;]<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-10933","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Suppose profiling reveals that a specific calculation is responsible for a significant portion of your CPU time, and instrumentation says that most of the time, it&#8217;s just being asked to calculate the same thing over and over. A simple one-level cache would do the trick here. BOOL IsPrime(int n) { static int nLast = 1; [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/10933","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=10933"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/10933\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=10933"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=10933"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=10933"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}