{"id":30401,"date":"2022-04-26T15:00:14","date_gmt":"2022-04-26T15:00:14","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cppblog\/?p=30401"},"modified":"2022-05-19T14:52:36","modified_gmt":"2022-05-19T14:52:36","slug":"openmp-task-support-for-c-in-visual-studio","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cppblog\/openmp-task-support-for-c-in-visual-studio\/","title":{"rendered":"OpenMP Task Support for C++ in Visual Studio"},"content":{"rendered":"<p>In our <a href=\"https:\/\/devblogs.microsoft.com\/cppblog\/openmp-updates-and-fixes-for-cpp-in-visual-studio-2019-16-10\/\">previous blog post<\/a> about OpenMP support in Visual Studio 2019 version 16.10, we announced support for the <strong>-openmp:llvm<\/strong> switch to enable the compiler to target LLVM\u2019s OpenMP runtime library on x86, x64, and arm64 platforms. In that release, <strong>-openmp:llvm<\/strong> included some correctness fixes and support for unsigned indices in parallel for loops, but otherwise only supported the same OpenMP 2.0 features as <strong>-openmp<\/strong>.<\/p>\n<h2>-openmp:llvm support for tasks<\/h2>\n<p>Starting with <a href=\"https:\/\/visualstudio.microsoft.com\/vs\/preview\/\">Visual Studio 2022 version 17.2<\/a>, we have added support for the first major new OpenMP feature specific to the <strong>-openmp:llvm<\/strong> flag: the <code>task<\/code> directive as defined by the OpenMP 3.1 standard, including the accompanying <code>if<\/code>, <code>private<\/code>, <code>firstprivate<\/code>, <code>shared<\/code>, <code>default<\/code>, <code>untied<\/code>, <code>mergeable<\/code>, and <code>final<\/code> clauses, and the associated <code>taskwait<\/code> and <code>taskyield<\/code> directives. The compiler does not yet support <code>task<\/code> clauses added in later versions of the OpenMP standard.<\/p>\n<p>The OpenMP <code>task<\/code> directive is used to specify a unit of work that can be done in parallel by any thread. A task executes once when a thread becomes available, instead of executing once per thread as a parallel region would. Creating tasks is a useful alternative to the OpenMP <code>parallel for<\/code> directive when the number of iterations is not known at the start of the calculation, such as when operating on a linked list or implementing a recursive algorithm.<\/p>\n<p>The following is a simple example of using the <code>task<\/code> directive to sum the elements of an array by splitting the array into pieces and having each task sum one piece.<\/p>\n<pre><code class=\"language-cpp\">#include &lt;stdio.h&gt;\r\n\r\nint sumArrayImpl(int* arr, int arrayLength) {\r\n  if (arrayLength == 1)\r\n     return arr[0];\r\n  if (arrayLength == 0)\r\n     return 0;\r\n\r\n  int left, right;\r\n  int halfLength = arrayLength \/ 2;\r\n  #pragma omp task default(none) firstprivate(arr, halfLength), shared(left) final(halfLength &gt;= 2)\r\n  {\r\n     left = sumArray(arr, halfLength);\r\n  }\r\n  #pragma omp task default(none) firstprivate(arr, halfLength, arrayLength) shared(right) final(halfLength &gt;= 2)\r\n  {\r\n     right = sumArray(arr + halfLength, halfLength + (arrayLength % 2));\r\n  }\r\n  #pragma omp taskwait\r\n     return left + right;\r\n}\r\n\r\nint sumArray(int* array, int arrayLength)\r\n   #pragma omp parallel\r\n   {\r\n      #pragma omp single\r\n      { \r\n         printf(\"Sum = %dn\", sumArrayImpl(array, arrayLength));\r\n      }\r\n   }\r\n}<\/code><\/pre>\n<p>Although this example does not do enough work for each element to see a speedup over a sequential for loop, and is consistent enough that a <code>parallel for<\/code> directive could accomplish the same degree of paralellism, it has a similar form to more complicated examples that would see a speedup and illustrates many of the <code>task<\/code> directive clauses.<\/p>\n<p>The <code>private<\/code>, <code>firstprivate<\/code>, <code>shared<\/code>, and <code>default<\/code> clauses specify the scope of variables used in a task, following a similar pattern to the same clauses on the <code>parallel<\/code> directive. Note that marking a pointer as <code>firstprivate<\/code> means that each task will have its own copy of the pinter. The underlying array is still shared across all tasks.<\/p>\n<p>The <code>final<\/code> clause prevents creating an excessive number of tasks by causing any further child tasks to be executed sequentially when the clause\u2019s condition is true. An <code>if<\/code> clause, conversely, causes the current <code>task<\/code> region to execute sequentially, but it can still create child tasks that execute in parallel. The <code>taskwait<\/code> directive allows for synchronization between tasks by waiting until a task\u2019s children complete before continuing.<\/p>\n<p>A few <code>task<\/code> clauses and an additional task-related directive are missing from this example. The <code>taskyield<\/code> directive allows the runtime to suspend a task\u2019s execution in order to execute other tasks, and is useful when a task may need to wait for some other work to complete. The <code>mergeable<\/code> and <code>untied<\/code> clauses on the <code>task<\/code> directive are optimization hints. An <code>untied<\/code> task that yields may resume on any thread, instead of only resuming on the thread that spawned the task. A <code>mergeable<\/code> task allows the runtime to reuse the data environment of its parent for the child task.<\/p>\n<p>Now, let&#8217;s take a look at an example that shows the usefulness of <code>task<\/code> in a scenario where <code>parallel for<\/code> is insufficient. For this example, we&#8217;ll be using a data structure designed for storing words called a Trie tree. In a Trie tree, each word is stored as a path through the tree, terminating in a node marked with <code>sequenceEnd == true<\/code>. A node with <code>sequenceEnd == true<\/code> may still have children if it is a substring of another word. For example, a Trie tree containing the words &#8220;car&#8221;, &#8220;care&#8221;, and &#8220;cars&#8221; would start with a <code>c<\/code> node, which has an <code>a<\/code> as a child, which in turn has an <code>r<\/code> as a child. The <code>r<\/code> node would be marked as an end node and also have two children, an <code>e<\/code> leaf and an <code>s<\/code> leaf, both also marked as terminating nodes, like so:<\/p>\n<pre><code class=\"language-text\">c\r\n \r\n  a\r\n   \r\n    r*\r\n   \/ \r\n  e*  s*   <\/code><\/pre>\n<p>A <code>parallel for<\/code> could not traverse a tree like this because there is no random access iterator, but a tree traversal can still take advantage of parallelism by creating a task for each node visited. Consider the following code for counting and printing out all of the words in a trie tree:<\/p>\n<pre><code class=\"language-cpp\">struct TrieNode {\r\n   TrieNode* parent;\r\n   std::unordered_map&lt;char, TrieNode*&gt; children;\r\n\r\n   bool sequenceEnd;\r\n   char currentChar;\r\n\r\n   void Print();\r\n   void PrintAllWordsWithSubstring();\r\n\r\n   ~TrieNode();\r\n};\r\n\r\nvoid TrieNode::Print() {\r\n  if (parent) {\r\n     parent-&gt;Print();\r\n     std::cout &lt;&lt; currentChar;\r\n  }\r\n}\r\n\r\nint totalWords;\r\n\r\nvoid TrieNode::PrintAllWordsWithSubstring() {\r\n   #pragma omp task\r\n   {\r\n      for (auto child : this-&gt;children) {\r\n         child.second-&gt;PrintAllWordsWithSubstring();\r\n      }\r\n\r\n      if (this-&gt;sequenceEnd) {\r\n          #pragma omp critical\r\n          {\r\n             this-&gt;Print();\r\n             std::cout &lt;&lt; std::endl;\r\n          }\r\n\r\n          #pragma omp atomic\r\n             ++totalWords;\r\n      }\r\n   }\r\n}\r\n\r\nvoid PrintAllWords(TrieNode* root) {\r\n   totalWords = 0;\r\n\r\n   #pragma omp parallel\r\n   {\r\n      #pragma omp single\r\n      root-&gt;PrintAllWordsWithSubstring();\r\n   }\r\n}<\/code><\/pre>\n<p>In this example, the <code>PrintAllWordsWithSubstring<\/code> member function recursively creates a new task for each node. The <code>this<\/code> pointer is implicitly <code>firstprivate<\/code> inside the <code>task<\/code> region. Choosing the correct data-sharing attributes for variables is especially important for <code>task<\/code> because a task&#8217;s execution is deferred instead of executed immediately, as a <code>parallel<\/code> region is. As a result, the function that creates the task may return before the <code>task<\/code> region is executed and any <code>shared<\/code> variables on the stack may fall out of scope. If the <code>this<\/code> pointer was shared, a task in a member function couldn&#8217;t reliably access member variables. To see the difference more clearly, consider the following broken alternative implementation of <code>TrieNode::PrintAllWordsWithSubstring<\/code>:<\/p>\n<pre><code class=\"language-cpp\">void TrieNode::PrintAllWordsWithSubstring() {\r\n   for (auto child : this-&gt;children) {\r\n      #pragma omp task shared(child)\r\n      {\r\n         \/\/ This line will cause an Access Violation.\r\n         child.second-&gt;PrintAllWordsWithSubstring();\r\n      }\r\n   }\r\n\r\n   if (this-&gt;sequenceEnd) {\r\n      this-&gt;Print();\r\n      std::cout &lt;&lt; std::endl;\r\n\r\n      #pragma omp atomic\r\n         ++totalWords;\r\n   }\r\n}<\/code><\/pre>\n<p>If a variable is only read and never written to inside a <code>parallel<\/code> region, marking it as <code>shared<\/code> doesn&#8217;t change the end result. No writes will occur during the execution of the <code>parallel<\/code> region, so all threads will see the same value. However, code outside of a <code>task<\/code> region may execute concurrently with the execution of a <code>task<\/code>. In the flawed implementation above, by the time the recursive call to <code>PrintAllWordsWithSubstring(child.second)<\/code> is made, the iterator will likely already have reached the end of <code>children<\/code> and <code>child.second<\/code> will no longer have a valid value.<\/p>\n<h2>Our OpenMP Plans<\/h2>\n<p>As of 17.2, all of the OpenMP 2.5 standard is supported, as well as tasks and parallel for loops with unsigned indices from the OpenMP 3.1 standard. We have started the long process to support newer versions of the OpenMP standard. Our ultimate goal is to support the most recent OpenMP standard by leveraging LLVM&#8217;s OpenMP runtime, but this will take time. Our next step for OpenMP will be to support the remaining features added in the OpenMP 3.1 standard. Which features are added first will depend on your feedback. We would love to hear from you which specific OpenMP 3.1 features you would like to see first.<\/p>\n<h2>Feedback<\/h2>\n<p>We encourage you to try out this update in the latest <a href=\"https:\/\/visualstudio.microsoft.com\/vs\/preview\/\">Visual Studio 2022 version 17.2 Preview<\/a>. If you encounter a correctness issue in code generated with the <strong>-openmp:llvm<\/strong> switch or bugs in the libomp140 DLLs shipped with Visual Studio, please let us know. We can be reached via the comments below, via twitter (<a href=\"https:\/\/twitter.com\/visualc\">@visualc<\/a>) or via <a href=\"https:\/\/developercommunity.visualstudio.com\/\">Developer Community<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The MSVC compiler adds support for OpenMP tasks with the \/openmp:llvm flag.<\/p>\n","protected":false},"author":18811,"featured_media":35994,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1,3921],"tags":[],"class_list":["post-30401","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cplusplus","category-openmp"],"acf":[],"blog_post_summary":"<p>The MSVC compiler adds support for OpenMP tasks with the \/openmp:llvm flag.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/30401","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/users\/18811"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/comments?post=30401"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/30401\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media\/35994"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media?parent=30401"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/categories?post=30401"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/tags?post=30401"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}