{"id":20956,"date":"2018-11-06T05:31:55","date_gmt":"2018-11-06T05:31:55","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/vcblog\/?p=20956"},"modified":"2019-02-18T17:47:35","modified_gmt":"2019-02-18T17:47:35","slug":"exploring-clang-tooling-part-3-rewriting-code-with-clang-tidy","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cppblog\/exploring-clang-tooling-part-3-rewriting-code-with-clang-tidy\/","title":{"rendered":"Exploring Clang Tooling Part 3: Rewriting Code with clang-tidy"},"content":{"rendered":"<p>In the <a href=\"https:\/\/blogs.msdn.microsoft.com\/vcblog\/2018\/10\/23\/exploring-clang-tooling-part-2-examining-the-clang-ast-with-clang-query\">previous post<\/a> in this series, we used <tt>clang-query<\/tt> to examine the Abstract Syntax Tree of a simple source code file. Using <tt>clang-query<\/tt>, we can prototype an AST Matcher which we can use in a <tt>clang-tidy<\/tt> check to refactor code in bulk.<\/p>\n<p>This time, we will complete the rewriting of the source code.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/9\/2019\/02\/implement-FIXIT.png\"><img decoding=\"async\" class=\"alignnone wp-image-21235\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/9\/2019\/02\/implement-FIXIT.png\" alt=\"\" width=\"700\" \/><\/a><\/p>\n<p>Let&#8217;s return to MyFirstCheck.cpp we generated <a href=\"https:\/\/blogs.msdn.microsoft.com\/vcblog\/2018\/10\/19\/exploring-clang-tooling-part-1-extending-clang-tidy\/\">earlier<\/a> and update the <tt>registerMatchers<\/tt> method. 
First we can refactor it to port both function declarations and function calls, using the <tt>callExpr()<\/tt> and <tt>callee()<\/tt> matchers we used in the previous post:<\/p>\n<pre class=\"\">void MyFirstCheckCheck::registerMatchers(MatchFinder *Finder) {\r\n    \r\n  auto nonAwesomeFunction = functionDecl(\r\n    unless(matchesName(\"^::awesome_\"))\r\n    );\r\n\r\n  Finder-&gt;addMatcher(\r\n    nonAwesomeFunction.bind(\"addAwesomePrefix\")\r\n    , this);\r\n\r\n  Finder-&gt;addMatcher(\r\n    callExpr(callee(nonAwesomeFunction)).bind(\"addAwesomePrefix\")\r\n    , this);\r\n}<\/pre>\n<p>Because Matchers are really C++ code, we can extract them into variables and compose them into multiple other Matchers, as done here with <tt>nonAwesomeFunction<\/tt>.<\/p>\n<p>In this case, I have narrowed the declaration matcher to match only on function declarations which do not start with <tt>awesome_<\/tt>. That matcher is then used once with a binder <tt>addAwesomePrefix<\/tt>, then again to specify the <tt>callee()<\/tt> of a <tt>callExpr()<\/tt>, again binding the relevant expression to the name <tt>addAwesomePrefix<\/tt>.<\/p>\n<p>Because large scale refactoring often involves primarily changing particular expressions, it generally makes sense to separately define the matchers for the declaration to match and the expressions referencing those declarations. In my experience, the matchers for declarations can get complicated for example with exclusions due to limitations of a reflection system, or with more specifics about functions with particular return types or argument types. Centralizing those cases helps keep your refactoring code maintainable.<\/p>\n<p>Another change I have made is that I renamed the binding from <tt>x<\/tt> to <tt>addAwesomePrefix<\/tt>. This is notable because it uses verbs to describe what should be done with the matches. It should be clear from reading matcher bindings what the result of invoking the fix is to be. 
Binding names can then be seen as a weakly-typed string-based language interface between the matcher and the replacement code.\nWe can then implement <tt>MyFirstCheckCheck::check<\/tt> to consume the bindings. A first approximation might look like:<\/p>\n<pre class=\"\">void MyFirstCheckCheck::check(const MatchFinder::MatchResult &amp;Result) {\r\n  if (const auto MatchedDecl = Result.Nodes.getNodeAs&lt;FunctionDecl&gt;(\"addAwesomePrefix\"))\r\n  {\r\n    diag(MatchedDecl-&gt;getLocation(), \"function is insufficiently awesome\")\r\n      &lt;&lt; FixItHint::CreateInsertion(MatchedDecl-&gt;getLocation(), \"awesome_\");\r\n  }\r\n\r\n  if (const auto MatchedExpr = Result.Nodes.getNodeAs&lt;CallExpr&gt;(\"addAwesomePrefix\"))\r\n  {\r\n    diag(MatchedExpr-&gt;getExprLoc(), \"code is insufficiently awesome\")\r\n      &lt;&lt; FixItHint::CreateInsertion(MatchedExpr-&gt;getExprLoc(), \"awesome_\");\r\n  }\r\n}<\/pre>\n<p>Perhaps a better implementation would reduce the duplication of the diagnostic code:<\/p>\n<pre class=\"\">void MyFirstCheckCheck::check(const MatchFinder::MatchResult &amp;Result) {\r\n  SourceLocation insertionLocation;\r\n  if (const auto MatchedDecl = Result.Nodes.getNodeAs&lt;FunctionDecl&gt;(\"addAwesomePrefix\"))\r\n  {\r\n    insertionLocation = MatchedDecl-&gt;getLocation();\r\n  } else if (const auto MatchedExpr = Result.Nodes.getNodeAs&lt;CallExpr&gt;(\"addAwesomePrefix\"))\r\n  {\r\n    insertionLocation = MatchedExpr-&gt;getExprLoc();\r\n  }\r\n  diag(insertionLocation, \"code is insufficiently awesome\")\r\n      &lt;&lt; FixItHint::CreateInsertion(insertionLocation, \"awesome_\");\r\n}<\/pre>\n<p>Because the <tt>FunctionDecl<\/tt> and the <tt>CallExpr<\/tt> do not share an inheritance hierarchy, we need separate casting conditions for each. Even if they did share an inheritance hierarchy, we need to call <tt>getLocation<\/tt> in one case, and <tt>getExprLoc<\/tt> in another. 
The reason is that Clang records many relevant locations for each AST node. The developer of the clang-tidy check needs to know which location accessor method is appropriate or required for each situation.\nA further improvement is to change the casts to accept the relevant base types of <tt>FunctionDecl<\/tt> and <tt>CallExpr<\/tt> &#8211; <tt>NamedDecl<\/tt> and <tt>Expr<\/tt> respectively.<\/p>\n<pre class=\"\">if (const auto MatchedDecl = Result.Nodes.getNodeAs&lt;NamedDecl&gt;(\"addAwesomePrefix\"))\r\n{\r\n  insertionLocation = MatchedDecl-&gt;getLocation();\r\n} else if (const auto MatchedExpr = Result.Nodes.getNodeAs&lt;Expr&gt;(\"addAwesomePrefix\"))\r\n{\r\n  insertionLocation = MatchedExpr-&gt;getExprLoc();\r\n}<\/pre>\n<p>This change reinforces the idea that the names of bound nodes form a weakly-typed interface between the Matcher code and the Rewriter code. Because the Rewriter code now expects the <tt>addAwesomePrefix<\/tt> binding to be used with the base types <tt>NamedDecl<\/tt> and <tt>Expr<\/tt>, other Matcher code can take advantage of that. We can now re-use the <tt>addAwesomePrefix<\/tt> binding name to add a prefix to field declarations or member expressions, for example, because their corresponding Clang AST classes also inherit from <tt>NamedDecl<\/tt> and <tt>Expr<\/tt> respectively:<\/p>\n<pre class=\"\">auto nonAwesomeField = fieldDecl(unless(matchesName(\"::awesome_.*\")));\r\nFinder-&gt;addMatcher(\r\n  nonAwesomeField.bind(\"addAwesomePrefix\")\r\n  , this);\r\n\r\nFinder-&gt;addMatcher(\r\n  memberExpr(member(nonAwesomeField)).bind(\"addAwesomePrefix\")\r\n  , this);<\/pre>\n<p>Notice that this code is comparable to the matchers we wrote for the <tt>functionDecl<\/tt>\/<tt>callExpr<\/tt> pairing. 
Taking advantage of the binding name interface, we can continue extending our matcher code to port variable declarations without changing the rewriter side of that interface:<\/p>\n<pre class=\"\">void MyFirstCheckCheck::registerMatchers(MatchFinder *Finder) {\r\n  \r\n  auto nonAwesome = namedDecl(\r\n    unless(matchesName(\"::awesome_.*\"))\r\n    );\r\n\r\n  auto nonAwesomeFunction = functionDecl(nonAwesome);\r\n  \/\/ void foo(); \r\n  Finder-&gt;addMatcher(\r\n    nonAwesomeFunction.bind(\"addAwesomePrefix\")\r\n    , this);\r\n\r\n  \/\/ foo();\r\n  Finder-&gt;addMatcher(\r\n    callExpr(callee(nonAwesomeFunction)).bind(\"addAwesomePrefix\")\r\n    , this);\r\n\r\n  auto nonAwesomeVar = varDecl(nonAwesome);\r\n  \/\/ int foo;\r\n  Finder-&gt;addMatcher(\r\n    nonAwesomeVar.bind(\"addAwesomePrefix\")\r\n    , this);\r\n\r\n  \/\/ foo = 7;\r\n  Finder-&gt;addMatcher(\r\n    declRefExpr(to(nonAwesomeVar)).bind(\"addAwesomePrefix\")\r\n    , this);\r\n\r\n  auto nonAwesomeField = fieldDecl(nonAwesome);\r\n  \/\/ int m_foo;\r\n  Finder-&gt;addMatcher(\r\n    nonAwesomeField.bind(\"addAwesomePrefix\")\r\n    , this);\r\n\r\n  \/\/ m_foo = 42;\r\n  Finder-&gt;addMatcher(\r\n    memberExpr(member(nonAwesomeField)).bind(\"addAwesomePrefix\")\r\n    , this);\r\n}<\/pre>\n<h2>Location Location Location<\/h2>\n<p>Let&#8217;s return to the <tt>check<\/tt> implementation and examine it. This method is responsible for implementing the rewriting of the source code as described by the matchers and their bound nodes.\nIn this case, we have inserted code at the <tt>SourceLocation<\/tt> returned by either <tt>getLocation()<\/tt> or <tt>getExprLoc()<\/tt> of <tt>NamedDecl<\/tt> or <tt>Expr<\/tt> respectively. 
Clang AST classes have many methods returning <tt>SourceLocation<\/tt> which refer to various places in the source code related to particular AST nodes.\nFor example, the <a href=\"https:\/\/clang.llvm.org\/doxygen\/classclang_1_1CallExpr.html\"><tt>CallExpr<\/tt><\/a> has <tt>SourceLocation<\/tt> accessors <tt>getBeginLoc<\/tt>, <tt>getEndLoc<\/tt> and <tt>getExprLoc<\/tt>. It is currently difficult to discover how a particular position in the source code relates to a particular <tt>SourceLocation<\/tt> accessor.<\/p>\n<p><tt>clang::VarDecl<\/tt> represents variable declarations in the Clang AST. <tt>clang::ParmVarDecl<\/tt> inherits <tt>clang::VarDecl<\/tt> and represents parameter declarations. Notice that in all cases, <tt>end<\/tt> locations indicate the beginning of the last token, not the end of it. Note also that in the second example below, the source locations of the call used to initialize the variable are not part of the variable. It is necessary to traverse to the initialization expression to access those.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/9\/2019\/02\/varDecl_loc.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-21305\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/9\/2019\/02\/varDecl_loc.png\" alt=\"\" width=\"500\" \/><\/a><\/p>\n<p><tt>clang::FunctionDecl<\/tt> represents function declarations in the Clang AST. <tt>clang::CXXMethodDecl<\/tt> inherits <tt>clang::FunctionDecl<\/tt> and represents method declarations. 
Note that the location of the return type is not always given by <tt>getBeginLoc<\/tt> in C++.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/9\/2019\/02\/functionDecl_loc.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-21295\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/9\/2019\/02\/functionDecl_loc.png\" alt=\"\" width=\"500\" \/><\/a><\/p>\n<p><tt>clang::CallExpr<\/tt> represents function calls in the Clang AST. <tt>clang::CXXMemberCallExpr<\/tt> inherits <tt>clang::CallExpr<\/tt> and represents method calls. Note that when calling free functions (represented by a <tt>clang::CallExpr<\/tt>), the <tt>getExprLoc<\/tt> and the <tt>getBeginLoc<\/tt> will be the same. Always choose the semantically correct location accessor, rather than a location which appears to indicate the correct position.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/9\/2019\/02\/cxxMemberCallExpr_loc.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-21285\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/9\/2019\/02\/cxxMemberCallExpr_loc.png\" alt=\"\" width=\"500\" \/><\/a><\/p>\n<p>It is important to know that locations on AST classes point to the start of tokens in all cases. This can be initially confusing when examining end locations. Sometimes to get to a desired location, it is necessary to use <tt>getLocWithOffset()<\/tt> to advance or retreat a <tt>SourceLocation<\/tt>. 
Advancing to the end of a token can be achieved with <tt>Lexer::getLocForEndOfToken<\/tt>.<\/p>\n<p>The source code locations of arguments to the function call are not accessible from the <tt>CallExpr<\/tt>, but must be accessed via AST nodes for the arguments themselves.<\/p>\n<pre class=\"\">\/\/ Get the zeroth argument:\r\nExpr* arg0 = someCallExpr-&gt;getArg(0);\r\nSourceLocation arg0Loc = arg0-&gt;getExprLoc();<\/pre>\n<p>Every AST node has accessors <tt>getBeginLoc<\/tt> and <tt>getEndLoc<\/tt>. Expression nodes additionally have a <tt>getExprLoc<\/tt>, and declaration nodes have an additional <tt>getLocation<\/tt> accessor. More-specific subclasses have more-specific accessors for locations relevant to the C++ construct they represent. Source code locations in Clang are comprehensive, but accessing them can get complex as requirements become more advanced. A future blog post may explore this topic in more detail if there is interest among the readership.<\/p>\n<p>Once we have acquired the locations we are interested in, we need to insert, remove or replace source code fragments at those locations.<\/p>\n<p>Let&#8217;s return to MyFirstCheck.cpp:<\/p>\n<pre class=\"\">diag(insertionLocation, \"code is insufficiently awesome\")\r\n    &lt;&lt; FixItHint::CreateInsertion(insertionLocation, \"awesome_\");<\/pre>\n<p><tt>diag<\/tt> is a method on the <a href=\"https:\/\/code.woboq.org\/llvm\/clang-tools-extra\/clang-tidy\/ClangTidy.cpp.html#_ZN5clang4tidy14ClangTidyCheck4diagENS_14SourceLocationEN4llvm9StringRefENS_13DiagnosticIDs5LevelE\"><tt>ClangTidyCheck<\/tt> base class<\/a>. The purpose of it is to issue diagnostics and messages to the user. 
It can be called with just a source location and a message, causing a diagnostic to be emitted at the specified location:<\/p>\n<pre class=\"\">diag(insertionLocation, \"code is insufficiently awesome\");<\/pre>\n<p>Resulting in:<\/p>\n<pre>    testfile.cpp:19:5: warning: code is insufficiently awesome [misc-my-first-check]\r\n    int addTwo(int num)\r\n        ^\r\n<\/pre>\n<p>The <tt>diag<\/tt> method returns a <tt>DiagnosticBuilder<\/tt> to which we can stream fix suggestions using <a href=\"https:\/\/clang.llvm.org\/doxygen\/classclang_1_1FixItHint.html\"><tt>FixItHint<\/tt><\/a>.<\/p>\n<p>The <tt>CreateRemoval<\/tt> method creates a <tt>FixIt<\/tt> for removal of a range of source code. At its heart, a <tt>SourceRange<\/tt> is just a pair of <tt>SourceLocation<\/tt>s. If we wanted to remove the <tt>awesome_<\/tt> prefix from functions which have it, we might expect to write something like this:<\/p>\n<pre class=\"\">void MyFirstCheckCheck::registerMatchers(MatchFinder *Finder) {\r\n  \r\n  Finder-&gt;addMatcher(\r\n    functionDecl(\r\n      matchesName(\"::awesome_.*\")\r\n      ).bind(\"removeAwesomePrefix\")\r\n    , this);\r\n}\r\n\r\nvoid MyFirstCheckCheck::check(const MatchFinder::MatchResult &amp;Result) {\r\n\r\n  if (const auto MatchedDecl = Result.Nodes.getNodeAs&lt;NamedDecl&gt;(\"removeAwesomePrefix\"))\r\n  {\r\n      auto removalStartLocation = MatchedDecl-&gt;getLocation();\r\n      auto removalEndLocation = removalStartLocation.getLocWithOffset(sizeof(\"awesome_\") - 1);\r\n      auto removalRange = SourceRange(removalStartLocation, removalEndLocation);\r\n\r\n      diag(removalStartLocation, \"code is too awesome\")\r\n          &lt;&lt; FixItHint::CreateRemoval(removalRange);\r\n  }\r\n}<\/pre>\n<p>The matcher part of this code is fine, but when we run clang-tidy, we find that the removal is applied to the entire function name, not only the <tt>awesome_<\/tt> prefix. 
The problem is that Clang extends the end of the removal range to the end of the token pointed to by the end location. This is symmetric with the fact that AST nodes have <tt>getEndLoc()<\/tt> methods which point to the start of the last token. Usually, the intent is to remove or replace entire tokens.<\/p>\n<p>To make a replacement or removal in source code which extends into the middle of a token, we need to indicate that we are replacing a range of characters instead of a range of tokens, using <tt>CharSourceRange::getCharRange<\/tt>:<\/p>\n<pre class=\"\">auto removalRange = CharSourceRange::getCharRange(removalStartLocation, removalEndLocation);\r\n<\/pre>\n<h2>Conclusion<\/h2>\n<p>This concludes the mini-series about writing <tt>clang-tidy<\/tt> checks. This series has been an experiment to gauge interest, and there is a lot more content to cover in further posts if there is interest among the readership.<\/p>\n<p>Further posts could cover topics that occur in the real world, such as:<\/p>\n<ul>\n<li>Creation of compile databases<\/li>\n<li>Creating a stand-alone buildsystem for clang-tidy checks<\/li>\n<li>Understanding and exploring source locations<\/li>\n<li>Completing more-complex tasks<\/li>\n<li>Extending the matcher system with custom matchers<\/li>\n<li>Testing refactorings<\/li>\n<li>More tips and tricks from the trenches.<\/li>\n<\/ul>\n<p>This would cover everything you need to know in order to quickly and effectively create and use custom refactoring tools on your codebase.<\/p>\n<p>Do you want to see more? 
Let us know in the comments below or contact the author directly via e-mail at <a href=\"mailto:stkelly@microsoft.com\">stkelly@microsoft.com<\/a>, or on Twitter <a href=\"https:\/\/twitter.com\/steveire\">@steveire<\/a>.<\/p>\n<p>I will be showing even more new and future developments in <tt>clang-query<\/tt> and <tt>clang-tidy<\/tt> at <a href=\"http:\/\/codedive.pl\/index\/speaker\/name\/stephen-kelly\">code::dive<\/a> tomorrow, including many of the items listed as future topics above. Make sure to schedule it in your calendar if you are attending code::dive!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the previous post in this series, we used clang-query to examine the Abstract Syntax Tree of a simple source code file. Using clang-query, we can prototype an AST Matcher which we can use in a clang-tidy check to refactor code in bulk. This time, we will complete the rewriting of the source code. Let&#8217;s [&hellip;]<\/p>\n","protected":false},"author":890,"featured_media":35994,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[563,512],"tags":[],"class_list":["post-20956","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-clang","category-general-cpp-series"],"acf":[],"blog_post_summary":"<p>In the previous post in this series, we used clang-query to examine the Abstract Syntax Tree of a simple source code file. Using clang-query, we can prototype an AST Matcher which we can use in a clang-tidy check to refactor code in bulk. This time, we will complete the rewriting of the source code. 
Let&#8217;s [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/20956","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/users\/890"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/comments?post=20956"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/20956\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media\/35994"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media?parent=20956"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/categories?post=20956"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/tags?post=20956"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}