{"id":20953,"date":"2018-10-23T03:56:07","date_gmt":"2018-10-23T03:56:07","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/vcblog\/?p=20953"},"modified":"2019-02-18T17:47:36","modified_gmt":"2019-02-18T17:47:36","slug":"exploring-clang-tooling-part-2-examining-the-clang-ast-with-clang-query","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cppblog\/exploring-clang-tooling-part-2-examining-the-clang-ast-with-clang-query\/","title":{"rendered":"Exploring Clang Tooling Part 2: Examining the Clang AST with clang-query"},"content":{"rendered":"<p><em>This post is part of a regular series of posts where the C++ product team and other guests answer questions we have received from customers. The questions can be about anything C++ related: MSVC toolset, the standard language and library, the C++ standards committee, isocpp.org, CppCon, etc.<\/em><\/p>\n<p><em>Today\u2019s post is by guest author Stephen Kelly, who is a developer at Havok, a contributor to Qt and CMake and <a href=\"https:\/\/steveire.wordpress.com\/\">a blogger<\/a>. This post is part of a series where he is sharing his experience using Clang tooling in his current team.<\/em><\/p>\n<p>In the <a href=\"https:\/\/blogs.msdn.microsoft.com\/vcblog\/2018\/10\/19\/exploring-clang-tooling-part-1-extending-clang-tidy\/\">last post<\/a>, we created a new <tt>clang-tidy<\/tt> check following documented steps and encountered the first limitation in our own knowledge &#8211; how can we change both declarations and expressions such as function calls?<\/p>\n<p>In order to create an effective refactoring tool, we need to understand the code generated by the <tt>create_new_check.py<\/tt> script and learn how to extend it.<\/p>\n<h2>Exploring C++ Code as C++ Code<\/h2>\n<p>When Clang processes C++, it creates an <a href=\"http:\/\/clang.llvm.org\/docs\/IntroductionToTheClangAST.html\">Abstract Syntax Tree<\/a> representing the code. 
The AST needs to be able to represent all of the possible complexity that can appear in C++ code &#8211; variadic templates, lambdas, operator overloading, declarations of various kinds etc. If we can use the AST representation of the code in our tooling, we won&#8217;t be discarding any of the meaning of the code in the process, as we would if we limit ourselves to processing only text.<\/p>\n<p>Our goal is to harness the complexity of the AST so that we can describe patterns in it, and then replace those patterns with new text. The Clang <a href=\"https:\/\/clang.llvm.org\/docs\/LibASTMatchers.html\">AST Matcher API<\/a> and <a href=\"https:\/\/clang.llvm.org\/doxygen\/classclang_1_1FixItHint.html\">FixIt API<\/a> satisfy those requirements respectively.<\/p>\n<p>The level of complexity in the AST means that detailed knowledge is required in order to comprehend it. Even for an experienced C++ developer, the number of classes and how they relate to each other can be daunting. Luckily, there is a rhythm to it all. We can identify patterns, use tools to discover what makes up the Clang model of the C++ code, and get to the point of having an instinct about how to create a <tt>clang-tidy<\/tt> check quickly.<\/p>\n<h2>Exploring a Clang AST<\/h2>\n<p>Let&#8217;s dive in and create a simple piece of test code so we can examine the Clang AST for it:<\/p>\n<pre class=\"\u201clang:cpp\" decode=\"true\u201d\"> \r\nint addTwo(int num) \r\n{ \r\n    return num + 2; \r\n} \r\n\r\nint main(int, char**) \r\n{ \r\n    return addTwo(3); \r\n} \r\n<\/pre>\n<p>There are multiple ways to examine the Clang AST, but the most useful when creating AST Matcher based refactoring tools is <tt>clang-query<\/tt>. 
We need to build up our knowledge of AST matchers and the AST itself at the same time via <tt>clang-query<\/tt>.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/9\/2019\/02\/examine-and-prototype.png\"><img decoding=\"async\" class=\"alignnone wp-image-21235\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/9\/2019\/02\/examine-and-prototype.png\" alt=\"\" width=\"700\" \/><\/a><\/p>\n<p>So, let&#8217;s return to <tt>MyFirstCheck.cpp<\/tt>, which we created in the last post. The <tt>MyFirstCheckCheck::registerMatchers<\/tt> method contains the following line:<\/p>\n<pre class=\"\u201clang:cpp\" decode=\"true\u201d\"> \r\nFinder-&gt;addMatcher(functionDecl().bind(\"x\"), this); \r\n<\/pre>\n<p>The first argument to <tt>addMatcher<\/tt> is an AST matcher, an Embedded Domain Specific Language of sorts. This is a predicate language which <tt>clang-tidy<\/tt> uses to traverse the AST and create a set of resulting &#8216;bound nodes&#8217;. In the above case, a bound node with the name <tt>x<\/tt> is created for each function declaration in the AST. <tt>clang-tidy<\/tt> later calls <tt>MyFirstCheckCheck::check<\/tt> for each set of bound nodes in the result.<\/p>\n<p>Let&#8217;s start <tt>clang-query<\/tt>, passing our test file as a parameter and following it with two dashes. Similar to the use of <tt>clang-tidy<\/tt> in <a href=\"https:\/\/blogs.msdn.microsoft.com\/vcblog\/2018\/10\/19\/exploring-clang-tooling-part-1-extending-clang-tidy\/\">Part 1<\/a>, this allows us to specify compile options and avoid warnings about a missing compilation database.<\/p>\n<p>This command drops us into an interactive interpreter which we can use to query the AST:<\/p>\n<pre>$ clang-query.exe testfile.cpp -- \r\n\r\nclang-query&gt;\r\n<\/pre>\n<p>Type <tt>help<\/tt> for the full set of commands available in the interpreter. The first command we can examine is <tt>match<\/tt>, which we can abbreviate to <tt>m<\/tt>. 
Let&#8217;s paste in the matcher from <tt>MyFirstCheck.cpp<\/tt>:<\/p>\n<pre>clang-query&gt; match functionDecl().bind(\"x\") \r\n\r\nMatch #1: \r\n \r\ntestfile.cpp:1:1: note: \"root\" binds here \r\nint addTwo(int num) \r\n^~~~~~~~~~~~~~~~~~~ \r\ntestfile.cpp:1:1: note: \"x\" binds here \r\nint addTwo(int num) \r\n^~~~~~~~~~~~~~~~~~~ \r\n \r\nMatch #2: \r\n \r\ntestfile.cpp:6:1: note: \"root\" binds here \r\nint main(int, char**) \r\n^~~~~~~~~~~~~~~~~~~~~ \r\ntestfile.cpp:6:1: note: \"x\" binds here \r\nint main(int, char**) \r\n^~~~~~~~~~~~~~~~~~~~~ \r\n2 matches. \r\n<\/pre>\n<p><tt>clang-query<\/tt> automatically creates a binding for the root element in a matcher. This gets noisy when trying to match something specific, so it makes sense to turn that off when defining custom binding names:<\/p>\n<pre>clang-query&gt; set bind-root false \r\nclang-query&gt; m functionDecl().bind(\"x\") \r\n\r\nMatch #1: \r\n\r\ntestfile.cpp:1:1: note: \"x\" binds here \r\nint addTwo(int num) \r\n^~~~~~~~~~~~~~~~~~~ \r\n\r\nMatch #2: \r\n\r\ntestfile.cpp:6:1: note: \"x\" binds here \r\nint main(int, char**) \r\n^~~~~~~~~~~~~~~~~~~~~ \r\n2 matches. \r\n<\/pre>\n<p>So, we can see that for each function declaration that appeared in the translation unit, we get a resulting match. <tt>clang-tidy<\/tt> will later use these matches one at a time in the <tt>check<\/tt> method in <tt>MyFirstCheck.cpp<\/tt> to complete the refactoring.<\/p>\n<p>Use <tt>quit<\/tt> to exit the <tt>clang-query<\/tt> interpreter. The interpreter must be restarted each time the C++ code is changed in order for the new content to be matched.<\/p>\n<h2>Nesting Matchers<\/h2>\n<p>The AST Matchers form a &#8216;predicate language&#8217; where each matcher in the vocabulary is itself a predicate, and those predicates can be nested. 
The matchers fit into three broad categories (node matchers, narrowing matchers and traversal matchers) as documented in the <a href=\"http:\/\/clang.llvm.org\/docs\/LibASTMatchersReference.html\">AST Matchers Reference<\/a>.<\/p>\n<p><tt>functionDecl()<\/tt> is an AST Matcher which is invoked for each function declaration in the source code. In normal source code, there will be hundreds or thousands of results coming from external headers for such a simple matcher.<\/p>\n<p>Let&#8217;s match only functions with a particular name:<\/p>\n<pre>clang-query&gt; m functionDecl(hasName(\"addTwo\")) \r\n\r\nMatch #1: \r\n\r\ntestfile.cpp:1:1: note: \"root\" binds here \r\nint addTwo(int num) \r\n^~~~~~~~~~~~~~~~~~~ \r\n1 match. \r\n<\/pre>\n<p>This matcher will only trigger on function declarations which have the name &#8220;<tt>addTwo<\/tt>&#8221;. The middle column of the documentation indicates the name of each matcher, and the first column indicates the kind of matcher that it can be nested inside. The <tt>hasName<\/tt> matcher is not documented as being usable with <tt>Matcher&lt;FunctionDecl&gt;<\/tt>, but instead with <tt>Matcher&lt;NamedDecl&gt;<\/tt>.<\/p>\n<p>Here, a developer without prior experience with the Clang AST needs to learn that the <a href=\"https:\/\/clang.llvm.org\/doxygen\/classclang_1_1FunctionDecl.html\"><tt>FunctionDecl<\/tt> AST class<\/a> inherits from the <tt>NamedDecl<\/tt> AST class (as well as <tt>DeclaratorDecl<\/tt>, <tt>ValueDecl<\/tt> and <tt>Decl<\/tt>). Matchers documented as usable with each of those classes can also work with a <tt>functionDecl()<\/tt> matcher. Familiarity with the inheritance structure of the Clang AST classes is essential to proficiency with AST Matchers. The name of a class in the Clang AST is converted to the corresponding &#8220;node matcher&#8221; name by making the first letter lower-case. 
In the case of class names with an abbreviation prefix <tt>CXX<\/tt> such as <tt>CXXMemberCallExpr<\/tt>, the entire prefix is lowercased to produce the matcher name <tt>cxxMemberCallExpr<\/tt>.<\/p>\n<p>So, instead of matching function declarations, we can match on all named declarations in our source code. Ignoring some noise in the output, we get results for each function declaration and each parameter variable declaration:<\/p>\n<pre>clang-query&gt; m namedDecl() \r\n... \r\nMatch #8: \r\n\r\ntestfile.cpp:1:1: note: \"root\" binds here \r\nint addTwo(int num) \r\n^~~~~~~~~~~~~~~~~~~ \r\n\r\nMatch #9: \r\n\r\ntestfile.cpp:1:12: note: \"root\" binds here \r\nint addTwo(int num) \r\n           ^~~~~~~ \r\n\r\nMatch #10: \r\n\r\ntestfile.cpp:6:1: note: \"root\" binds here \r\nint main(int, char**) \r\n^~~~~~~~~~~~~~~~~~~~~ \r\n\r\nMatch #11: \r\n\r\ntestfile.cpp:6:10: note: \"root\" binds here \r\nint main(int, char**) \r\n         ^~~ \r\n\r\nMatch #12: \r\n\r\ntestfile.cpp:6:15: note: \"root\" binds here \r\nint main(int, char**) \r\n              ^~~~~~\r\n<\/pre>\n<p>Parameter declarations are in the match results because they are represented by the <a href=\"https:\/\/clang.llvm.org\/doxygen\/classclang_1_1ParmVarDecl.html\"><tt>ParmVarDecl<\/tt> class<\/a>, which also inherits from <tt>NamedDecl<\/tt>. We can match only parameter variable declarations by using the corresponding AST node matcher:<\/p>\n<pre>clang-query&gt; m parmVarDecl() \r\n\r\nMatch #1: \r\n\r\ntestfile.cpp:1:12: note: \"root\" binds here \r\nint addTwo(int num) \r\n           ^~~~~~~ \r\n\r\nMatch #2: \r\n\r\ntestfile.cpp:6:10: note: \"root\" binds here \r\nint main(int, char**) \r\n         ^~~ \r\n\r\nMatch #3: \r\n\r\ntestfile.cpp:6:15: note: \"root\" binds here \r\nint main(int, char**) \r\n              ^~~~~~\r\n<\/pre>\n<p><tt>clang-query<\/tt> has a code-completion feature, triggered by pressing TAB, which shows the matchers that can be used in any particular context. 
This feature is not enabled on Windows, however.<\/p>\n<h2>Discovery Through Clang AST Dumps<\/h2>\n<p><tt>clang-query<\/tt> is most useful as a discovery tool when exploring deeper into the AST and dumping intermediate nodes.<\/p>\n<p>Let&#8217;s query our <tt>testfile.cpp<\/tt> again, this time with the <tt>output<\/tt> set to <tt>dump<\/tt>:<\/p>\n<pre>clang-query&gt; set output dump \r\nclang-query&gt; m functionDecl(hasName(\"addTwo\")) \r\n\r\nMatch #1: \r\n\r\nBinding for \"root\": \r\nFunctionDecl 0x17a193726b8 &lt;testfile.cpp:1:1, line:4:1&gt; line:1:5 used addTwo 'int (int)' \r\n|-ParmVarDecl 0x17a193725f0 &lt;col:12, col:16&gt; col:16 used num 'int' \r\n`-CompoundStmt 0x17a19372840 &lt;line:2:1, line:4:1&gt; \r\n  `-ReturnStmt 0x17a19372828 &lt;line:3:5, col:18&gt;\r\n    `-BinaryOperator 0x17a19372800 &lt;col:12, col:18&gt; 'int' '+' \r\n      |-ImplicitCastExpr 0x17a193727e8 &lt;col:12&gt; 'int' &lt;LValueToRValue&gt;\r\n      | `-DeclRefExpr 0x17a19372798 &lt;col:12&gt; 'int' lvalue ParmVar 0x17a193725f0 'num' 'int' \r\n      `-IntegerLiteral 0x17a193727c0 &lt;col:18&gt; 'int' 2\r\n<\/pre>\n<p>There is a lot to take in here, including a lot of noise that is not relevant to writing a matcher, such as pointer addresses, the word <tt>used<\/tt> appearing inexplicably and other content whose structure is not obvious. For the sake of brevity in this blog post, I will elide such content in further listings of AST content.<\/p>\n<p>The reported match has a <tt>FunctionDecl<\/tt> at the top level of a tree. Below that, we can see the <tt>ParmVarDecl<\/tt> node which we matched previously, and other nodes such as <tt>ReturnStmt<\/tt>. 
Each of these corresponds to a class name in the Clang AST, so it is useful to look them up to see what they inherit from and to know which matchers are relevant to their use.<\/p>\n<p>The AST also contains source location and source range information, the latter denoted by angle brackets. While this detailed output is useful for exploring the AST, it is not as useful for exploring the source code. Diagnostic mode can be re-entered with <tt>set output diag<\/tt> for source code exploration. Unfortunately, both outputs (<tt>dump<\/tt> and <tt>diag<\/tt>) cannot currently be enabled at once, so it is necessary to switch between them.<\/p>\n<h2>Tree Traversal<\/h2>\n<p>We can traverse this tree using the <tt>has()<\/tt> matcher:<\/p>\n<pre>clang-query&gt; m functionDecl(has(compoundStmt(has(returnStmt(has(callExpr())))))) \r\n\r\nMatch #1: \r\n\r\nBinding for \"root\": \r\nFunctionDecl &lt;testfile.cpp:6:1, line:9:1&gt; line:6:5 main 'int (int, char **)' \r\n|-ParmVarDecl &lt;col:10&gt; col:13 'int' \r\n|-ParmVarDecl &lt;col:15, col:20&gt; col:21 'char **' \r\n`-CompoundStmt &lt;line:7:1, line:9:1&gt; \r\n  `-ReturnStmt &lt;line:8:5, col:20&gt; \r\n    `-CallExpr &lt;col:12, col:20&gt; 'int' \r\n      |-ImplicitCastExpr &lt;col:12&gt; 'int (*)(int)'\r\n      | `-DeclRefExpr &lt;col:12&gt; 'int (int)' 'addTwo'\r\n      `-IntegerLiteral &lt;col:19&gt; 'int' 3 \r\n<\/pre>\n<p>With some distracting content removed, we can see that the AST dump contains some source ranges and source locations. The ranges are denoted by angle brackets and have a beginning and possibly an end position. To avoid repeating the filename and the keywords <tt>line<\/tt> and <tt>col<\/tt>, only the differences from the previously printed source location are printed. For example, <tt>&lt;testfile.cpp:6:1, line:9:1&gt;<\/tt> describes a span from line 6 column 1 in <tt>testfile.cpp<\/tt> to line 9 column 1 also in <tt>testfile.cpp<\/tt>. 
The range <tt>&lt;col:15, col:20&gt;<\/tt> describes the span from column 15 to column 20 in line 6 (from a few lines above) in <tt>testfile.cpp<\/tt> as that is the last filename printed.<\/p>\n<p>Because each of the nested predicates matches, the top-level <tt>functionDecl()<\/tt> matches and we get a binding for the result. We can additionally use a nested <tt>bind()<\/tt> call to add nodes to the result set:<\/p>\n<pre>clang-query&gt; m functionDecl(has(compoundStmt(has(returnStmt(has(callExpr().bind(\"functionCall\"))))))) \r\n\r\nMatch #1: \r\n\r\nBinding for \"functionCall\": \r\nCallExpr &lt;testfile.cpp:8:12, col:20&gt; 'int' \r\n|-ImplicitCastExpr &lt;col:12&gt; 'int (*)(int)'\r\n| `-DeclRefExpr &lt;col:12&gt; 'int (int)' 'addTwo'\r\n`-IntegerLiteral &lt;col:19&gt; 'int' 3 \r\n\r\nBinding for \"root\": \r\nFunctionDecl &lt;testfile.cpp:6:1, line:9:1&gt; line:6:5 main 'int (int, char **)' \r\n|-ParmVarDecl &lt;col:10&gt; col:13 'int' \r\n|-ParmVarDecl &lt;col:15, col:20&gt; col:21 'char **' \r\n`-CompoundStmt &lt;line:7:1, line:9:1&gt; \r\n  `-ReturnStmt &lt;line:8:5, col:20&gt; \r\n    `-CallExpr &lt;col:12, col:20&gt; 'int' \r\n      |-ImplicitCastExpr &lt;col:12&gt; 'int (*)(int)'\r\n      | `-DeclRefExpr &lt;col:12&gt; 'int (int)' 'addTwo'\r\n      `-IntegerLiteral &lt;col:19&gt; 'int' 3 \r\n<\/pre>\n<p>The <tt>hasDescendant()<\/tt> matcher can be used to match the same node as above in this case:<\/p>\n<pre>clang-query&gt; m functionDecl(hasDescendant(callExpr().bind(\"functionCall\")))\r\n<\/pre>\n<p>Note that over-use of the <tt>has()<\/tt> and <tt>hasDescendant()<\/tt> matchers &#8211; and their complements <tt>hasParent()<\/tt> and <tt>hasAncestor()<\/tt> &#8211; is usually an anti-pattern and can lead to unintended results, particularly while matching nested <tt>Expr<\/tt> subclasses in source code. Usually, higher-level matchers should be used instead. 
For example, while <tt>has()<\/tt> may be used to match a desired <tt>IntegerLiteral<\/tt> argument in the case above, it would not be possible to specify which argument we wish to match in a function which has multiple arguments. The <tt>hasArgument()<\/tt> matcher should be used in the case of <tt>callExpr()<\/tt> to resolve this issue, as it can specify which argument should be matched if there are multiple:<\/p>\n<pre>clang-query&gt; m callExpr(hasArgument(0, integerLiteral()))\r\n<\/pre>\n<p>The above matcher will match on every function call whose zeroth argument is an integer literal.<\/p>\n<p>Usually we want to use more narrowing criteria to only match on a particular category of matches. Most matchers accept multiple arguments and behave as though they have an implicit <tt>allOf()<\/tt> within them. So, we can write:<\/p>\n<pre>clang-query&gt; m callExpr(hasArgument(0, integerLiteral()), callee(functionDecl(hasName(\"addTwo\"))))\r\n<\/pre>\n<p>to match calls whose zeroth argument is an integer literal only if the function being called has the name &#8220;<tt>addTwo<\/tt>&#8221;.<\/p>\n<p>A matcher expression can sometimes be obvious to read and understand, but harder to write or discover. The particular node types which may be matched can be discovered by examining the output of <tt>clang-query<\/tt>. However, the <tt>callee()<\/tt> matcher here may be difficult to discover independently because it is not referenced in the AST dumps from <tt>clang-query<\/tt> and it is only one matcher in the long list in the reference documentation. The code of the existing <tt>clang-tidy<\/tt> checks is educational both for discovering matchers which are commonly used together and for finding a context where particular matchers should be used.<\/p>\n<p>Creating a binding in a nested matcher in <tt>clang-query<\/tt> is another important discovery technique. 
Suppose we have source code such as:<\/p>\n<pre class=\"\u201clang:cpp\" decode=\"true\u201d\"> \r\nint add(int num1, int num2) \r\n{\r\n  return num1 + num2; \r\n} \r\n\r\nint add(int num1, int num2, int num3) \r\n{\r\n  return num1 + num2 + num3; \r\n} \r\n\r\nint main(int argc, char**) \r\n{ \r\n  int i = 42; \r\n\r\n  return add(argc, add(42, i), 4 * 7); \r\n}\r\n<\/pre>\n<p>and we intend to introduce a <tt>safe_int<\/tt> type to use instead of <tt>int<\/tt> in the signature of <tt>add<\/tt>. All existing uses of <tt>add<\/tt> must then be ported to some new pattern of code.<\/p>\n<p>The basic workflow with <tt>clang-query<\/tt> is that we must first identify source code which is exemplary of what we want to port and then determine how it is represented in the Clang AST. We will need to identify the locations of arguments to the <tt>add<\/tt> function and their AST types as a first step.<\/p>\n<p>Let&#8217;s start with <tt>callExpr()<\/tt> again:<\/p>\n<pre>clang-query&gt; m callExpr() \r\n\r\nMatch #1: \r\n\r\ntestfile.cpp:15:10: note: \"root\" binds here \r\n    return add(argc, add(42, i), 4 * 7); \r\n           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ \r\n\r\nMatch #2: \r\n\r\ntestfile.cpp:15:20: note: \"root\" binds here \r\n    return add(argc, add(42, i), 4 * 7); \r\n                     ^~~~~~~~~~ \r\n\r\n<\/pre>\n<p>This example passes several different kinds of arguments to the <tt>add<\/tt> function: the first argument is a parameter of the enclosing function, the second is the return value of another call, and the third is an inline multiplication. <tt>clang-query<\/tt> can help us discover how to match these constructs. 
Using the <tt>hasArgument()<\/tt> matcher, we can bind to each of the three arguments, setting <tt>bind-root<\/tt> to <tt>false<\/tt> for brevity:<\/p>\n<pre>clang-query&gt; set bind-root false \r\nclang-query&gt; m callExpr(hasArgument(0, expr().bind(\"a1\")), hasArgument(1, expr().bind(\"a2\")), hasArgument(2, expr().bind(\"a3\"))) \r\n\r\nMatch #1: \r\n\r\ntestfile.cpp:15:14: note: \"a1\" binds here \r\nreturn add(argc, add(42, i), 4 * 7); \r\n           ^~~~ \r\n\r\ntestfile.cpp:15:20: note: \"a2\" binds here \r\nreturn add(argc, add(42, i), 4 * 7); \r\n                 ^~~~~~~~~~ \r\n\r\ntestfile.cpp:15:32: note: \"a3\" binds here \r\nreturn add(argc, add(42, i), 4 * 7); \r\n                             ^~~~~\r\n<\/pre>\n<p>Changing the output to <tt>dump<\/tt> and re-running the same matcher gives:<\/p>\n<pre>clang-query&gt; set output dump \r\nclang-query&gt; m callExpr(hasArgument(0, expr().bind(\"a1\")), hasArgument(1, expr().bind(\"a2\")), hasArgument(2, expr().bind(\"a3\"))) \r\n\r\nMatch #1: \r\n\r\nBinding for \"a1\": \r\nDeclRefExpr &lt;testfile.cpp:15:14&gt; 'int' 'argc'\r\n\r\nBinding for \"a2\": \r\nCallExpr &lt;testfile.cpp:15:20, col:29&gt; 'int' \r\n|-ImplicitCastExpr &lt;col:20&gt; 'int (*)(int, int)' \r\n| `-DeclRefExpr &lt;col:20&gt; 'int (int, int)' 'add' \r\n|-IntegerLiteral &lt;col:24&gt; 'int' 42 \r\n`-ImplicitCastExpr &lt;col:28&gt; 'int' \r\n  `-DeclRefExpr &lt;col:28&gt; 'int' 'i' \r\n\r\nBinding for \"a3\": \r\nBinaryOperator &lt;testfile.cpp:15:32, col:36&gt; 'int' '*' \r\n|-IntegerLiteral &lt;col:32&gt; 'int' 4 \r\n`-IntegerLiteral &lt;col:36&gt; 'int' 7 \r\n<\/pre>\n<p>We can see that the top-level AST nodes of the arguments are <tt>DeclRefExpr<\/tt>, <tt>CallExpr<\/tt> and <tt>BinaryOperator<\/tt> respectively. 
When implementing our refactoring tool, we might want to wrap the <tt>argc<\/tt> as <tt>safe_int(argc)<\/tt>, ignore the nested <tt>add()<\/tt> call, as its return type will be changed to <tt>safe_int<\/tt>, and change the <tt>BinaryOperator<\/tt> to some safe operation.<\/p>\n<p>As we learn about the AST we are examining, we can also replace the <tt>expr()<\/tt> with something more specific to explore further. Because we now know the second argument is a <tt>CallExpr<\/tt>, we can use a <tt>callExpr()<\/tt> matcher to check the callee. The <tt>callee()<\/tt> matcher only works if we specify <tt>callExpr()<\/tt> instead of <tt>expr()<\/tt>:<\/p>\n<pre>clang-query&gt; m callExpr(hasArgument(1, callExpr(callee(functionDecl().bind(\"func\"))).bind(\"a2\"))) \r\n\r\nMatch #1: \r\n\r\nBinding for \"a2\": \r\nCallExpr &lt;testfile.cpp:15:20, col:29&gt; 'int' \r\n|-ImplicitCastExpr &lt;col:20&gt; 'int (*)(int, int)'\r\n| `-DeclRefExpr &lt;col:20&gt; 'int (int, int)' 'add'\r\n|-IntegerLiteral &lt;col:24&gt; 'int' 42 \r\n`-ImplicitCastExpr &lt;col:28&gt; 'int' \r\n  `-DeclRefExpr &lt;col:28&gt; 'int' 'i' \r\n\r\nBinding for \"func\": \r\nFunctionDecl &lt;testfile.cpp:1:1, line:4:1&gt; line:1:5 add 'int (int, int)' \r\n... etc \r\n\r\n1 match. \r\nclang-query&gt; set output diag \r\nclang-query&gt; m callExpr(hasArgument(1, callExpr(callee(functionDecl().bind(\"func\"))).bind(\"a2\"))) \r\n\r\nMatch #1: \r\n\r\ntestfile.cpp:15:20: note: \"a2\" binds here \r\nreturn add(argc, add(42, i), 4 * 7); \r\n                 ^~~~~~~~~~ \r\n\r\ntestfile.cpp:1:1: note: \"func\" binds here \r\nint add(int num1, int num2) \r\n^~~~~~~~~~~~~~~~~~~~~~~~~~~ \r\n<\/pre>\n<h2>Avoiding the Firehose<\/h2>\n<p>Usually when you need to examine the AST it will make sense to run clang-query on your real source code instead of a single-file demo. 
Starting off with a <tt>callExpr()<\/tt> matcher will result in a firehose problem &#8211; there will be tens of thousands of results and you will not be able to determine how to make your matcher more specific for the lines of source code you are interested in. Several tricks can come to your aid in this case.<\/p>\n<p>First, you can use <tt>isExpansionInMainFile()<\/tt> to limit the matches to only the main file, excluding all results from headers. That matcher can be used with <tt>Expr<\/tt>s, <tt>Stmt<\/tt>s and <tt>Decl<\/tt>s, so it is useful for everything you might want to start matching.<\/p>\n<p>Second, if you still get too many results from your matcher, the <tt>hasAncestor()<\/tt> matcher can be used to limit the results further.<\/p>\n<p>Third, particular variable names can often anchor your match to some particular piece of code of interest.<\/p>\n<p>Exploring the AST of code such as<\/p>\n<pre class=\"\u201clang:cpp\" decode=\"true\u201d\"> \r\nvoid myFuncName() \r\n{ \r\n  int i = someFunc() + Point(4, 5).translateX(9);   \r\n} \r\n<\/pre>\n<p>might start with a matcher which anchors to the name of the variable, the function it is in and the location in the main file:<\/p>\n<pre>varDecl(isExpansionInMainFile(), hasAncestor(functionDecl(hasName(\"myFuncName\"))), hasName(\"i\"))\r\n<\/pre>\n<p>This starting point will make it possible to explore how the rest of the line is represented in the AST without being drowned in noise.<\/p>\n<h2>Conclusion<\/h2>\n<p><tt>clang-query<\/tt> is an essential asset while developing a refactoring tool with AST Matchers. It is a prototyping and discovery tool, whose input can be pasted into the implementation of a new <tt>clang-tidy<\/tt> check.<\/p>\n<p>In this blog post, we explored the basic use of the <tt>clang-query<\/tt> tool \u2013 nesting matchers and binding their results \u2013 and how the output corresponds to the AST Matcher Reference. 
We also saw how to limit the scope of matches to enable easy creation of matchers in real code.<\/p>\n<p>In the next blog post, we will explore the corresponding consumer of AST matcher results. This will be the actual re-writing of the source code corresponding to the patterns we have identified as refactoring targets.<\/p>\n<p>Which AST Matchers do you think will be most useful in your code? Let us know in the comments below or contact the author directly via e-mail at <a href=\"mailto:stkelly@microsoft.com\">stkelly@microsoft.com<\/a>, or on Twitter <a href=\"https:\/\/twitter.com\/steveire\">@steveire<\/a>.<\/p>\n<p>I will be showing even more new and future developments in <tt>clang-query<\/tt> at <a href=\"http:\/\/codedive.pl\/index\/speaker\/name\/stephen-kelly\">code::dive<\/a> in November. Make sure to put it in your calendar if you are attending!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post is part of a regular series of posts where the C++ product team and other guests answer questions we have received from customers. The questions can be about anything C++ related: MSVC toolset, the standard language and library, the C++ standards committee, isocpp.org, CppCon, etc. Today\u2019s post is by guest author Stephen Kelly, [&hellip;]<\/p>\n","protected":false},"author":890,"featured_media":35994,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[563,512],"tags":[],"class_list":["post-20953","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-clang","category-general-cpp-series"],"acf":[],"blog_post_summary":"<p>This post is part of a regular series of posts where the C++ product team and other guests answer questions we have received from customers. 
The questions can be about anything C++ related: MSVC toolset, the standard language and library, the C++ standards committee, isocpp.org, CppCon, etc. Today\u2019s post is by guest author Stephen Kelly, [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/20953","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/users\/890"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/comments?post=20953"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/20953\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media\/35994"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media?parent=20953"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/categories?post=20953"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/tags?post=20953"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}