Programmatically parsing Transact SQL (T-SQL) with the ScriptDom parser

Arvind Shyamsundar

Arvind

.NET Developers are perhaps familiar with libraries like Roslyn and CodeDOM, which allow in-depth analysis of code (amongst many other things). Java developers use tools like SpotBugs for code analysis. As a SQL developer, perhaps you might have wondered if there are any equivalent functionality for Azure SQL and SQL Server code? Well, wonder no more!

Over the years, I have leveraged the very useful TransactSql ScriptDom .NET library, to programmatically parse Transact-SQL (T-SQL) statements, expressions, and scripts. This library accurately produces an Abstract Syntax Tree (AST) representation of the T-SQL code. Once you have such an AST, you can “walk” the tree to analyze it for various code patterns. As an example, you can use it to detect in a robust way if you have any DML statement other than SELECT – which may be useful to detect some forms of SQL injection. You can even change the AST if needed and produce a modified T-SQL script with the re-written tree. The possibilities are endless!

Use Cases

I am aware of several interesting use cases where this parser has been leveraged:

In addition, members of the SQL community have some great articles showing practical usage of ScriptDom / SQLDOM to do useful tasks:

Forgive me if I missed some useful examples – my attempt was not to list all possible related ones, just recent ones I am familiar with. Please do leave comments if you feel there is a relevant example which should be linked.

Within the realm of official Microsoft releases, ScriptDom is also one of the core building blocks for DacFx and associated tooling like SqlPackage. It is also part of the SqlServer PowerShell module, and many other projects, both at Microsoft, and in the broader community. This library is immensely powerful, and I believe that all SQL developers should be aware of the power and scope it provides.

Get Started!

If you are interested in learning about how you can use this library, my GitHub repo has some quite simple examples to get started. Once you add the necessary references to ScriptDom, you can instantiate a TSqlParser class specific to the target version of SQL that you are working with. Currently for Azure SQL and SQL Server 2019, that would be the TSqlParser150 class:

            using (var rdr = new StringReader(textBox1.Text))
           {
               IList<ParseError> errors = null;
               var parser = new TSql150Parser(true, SqlEngineType.All);
               var tree = parser.Parse(rdr, out errors);
...

BTW, this parser now supports many constructs of the T-SQL syntax specific to Azure Synapse Analytics as well. For example, it can parse CTAS statements! Anyways, once the parser produces the AST, you can use the well-known Visitor pattern to walk the tree. Usually this is done by implementing a class which can then store state as the tree is walked:

    class MyVisitor : TSqlFragmentVisitor
   {
       internal bool containsOnlySelects = true;
       public override void Visit(TSqlStatement node)
       {
           if ((node as SelectStatement) is null)
           {
               containsOnlySelects = false;
           }

           base.Visit(node);
       }
   }

And then, in the main code, we call .Parse(), thereby creating an instance of the “walker” class and walk the AST:

      MyVisitor checker = new MyVisitor();
tree.Accept(checker);

Video

To know more, you can examine the samples on my repo. You can also watch a detailed presentation on this topic done to the PASS AppDev virtual chapter session: T-SQL Swiss Knife using the ScriptDom T-SQL Parser by Arvind Shyamsundar. And I have blogged in the past about usage of this parser on my personal blog, including using ScriptDom to format T-SQL etc.: Arvind Shyamsundar’s technical blog

ScriptDom in SQL Projects (Code Analysis)

ScriptDom being a full-fidelity parser, is great for statically analyzing T-SQL code to proactively identify anti-patterns. Historically, we have had Code Analysis capabilities within Visual Studio SQL Projects. Visual Studio SQL Projects allow you to extend the built-in code analysis rules and implement your own custom code analysis rules. Here are some examples of OSS projects for custom code analysis within Visual Studio SQL Projects (earlier known as “SSDT”). They all leverage ScriptDom:

BTW, SQL Projects is recently also available as an preview extension for Azure Data Studio. Also, a related community-led project which allows developers to build DACPACs without Visual Studio is rr-wfm/MSBuild.Sdk.SqlProj. While this project does not use ScriptDom directly, it shows how to leverage the broader DacFx library.

Again, forgive me if I missed some useful code analysis extension which uses ScriptDom. Please do leave comments if you feel there is a relevant post which should be linked.

Conclusion

The parser is immensely powerful if you are interested in programmatically parsing T-SQL. Since the parser is an official release from Microsoft, and largely based on the same grammar that the core SQL engine uses, it is the most full-fidelity choice for parsing T-SQL currently available. And best of all it is freely available, as part of the DacFx NuGet!

1 comment

Comments are closed. Login to edit/delete your existing comments

  • Avatar
    伟 彭

    gudusoft.gsqlparser support manay sql scripts, not only the T-Sql