Thoughts on the Visual C++ Abstract Syntax Tree (AST)
Hello, my name is Jason Lucas and I’m a senior software development engineer working on the Visual C++ front-end team. I noticed there were some questions (OK, maybe one question) about getting access to our compiler’s AST and, as this is an area I’m actually working on, I thought I’d talk a little about it.
For those of you who might not be up on compiler jargon, an AST is an abstract syntax tree. Given a set of grammar rules, it’s possible to break down a string into a branching structure which categorizes the parts of the string according to their grammatical roles. Those of you who remember diagramming sentences in school are already familiar with this concept. An AST is essentially the same thing: a tree-like diagram of the meaningful content of a program.
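To make the idea concrete, here is a minimal sketch of what a tree for the expression 1 + 2 * 3 might look like. The Node type and the leaf, binary, show, and example helpers are all invented for this illustration; they are not the Visual C++ AST, which is far richer.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical node type, for illustration only.
struct Node {
    std::string label;            // an operator such as "+" or a literal such as "2"
    std::unique_ptr<Node> left;   // null for leaves
    std::unique_ptr<Node> right;  // null for leaves
};

// Build a leaf node (a literal).
std::unique_ptr<Node> leaf(std::string text) {
    auto n = std::make_unique<Node>();
    n->label = std::move(text);
    return n;
}

// Build an interior node (a binary operator with two operands).
std::unique_ptr<Node> binary(std::string op,
                             std::unique_ptr<Node> l,
                             std::unique_ptr<Node> r) {
    assert(l && r);  // an operator always has both operands
    auto n = std::make_unique<Node>();
    n->label = std::move(op);
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

// Render the tree with explicit parentheses so its shape is visible.
std::string show(const Node& n) {
    if (!n.left) return n.label;
    return "(" + show(*n.left) + " " + n.label + " " + show(*n.right) + ")";
}

// The tree for 1 + 2 * 3: the multiplication sits below the addition
// because it binds more tightly.
std::unique_ptr<Node> example() {
    return binary("+", leaf("1"), binary("*", leaf("2"), leaf("3")));
}
```

Calling show(*example()) yields "(1 + (2 * 3))". The grouping that was only implied by precedence rules in the source text is explicit in the shape of the tree.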
The part of the compiler responsible for producing the AST is called the front-end. That’s where all the grammatical rules of C++ are interpreted and applied to the incoming source code. The front-end also enforces the semantic rules of the language, making sure that the resulting AST is not only well-formed but also free of nonsensical content.
Once the front-end has produced a structurally and semantically consistent AST, the back-end can begin work. The back-end of the compiler consumes the AST, interpreting its shape and content and producing the necessary machine language to realize the program’s function. The back-end is largely target-specific, meaning that it is designed to produce the native language of a particular kind of machine.
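As a rough sketch of what "consuming the AST" can mean, the toy back-end below walks a small expression tree and emits instructions for an imaginary stack machine. Everything here (the Expr type, the lit and bin helpers, and the push/add/mul instruction names) is an assumption made up for the example; a real back-end targets a real instruction set and does a great deal more.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression node, for illustration only.
struct Expr {
    char op = 0;                     // '+' or '*' for operators, 0 for a literal
    int value = 0;                   // meaningful only when op == 0
    std::unique_ptr<Expr> lhs, rhs;  // null for literals
};

std::unique_ptr<Expr> lit(int v) {
    auto e = std::make_unique<Expr>();
    e->value = v;
    return e;
}

std::unique_ptr<Expr> bin(char op, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->op = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Post-order walk: emit code for both operands, then for the operator.
void emit(const Expr& e, std::vector<std::string>& out) {
    if (e.op == 0) {
        out.push_back("push " + std::to_string(e.value));
        return;
    }
    assert(e.lhs && e.rhs);  // operators always have two operands
    emit(*e.lhs, out);
    emit(*e.rhs, out);
    out.push_back(e.op == '+' ? "add" : "mul");
}

// The "back-end": turn a finished tree into a list of instructions.
std::vector<std::string> compile(const Expr& root) {
    std::vector<std::string> out;
    emit(root, out);
    return out;
}
```

For the tree of 1 + 2 * 3, compile produces push 1, push 2, push 3, mul, add. Note that nothing in emit knows anything about C++ syntax; it only sees the tree.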
This separation of responsibilities between the front- and back-end of the compiler allows for a good separation of concerns in its design. The front-end components don’t have to worry about specifics of the target chip set and the back-end components don’t have to worry about C++ language issues. The skills required to develop front- and back-ends are also fairly different. Here on the Visual C++ project, the front- and back-ends are actually produced by two different teams.
The AST is the intermediate point at which the two halves of the compiler rendezvous. It contains all the meaningful content of the user’s program, broken down and rearranged so as to be easily interpreted. As such, it is an excellent resource not just for the back-end but for anyone interested in analyzing C++ programs. Gaining access to the AST would make it much easier to write many different kinds of tools.
The sad fact of the matter is, however, that the current Visual C++ compiler doesn’t really generate a complete AST. It’s what’s known as a bottom-up compiler, meaning (among other things) that it devours its AST as it produces it, leaving no durable form behind. This is an artifact of the compiler’s age. In the days of the 256K limit, a large, in-memory structure such as a whole-program AST was not feasible.
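To illustrate what it means to consume the structure while producing it, here is a toy single-pass translator. It is emphatically not how the Visual C++ compiler works; it is a small operator-precedence (shift/reduce) loop, with the translate function, the instruction names, and the tiny input language (non-negative integers, + and *, no spaces or parentheses) all assumed for the example. The point is that each reduction emits its instruction and is then forgotten, so no tree ever exists to hand to another tool.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// '*' binds more tightly than '+'.
int precedence(char op) { return op == '*' ? 2 : 1; }

// Translate e.g. "1+2*3" into stack-machine instructions in one pass.
// Assumes well-formed input made of integers, '+' and '*'.
std::vector<std::string> translate(const std::string& src) {
    std::vector<std::string> code;
    std::vector<char> pending;  // operators shifted but not yet reduced

    // A reduction emits one instruction and discards the operator.
    // Nothing is kept that would let us rebuild the structure later.
    auto reduce = [&] {
        assert(!pending.empty());
        code.push_back(pending.back() == '+' ? "add" : "mul");
        pending.pop_back();
    };

    std::size_t i = 0;
    while (i < src.size()) {
        if (std::isdigit(static_cast<unsigned char>(src[i]))) {
            std::string digits;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i])))
                digits += src[i++];
            code.push_back("push " + digits);
        } else if (src[i] == '+' || src[i] == '*') {
            // Reduce everything that binds at least as tightly, then shift.
            while (!pending.empty() && precedence(pending.back()) >= precedence(src[i]))
                reduce();
            pending.push_back(src[i++]);
        } else {
            throw std::invalid_argument("unexpected character in input");
        }
    }
    while (!pending.empty()) reduce();
    return code;
}
```

For "1+2*3" this yields push 1, push 2, push 3, mul, add. The output is correct, and memory use stays small, but a tool that wanted to ask "what are the operands of the addition?" has nothing to query.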
As my colleague, Jonathan Caves, said in his recent blog post, the Visual C++ front-end team is currently engaged in revamping our codebase. One of the main goals of this effort is to be able to produce a good, durable AST. We are (and I in particular am) currently experimenting with a new API that will make a complete AST available to users outside of the back-end. Please note that this development effort will be delivered in a post-Orcas release.
The first group to benefit will be the IDE team. Code-aware features in the IDE, such as the syntax coloring engine, the IntelliSense engine, and the class browser, are today not AST-based, even though they clearly should be. I am currently working with the IDE team to make sure that the AST API is rich enough to support a new generation of IDE features. Our goal is to make sure that C++ programmers have at least as good an experience in the IDE as their C# colleagues currently enjoy.
A few other groups within Microsoft who are working on code analysis tools (such as the tools used in the Windows group to sniff out potential security holes) would also be early adopters of the new AST API. We want to use these groups as our first round of testers (guinea pigs, if you will) to make sure that the new component is everything it should be.
After that, our goal is to make the AST API accessible to everyone. We want to include not only the AST but also some basic algorithms for walking it and interpreting it, simplifying its use as much as possible. We hope that this will foster the development of a new generation of powerful tools and development aids around the C++ language in general and Visual Studio in particular. We’d really like to see even non-compiler people be able to get involved and contribute to a flourishing ecology around our compiler.
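The API itself is still experimental, so the following is only a guess at the flavor of a "basic algorithm for walking" a tree, not a preview of the real interface. AstNode, its kind strings, and the make, walk, and calledFunctions helpers are all hypothetical names invented for this sketch.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical generic node; a real AST would have many typed node classes.
struct AstNode {
    std::string kind;  // e.g. "FunctionDecl", "Block", "CallExpr"
    std::string name;  // the declared or referenced name, if any
    std::vector<std::unique_ptr<AstNode>> children;
};

std::unique_ptr<AstNode> make(std::string kind, std::string name) {
    auto n = std::make_unique<AstNode>();
    n->kind = std::move(kind);
    n->name = std::move(name);
    return n;
}

// Generic pre-order walk: visit a node, then each of its children in order.
void walk(const AstNode& node,
          const std::function<void(const AstNode&, int)>& visit,
          int depth = 0) {
    visit(node, depth);
    for (const auto& child : node.children) {
        assert(child);  // children are never null in this sketch
        walk(*child, visit, depth + 1);
    }
}

// Example client tool: list every function that is called somewhere
// beneath the given node, in source order.
std::vector<std::string> calledFunctions(const AstNode& root) {
    std::vector<std::string> names;
    walk(root, [&](const AstNode& n, int) {
        if (n.kind == "CallExpr") names.push_back(n.name);
    });
    return names;
}
```

With a walker like this supplied alongside the tree, a tool author writes only the small lambda that expresses what they care about, and never has to handle the traversal itself.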
I hope you found this discussion interesting. If you have any questions or ideas about the Visual C++ AST, feel free to post them here. I’d be happy to talk with you.