5th January 2023
Abstract Syntax Tree
AST stands for Abstract Syntax Tree. It is a tree data structure that represents the structure of a text written in a formal syntax, so any text with a well-defined grammar can be represented as an AST.
Since the AST is “abstract”, it does not have a standard representation: every language may define its own vocabulary of node types. However, the concept shared by all ASTs is the tree representation, where the root node describes the entry point of the document or program.
Generally, the first step to run a piece of software is to parse the source code and build an AST. This operation is called parsing and is performed by the parser component.
📄 Source ➞ ⚙️ Parser ➞ 🌲 AST
If the parser cannot parse the source code, it throws an error: the source code is invalid and no AST can be generated.
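To make the parsing step concrete, here is a minimal sketch of a parser for a toy language (integers, `+`, `*` and parentheses). It is not a JavaScript parser: the names `tokenize` and `parseExpression` are made up for this example, and the node shapes (`Literal`, `BinaryExpression`) only borrow their names from common JavaScript ASTs.

```javascript
// Split the source text into tokens, rejecting characters the toy grammar does not know.
function tokenize(source) {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (ch === ' ') { i++; continue; }
    if (ch >= '0' && ch <= '9') {
      let digits = '';
      while (i < source.length && source[i] >= '0' && source[i] <= '9') digits += source[i++];
      tokens.push({ kind: 'number', text: digits });
      continue;
    }
    if ('+*()'.includes(ch)) { tokens.push({ kind: 'punct', text: ch }); i++; continue; }
    throw new SyntaxError(`Unexpected character "${ch}" at position ${i}`);
  }
  return tokens;
}

// Recursive-descent parser: "*" binds tighter than "+".
function parseExpression(source) {
  const tokens = tokenize(source);
  let index = 0;
  const peek = () => tokens[index];
  const isPunct = (text) => peek() !== undefined && peek().kind === 'punct' && peek().text === text;

  function parsePrimary() {
    const token = tokens[index++];
    if (token === undefined) throw new SyntaxError('Unexpected end of input');
    if (token.kind === 'number') return { type: 'Literal', value: Number(token.text) };
    if (token.text === '(') {
      const inner = parseSum();
      if (!isPunct(')')) throw new SyntaxError('Expected ")"');
      index++;
      return inner;
    }
    throw new SyntaxError(`Unexpected token "${token.text}"`);
  }

  function parseProduct() {
    let left = parsePrimary();
    while (isPunct('*')) {
      index++;
      left = { type: 'BinaryExpression', operator: '*', left, right: parsePrimary() };
    }
    return left;
  }

  function parseSum() {
    let left = parseProduct();
    while (isPunct('+')) {
      index++;
      left = { type: 'BinaryExpression', operator: '+', left, right: parseProduct() };
    }
    return left;
  }

  const ast = parseSum();
  if (index < tokens.length) throw new SyntaxError(`Unexpected token "${peek().text}"`);
  return ast;
}

// Valid source: the root node is the "+" expression, with the "*" expression as its right child.
console.log(JSON.stringify(parseExpression('1 + 2 * 3'), null, 2));

// Invalid source: the parser throws and no AST is produced.
try {
  parseExpression('1 + * 2');
} catch (error) {
  console.log(`${error.name}: ${error.message}`);
}
```

Real parsers follow the same idea on a much larger grammar, and they usually attach source positions to every node.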
How the AST is used
An AST is usually generated to process source code files, but it can be built from any structured text, such as Markdown, JSON, or even a GraphQL Document.
Once the parser has built a valid AST, the next step is the transform:
📄 Source ➞ ⚙️ Parser ➞ 🌲 AST ➞ ⚙️ Transform ➞ 🌲 AST (Transformed)
The AST is manipulated and transformed during this phase to generate a new AST.
Some examples of transformations are:
- Babel, which transforms the AST into a new AST that is compatible with the target environment
- source code minification, where the AST is transformed to remove unnecessary characters and to reduce the file size
- code formatting tools, where the AST is transformed to add/remove spaces and new lines to improve the code readability
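As an illustration of the transform phase, the sketch below applies constant folding, the kind of rewrite a minifier might perform. The function name `foldConstants` and the hand-written input tree are assumptions made for this example; the function returns a new tree and leaves the original untouched.

```javascript
// Replace every BinaryExpression whose operands are both numeric literals
// with a single Literal holding the computed value.
function foldConstants(node) {
  if (node.type !== 'BinaryExpression') return node;
  const left = foldConstants(node.left);
  const right = foldConstants(node.right);
  const bothNumbers =
    left.type === 'Literal' && right.type === 'Literal' &&
    typeof left.value === 'number' && typeof right.value === 'number';
  if (bothNumbers && node.operator === '+') return { type: 'Literal', value: left.value + right.value };
  if (bothNumbers && node.operator === '*') return { type: 'Literal', value: left.value * right.value };
  return { ...node, left, right };
}

// Hand-written AST for the expression: x + 2 * 3
const original = {
  type: 'BinaryExpression',
  operator: '+',
  left: { type: 'Identifier', name: 'x' },
  right: {
    type: 'BinaryExpression',
    operator: '*',
    left: { type: 'Literal', value: 2 },
    right: { type: 'Literal', value: 3 },
  },
};

// The transformed AST represents: x + 6
const transformed = foldConstants(original);
console.log(JSON.stringify(transformed, null, 2));
```

The input and the output are both ASTs, which is why several transformations can be chained one after the other.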
Finally, the new AST is passed to the compiler:
📄 Source ➞ ⚙️ Parser ➞ 🌲 AST ➞ ⚙️ Transform ➞ 🌲 AST (Transformed) ➞ ⚙️ Compiler ➞ 📄 Output
The compiler generates output such as:
- bytecode to be executed
- a new source code file derived from the original source code
- console output such as warnings, errors, or suggestions
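The second kind of output, a new source file, can be sketched with a tiny code generator that walks the tree and prints text. The `generate` function and the sample tree are hypothetical and cover only a handful of node types; every binary expression is wrapped in parentheses so the sketch does not need to reason about operator precedence.

```javascript
// Turn a small subset of AST nodes back into source text.
function generate(node) {
  switch (node.type) {
    case 'Program':
      return node.body.map(generate).join('\n');
    case 'ExpressionStatement':
      return `${generate(node.expression)};`;
    case 'BinaryExpression':
      return `(${generate(node.left)} ${node.operator} ${generate(node.right)})`;
    case 'Identifier':
      return node.name;
    case 'Literal':
      return JSON.stringify(node.value);
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

// Hand-written AST for the statement: total + 2 * 3;
const outputProgram = {
  type: 'Program',
  body: [
    {
      type: 'ExpressionStatement',
      expression: {
        type: 'BinaryExpression',
        operator: '+',
        left: { type: 'Identifier', name: 'total' },
        right: {
          type: 'BinaryExpression',
          operator: '*',
          left: { type: 'Literal', value: 2 },
          right: { type: 'Literal', value: 3 },
        },
      },
    },
  ],
};

const output = generate(outputProgram);
console.log(output); // prints: (total + (2 * 3));
```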
Many of the tools we use in our daily work are built on the AST and exist to improve the developer experience. Code completion, refactoring, linting and formatting are all powered by the tree representation of the source code! This is how IDEs and editors implement their most powerful features!
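Linting is probably the easiest of these features to sketch: a rule walks the tree and reports the nodes it does not like. The example below is a simplified, hypothetical rule (`lintNoVar`) with its own generic traversal (`walkTree`); it is not how any specific linter is implemented, and it assumes every declaration binds a plain identifier.

```javascript
// Visit every object in the tree that carries a string `type` property.
function walkTree(node, visit) {
  if (node === null || typeof node !== 'object') return;
  if (Array.isArray(node)) {
    node.forEach((child) => walkTree(child, visit));
    return;
  }
  if (typeof node.type === 'string') visit(node);
  for (const value of Object.values(node)) walkTree(value, visit);
}

// Report every `var` declaration found in the program.
function lintNoVar(program) {
  const warnings = [];
  walkTree(program, (node) => {
    if (node.type === 'VariableDeclaration' && node.kind === 'var') {
      const names = node.declarations.map((declarator) => declarator.id.name).join(', ');
      warnings.push(`Unexpected var, use let or const instead (${names})`);
    }
  });
  return warnings;
}

// Hand-written AST for: var a = 1; const b = 2;
const lintedProgram = {
  type: 'Program',
  body: [
    {
      type: 'VariableDeclaration',
      kind: 'var',
      declarations: [
        { type: 'VariableDeclarator', id: { type: 'Identifier', name: 'a' }, init: { type: 'Literal', value: 1 } },
      ],
    },
    {
      type: 'VariableDeclaration',
      kind: 'const',
      declarations: [
        { type: 'VariableDeclarator', id: { type: 'Identifier', name: 'b' }, init: { type: 'Literal', value: 2 } },
      ],
    },
  ],
};

console.log(lintNoVar(lintedProgram));
```

The rule never looks at the source text: the `kind` property of the node already says whether the declaration used `var`, `let` or `const`.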
Let’s focus on the Node.js runtime and see how the AST is built.
For instance, if index.js contains a syntax error, running node index.js throws a SyntaxError before any of the code is executed.
Under the hood, Node.js relies on the Google V8 engine to parse the source code and build the AST. The tree representation is then passed to the Ignition interpreter that builds the final bytecode.
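As far as I know, V8 does not expose its internal AST to JavaScript, but we can still observe its parser accepting or rejecting a source string. The sketch below uses the `Function` constructor, which compiles a string as a function body without running it. The helper name `checkSyntax` is made up for this example, and because the text is parsed as a function body the result can differ slightly from parsing a whole file (for instance, a top-level `return` is accepted).

```javascript
// Ask the engine to parse `source` without executing it.
function checkSyntax(source) {
  try {
    new Function(source); // compiles the body; the resulting function is never called
    return { valid: true, message: null };
  } catch (error) {
    if (error instanceof SyntaxError) return { valid: false, message: error.message };
    throw error;
  }
}

console.log(checkSyntax('const greeting = "hello";'));
console.log(checkSyntax('const = "hello";'));
```

The first call reports valid source; the second reports a syntax error whose message comes straight from the engine.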
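V8 is written in C++, but the JavaScript ecosystem offers several parsers written in JavaScript that expose the AST they build:
- acorn: the core module of many parsers
- espree: ESLint's default parser
- @typescript-eslint/typescript-estree: used by Prettier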
- …and a lot more
For example, the following code:
The following AST can represent it:
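As an illustration, assume the source is the single statement `const answer = 42;`. A parser that follows the ESTree conventions would produce a tree along the lines of the hand-written object below. Real parsers also attach position information (such as `start`, `end` or `loc`) to each node, which is left out here.

```javascript
// Hand-written ESTree-style tree for the assumed source: const answer = 42;
const sampleAst = {
  type: 'Program',
  sourceType: 'script',
  body: [
    {
      type: 'VariableDeclaration',
      kind: 'const',
      declarations: [
        {
          type: 'VariableDeclarator',
          id: { type: 'Identifier', name: 'answer' },
          init: { type: 'Literal', value: 42, raw: '42' },
        },
      ],
    },
  ],
};

// Every piece of the statement is reachable by walking down from the root node.
const declaration = sampleAst.body[0];
const declarator = declaration.declarations[0];
console.log(declaration.kind, declarator.id.name, '=', declarator.init.value);
// prints: const answer = 42
```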
The AST standard initiative
In the first section, we said that the AST is not standardized. Moreover, the AST generated by V8 is tuned and optimized for its own engine, so it is a tailor-made AST.
For JavaScript, however, the ESTree community maintains a specification that follows the ECMAScript standard and its naming conventions.
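The practical benefit of a shared specification is that a tool can rely on each node type having known fields. The sketch below encodes a small, hand-picked subset of ESTree node types and checks a node against it; the table `requiredFields` and the helper `missingFields` are assumptions made for this example and are far from the complete specification.

```javascript
// Fields that ESTree defines for a handful of node types (subset only).
const requiredFields = new Map([
  ['Program', ['body']],
  ['ExpressionStatement', ['expression']],
  ['VariableDeclaration', ['kind', 'declarations']],
  ['VariableDeclarator', ['id', 'init']],
  ['BinaryExpression', ['operator', 'left', 'right']],
  ['Identifier', ['name']],
  ['Literal', ['value']],
]);

// List what a single node is missing according to the table above.
function missingFields(node) {
  const expected = requiredFields.get(node.type);
  if (expected === undefined) return [`unknown node type "${node.type}"`];
  return expected.filter((field) => !(field in node));
}

// A well-formed node has nothing missing.
console.log(missingFields({ type: 'Identifier', name: 'answer' }));

// This node lacks its right operand, so the check reports "right".
console.log(missingFields({
  type: 'BinaryExpression',
  operator: '+',
  left: { type: 'Literal', value: 1 },
}));
```

Because parsers that follow ESTree agree on these names, a tool written against the specification can, in principle, work with the output of any of them.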
A handy tool to explore the AST is AST Explorer.
You can copy-paste the source code and see the AST generated by the selected parser.
ℹ️ Navigate the AST
I find it very useful to explore the AST generated from a GraphQL schema when I need to implement a Mercurius plugin!
Of course, an awesome list 🕶 of AST-related projects exists!
Now you should see your IDE and the eslint tool with different eyes! There was no magic, just a tree data structure!
If you are willing to build a graphql plugin, the information in this post will help you to start your journey!
If you enjoyed this article please share and follow me on Twitter @ManuEomm!