Editor’s note: This is a cross-post written by Developer Experience Engineer Marco Ippolito. Marco has his own blog on Medium.
Learn how to build a powerful code search tool using Orama
Traditional text editors and IDEs offer basic search functionality, but they often fall short of providing accurate and contextually relevant results. The main issue is that they cannot restrict a search to the place where a variable is declared, or distinguish between kinds of declarations such as “const”, “let”, “function”, or class properties. This limitation leads to wasted time and effort when navigating code.
In this article, we’ll learn how to build a powerful code search tool using Orama, an incredible full-text search engine.
Abstract Syntax Tree
An Abstract Syntax Tree is a powerful representation of a code file’s syntax and structure. It breaks down the code into a hierarchical tree of nodes, each representing a distinct syntactic element, such as functions, variables, expressions, loops, and more. This tree structure allows us to analyze the code’s context, relationships, and scope, providing a more accurate understanding of its components.
A JavaScript parser such as Acorn transforms the source code into an Abstract Syntax Tree.
Here’s the AST representation:
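As an illustration, the block below hand-writes a trimmed ESTree-style tree for the one-line source `const greet = "hello";`. It assumes the node shapes Acorn emits with its `locations` option enabled, and it omits the `start`/`end` offsets and most `loc` fields for brevity, so treat it as a sketch rather than verbatim parser output.

```javascript
// Source text that the tree below describes.
const source = 'const greet = "hello";';

// Hand-written, trimmed ESTree-style tree (assumed shape, not parser output).
const ast = {
  type: 'Program',
  sourceType: 'script',
  body: [
    {
      type: 'VariableDeclaration',
      kind: 'const',
      declarations: [
        {
          type: 'VariableDeclarator',
          id: {
            type: 'Identifier',
            name: 'greet',
            // Lines are 1-based, columns are 0-based.
            loc: { start: { line: 1, column: 6 }, end: { line: 1, column: 11 } },
          },
          init: { type: 'Literal', value: 'hello', raw: '"hello"' },
        },
      ],
    },
  ],
};

// Walk down by hand to reach the declared name.
const declaredName = ast.body[0].declarations[0].id.name;
console.log(declaredName);
```

Even for a single declaration, reaching the variable name takes four levels of nesting, which is why the tree is flattened in the next step.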
The Abstract Syntax Tree may seem overwhelming, as even a tiny code snippet creates a large and intricate JSON representation. But don’t worry; we’ll simplify it, making it easy to work with and explore its potential.
Let’s get started
Because the AST is hierarchical, child nodes live inside specific properties of their parent node, so before flattening the tree we decide which properties to descend into. The fieldsToTraverse array lists the properties in which we look for further child nodes.
We also want a list of fields to pick from each node. The fieldsToPick array includes the properties we are interested in.
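The two arrays could look like the sketch below. The entries of `fieldsToTraverse` are an assumption based on common ESTree property names, not the original list; `fieldsToPick` uses the five properties discussed later in this article.

```javascript
// Hypothetical list: ESTree properties that can hold child nodes.
const fieldsToTraverse = [
  'body', 'declarations', 'id', 'init', 'params',
  'expression', 'callee', 'arguments', 'object', 'property',
  'left', 'right', 'argument', 'key', 'value',
];

// Properties copied from each node into its flat record.
const fieldsToPick = ['name', 'kind', 'value', 'type', 'loc'];
```

Note that `value` appears in both lists: on a `Literal` it is a primitive worth picking, while on a class method or property it holds a child node to traverse.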
Now, we can create a function that traverses the AST to flatten it. This function recursively navigates through the hierarchical structure of the AST and converts it into a flat array representation. As it traverses the tree, it descends into the properties listed in fieldsToTraverse and, for every node it visits, extracts the properties listed in fieldsToPick.
The flattened array is then organized to preserve the relationships between nodes, making it easier to access and search for specific code elements.
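A minimal sketch of such a traversal follows. The function name `flattenAst`, the helper `isNode`, and the `parentType` field are assumptions made for illustration (the last one is inferred from the parentType filter used later in this article), and the field lists are shortened to what the tiny sample tree needs.

```javascript
// Hypothetical, shortened field lists; a real tool would cover more of ESTree.
const fieldsToTraverse = ['body', 'declarations', 'id', 'init'];
const fieldsToPick = ['name', 'kind', 'value', 'type', 'loc'];

// An ESTree node is any object with a string `type`.
const isNode = (x) => x !== null && typeof x === 'object' && typeof x.type === 'string';

// Depth-first walk that appends one flat record per node to `out`.
function flattenAst(node, parentType = null, out = []) {
  if (!isNode(node)) return out;

  // `parentType` (an assumed field) remembers which node type owns this one.
  const record = parentType === null ? {} : { parentType };
  for (const field of fieldsToPick) {
    // Skip absent fields and fields that hold a child node.
    if (node[field] !== undefined && !isNode(node[field])) {
      record[field] = node[field];
    }
  }
  out.push(record);

  for (const field of fieldsToTraverse) {
    const child = node[field];
    // A field holds either a single node or an array of nodes.
    const children = Array.isArray(child) ? child : [child];
    for (const c of children) flattenAst(c, node.type, out);
  }
  return out;
}

// Trimmed tree for: const greet = "hello";
const ast = {
  type: 'Program',
  body: [{
    type: 'VariableDeclaration',
    kind: 'const',
    declarations: [{
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: 'greet' },
      init: { type: 'Literal', value: 'hello' },
    }],
  }],
};

const nodes = flattenAst(ast);
console.log(nodes);
```

Recording only the parent's type keeps each record flat and cheap to index, at the cost of losing the full ancestor chain.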
We have created an array of objects that looks like this:
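For a one-line source such as `const greet = "hello";`, the flat records might look like the sketch below. It assumes each record also stores the type of its parent node in a `parentType` field, and it leaves `loc` off most entries for brevity.

```javascript
// Illustrative flat records for: const greet = "hello";
const flatNodes = [
  { type: 'Program' },
  { parentType: 'Program', kind: 'const', type: 'VariableDeclaration' },
  { parentType: 'VariableDeclaration', type: 'VariableDeclarator' },
  {
    parentType: 'VariableDeclarator',
    name: 'greet',
    type: 'Identifier',
    loc: { start: { line: 1, column: 6 }, end: { line: 1, column: 11 } },
  },
  { parentType: 'VariableDeclarator', value: 'hello', type: 'Literal' },
];

// Every record is a plain object, so ordinary array methods already work on it.
const identifiers = flatNodes.filter((n) => n.type === 'Identifier');
console.log(identifiers.map((n) => n.name));
```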
The additional properties like name, kind, value, type, and loc provide valuable information about each code element. For instance, name might represent the name of a variable or function, kind could indicate the type of declaration (e.g., "const" or "let"), and value may contain the value assigned to a variable. The type property specifies the type of node (e.g., "Identifier", "Literal", "MemberExpression", etc.), and loc represents the source code location of the node.
Now we want to be able to search and analyze the entries based on their properties, identify specific code patterns, locate variable declarations, and analyze function calls.
Orama: the lightning-fast search engine
Orama is a fast, batteries-included, full-text search engine entirely written in TypeScript, with zero dependencies.
By providing our array of AST nodes to Orama, we gain the ability to execute complex queries on the data.
Let’s install Orama, which is published on npm as the @orama/orama package.
First of all, we create an Orama database by defining the structure of our data:
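A schema for the flat records could be sketched as the plain object below. The exact fields are an assumption derived from the properties discussed above. `value` is left out because a literal's value can be a string, a number, or a boolean, while each schema field has a single type; as far as I know, Orama still stores properties that are missing from the schema but does not index them.

```javascript
// Assumed schema: one entry per searchable field of the flat records.
const schema = {
  parentType: 'string',
  name: 'string',
  kind: 'string',
  type: 'string',
  // Nested objects describe nested document fields.
  loc: {
    start: { line: 'number', column: 'number' },
    end: { line: 'number', column: 'number' },
  },
};

// With Orama installed, the database would be created along these lines:
//   import { create } from '@orama/orama';
//   const db = await create({ schema });
console.log(Object.keys(schema));
```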
Then we insert our nodes into the database, which Orama supports one document at a time with insert or in bulk with insertMultiple.
And… that’s it! Once we’ve added our AST nodes to Orama, searching for specific information becomes straightforward!
Let’s use a more complex code snippet to perform our queries on:
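A snippet consistent with the description that follows could look like the sketch below. It is a reconstruction for illustration, not the original listing; only the names `SayHello`, `sayHello`, and `greet` come from the text, and the bodies are invented.

```javascript
class SayHello {
  // `sayHello` is a constructor parameter...
  constructor(sayHello) {
    // ...and also a property of the class instance.
    this.sayHello = sayHello;
  }
}

// `sayHello` is a function as well.
function sayHello() {
  // Local `greet`, declared with `let`.
  let greet = 'hello';
  return greet;
}

// Global `greet`, declared with `const`.
const greet = sayHello();
console.log(new SayHello(greet).sayHello);
```

A plain text search for `sayHello` or `greet` in this file returns every occurrence, with no way to tell the declarations apart.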
The code snippet contains several variables with repeated names and different scopes, which can lead to confusion when searching using traditional search filters.
For instance, the name sayHello is defined three times: as a property of the class SayHello, as a parameter of its constructor, and as a function, which can make it hard to track its usage and assignments.
Similarly, the variable greet is declared both as a local variable inside the function sayHello and as a global variable outside it.
Let’s try to use Orama to search for the SayHello class declaration by filtering on parentType:
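The search parameters for such a query might be sketched as follows. This assumes the flat records carry a `parentType` field, that the class name is stored on an `Identifier` record whose parent is the `ClassDeclaration`, and that the installed Orama version supports `where` filters on string fields.

```javascript
// Assumed parameters for Orama's search(db, params).
const classQuery = {
  term: 'SayHello', // text to look for
  properties: ['name'], // only match against the `name` field
  where: { parentType: 'ClassDeclaration' }, // keep names owned by a class declaration
};

// With a populated database:
//   import { search } from '@orama/orama';
//   const results = await search(db, classQuery);
//   // each entry of results.hits carries the stored record in `document`
console.log(classQuery);
```

The filter matters because full-text matching is case-insensitive by default, so the term alone would also match the `sayHello` function, parameter, and property.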
Acorn attaches a loc (location) field to the AST nodes when its locations option is enabled. This information allows us to determine precisely where each token appears in the source file.
We can also search for a variable and specify whether we are looking for a “let” or a “const” declaration:
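As a sketch, the two variants could differ only in their `where` clause. This rests on a loud assumption: the record that holds the variable name must also carry the `kind` of the declaration that introduces it, which means the flattening step has to copy `kind` down from the enclosing `VariableDeclaration`.

```javascript
// Assumed parameters: find `greet` only where it is declared with `const`.
const constQuery = {
  term: 'greet',
  properties: ['name'],
  where: { kind: 'const' },
};

// Same search, restricted to `let` declarations instead.
const letQuery = { ...constQuery, where: { kind: 'let' } };

console.log(constQuery.where, letQuery.where);
```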
This was a fun experiment where we used Orama and Acorn together to make it easier to explore and understand complex code. The combination of their features worked well, and it would be cool to create a plugin for Visual Studio Code using this system. With such a plugin, we could improve code search, analysis, and navigation, making coding a lot more efficient and enjoyable for developers.