Using Tree Sitter to extract insights from your code and drive your development metrics
Technology, and software development especially, is a constantly changing field. These changes can emerge organically within a team as it discovers more about the problem it's solving and iterates on its solution, or they can come from external sources, such as new paradigms recommended by the frameworks the team uses.
For an engineering team to keep the overhead of these changes low there needs to be a means to measure the health of the codebase. These are often called Development Metrics and are compiled by extracting “facts” about the source code that can be quantified and used to drive the team to increase or decrease that quantity.
An example of this would be a team that created a React based product a number of years ago and used the state management paradigm that was considered best practice at the time — Redux and Higher Order Components. In subsequent years what was considered best practice changed, with Hooks and Contexts replacing those older paradigms.
When faced with this change the team can respond in a couple of ways:
- They can focus on re-aligning the codebase with the new patterns at the cost of working on other changes to the code such as feature work
- They can denounce the new patterns and stick to the paradigm they know, accepting that there may come a point where they can no longer update the libraries they’re dependent on because of this choice
- They can create a plan to re-align the codebase as they go, refactoring the existing code they touch while building other things
In my experience it's often the latter two approaches that are used, with the second being more common than I'd hope. Teams that adopt that approach essentially bury their heads in the sand until they're forced to adopt the changes, or until their solution needs a complete rewrite because it's considered legacy code compared to the rest of the industry.
Teams with a healthier relationship with their “technical debt” choose the third approach. These teams understand that their code will never be “perfect” and build tools to manage the fluctuations in alignment with internal and external patterns.
One of those tools being Development Metrics.
Building Development Metrics
In order to create a Dev Metric you need to be able to quantify your code. But how do you turn a bunch of magic words that tell a computer to do things into numbers you can plot on a graph? You define patterns and see how much of your code adheres to them.
A pattern can be something simple, like ensuring that you have semi-colons at the end of your lines of code, or it can be complex, such as checking that your code uses the same values as an external resource to ensure the two stay in sync.
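To make that concrete, here's a minimal sketch (in Python, with a made-up source snippet and a deliberately naive regex) of turning the semi-colon pattern into a number you could plot:

```python
import re

# A made-up JavaScript snippet to quantify; a real check would read
# files from disk instead
SOURCE = """const a = 1;
const b = 2
const c = 3;"""

# Naive pattern: the line's last non-whitespace character isn't ; { or }
MISSING_SEMICOLON = re.compile(r"[^;{}\s]\s*$")

violations = [
    line for line in SOURCE.splitlines()
    if MISSING_SEMICOLON.search(line)
]

# the quantity the metric would track over time
print(len(violations))  # → 1
```

The count of violations (here 1, for `const b = 2`) is the number the metric tracks; the goal would be to drive it down over time.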
To create a pattern you need to be able to analyse your code. There are a number of different ways you can do this:
Textual matching
One approach is to match against a pattern of text in your code, often using something like a Regular Expression (Regex). This is useful for checking or extracting simple string values, but it can give false positives or negatives if there are inconsistencies in the way the code is written.
This type of check has the benefit of being quick, and if you can tolerate a certain level of inaccuracy it can provide a lot of value quickly. But there is a trade-off: the more accurate you try to make the Regex, the more complex it will get and the less readable and maintainable it will become.
Take, for example, the following check that matches a call to a particular method.
^.*\(.*\).*=>.*cy\..*\(.*$
It will match perfectly against the following code:
const someUseOfCypress = () => cy.find(".selector")
But if at a later point we need to add a line before that code in order to log something, we break the check.
const someUseOfCypress = () => {
  cy.log("This is going to check something")
  cy.find(".selector")
}
And then we'd have to amend the Regex to handle both the inline arrow function and the multi-line arrow function, which makes it very hard to read.
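A quick way to see the breakage is to run the original Regex against both forms; it matches the one-liner, but no single line of the multi-line version satisfies it:

```python
import re

# the original single-line check
PATTERN = re.compile(r"^.*\(.*\).*=>.*cy\..*\(.*$")

inline = 'const someUseOfCypress = () => cy.find(".selector")'
multiline = '''const someUseOfCypress = () => {
  cy.log("This is going to check something")
  cy.find(".selector")
}'''

print(PATTERN.match(inline) is not None)  # → True
# no single line contains "(...) => cy.(" any more, so every line fails
print(any(PATTERN.match(line) for line in multiline.splitlines()))  # → False
```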
Syntax linting
Another approach is to use a linting tool such as ESLint or Grit to define a pattern based on the syntax of the code. These linting tools hold a representation of the code that can be queried and used to produce some output for the linting tool to report.
To re-use the method check from above, in Grit this would look like matching against the body of a const, var or function declaration to see if there's a use of a method of the "cy" instance inside it.
engine marzano(0.1)
language js
or {
  lexical_declaration($declarations) where $declarations <: contains member_expression(object="cy"),
  variable_declaration($declarations) where $declarations <: contains member_expression(object="cy"),
  function_declaration($body) where $body <: contains member_expression(object="cy")
}
Linting tools usually allow you to define a severity level for matches against their rules, so you can use the INFO level as a means to have matches reported without having code that matches the pattern treated as an error.
Depending on the options available in the linting tool, you may find you have to take additional steps to get the count that powers your development metric. This could involve counting the lines the tool prints when it's run, or parsing a serialised output.
Grit offers a JSON output, which makes it a good tool for collecting dev metrics: you can pipe that output into a tool like jq, or run Grit as a subprocess and parse the JSON it writes to stderr for further calculation.
# get the count of items that match a certain GritQL check
grit check --json 2>&1 | jq '[.results[] | select(.localname == "CHECK_NAME")] | length'
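If you'd rather do the counting in a script than in jq, a minimal Python sketch of the subprocess route might look like this. Note that the payload shape (a top-level `results` array whose entries carry a `localname` field) is assumed from the jq filter above rather than taken from Grit's documentation:

```python
import json


def count_matches(raw_json: str, check_name: str) -> int:
    """Count Grit results for one check, mirroring the jq filter above.

    Assumes a top-level "results" array whose entries have a "localname"
    field; adjust to match the JSON your version of Grit actually emits.
    """
    payload = json.loads(raw_json)
    return sum(
        1
        for result in payload.get("results", [])
        if result.get("localname") == check_name
    )


# Usage sketch (assumes the grit CLI is on PATH and reports on stderr):
#   import subprocess
#   process = subprocess.run(["grit", "check", "--json"],
#                            capture_output=True, text=True)
#   print(count_matches(process.stderr, "CHECK_NAME"))
```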
Syntax linting is more accurate than textual matching as it is more resilient to changes in the structure of the code, so a line break or additional lines in a function won't affect the matches.
The downside to using a structural linting tool is that you're only able to measure how well the syntax of your code meets a defined pattern; you're not able to extract values from the code unless you do additional processing.
Abstract Syntax Tree traversal
Another approach is to use the technique underlying the linting tools, Abstract Syntax Tree (AST) traversal, to query and extract values from your code. The syntax linting tool Grit is itself built on top of Tree Sitter.
Unlike linting, which has you define a query and gives you either a count or a list of matches, traversing the AST allows you to extract the nodes that match a predicate and do further calculation on them.
If we again re-use the method check example, we can use Tree Sitter to match against the method and also use the captured node to get the value passed in as an argument (`@selector`).
(
  (call_expression
    function: (member_expression
      object: (identifier) @instance
      property: (property_identifier) @method)
    arguments: (arguments
      (string (string_fragment) @selector)))
  (#eq? @instance "cy")
)
We can then use those values to do further analysis. Say, for instance, we want to check that the values passed into the method are present in an external spreadsheet of values; we can do this and then produce a report on any missing values.
This was the exact use-case I found myself with.
Using Tree Sitter to extract values from code
I used Python to write my script but I think the process will be similar across all languages.
The first step was to install Tree Sitter and the Tree Sitter grammar for the language the code was in. The code I was analysing was in TypeScript, so I needed to install tree-sitter and tree-sitter-typescript.
With the libraries installed I then had to figure out how to run a query against my source code. The Tree Sitter docs really don't help much in this regard, but luckily there are some examples in the tree-sitter library's GitHub repository that show how the Python library can query against a byte-string, so I adapted that to read a file from disk as a byte-string and that worked.
# import libraries
from tree_sitter import Language, Parser
import tree_sitter_typescript

# set up the parser for TypeScript
language = Language(tree_sitter_typescript.language_typescript())
parser = Parser(language)

query = language.query("""
(
  (call_expression
    function: (member_expression
      object: (identifier) @instance
      property: (property_identifier) @method)
    arguments: (arguments
      (string (string_fragment) @selectors)))
  (#eq? @instance "cy")
)
""")

# either read a file from disk as a byte-string or create one inline
source_code = b"""
const someUseOfCypress = () => {
  cy.log("This is going to check something")
  cy.find(".selector")
}
"""

# run the query against the root node of the parsed tree
source_code_tree = parser.parse(source_code)
captures = query.captures(source_code_tree.root_node)
Once I had a means to run a query against the source code, I then needed to write the query itself. This was a big learning step, as Tree Sitter's S-expression query syntax uses Polish Notation, which is something that always catches me out.
Essentially, when building a query you create an expression that will match against a node, and you can nest expressions to ensure that you're only matching on nodes whose children meet a predicate. You can then capture those nodes and, outside that expression, evaluate them further.
The output of that query will be a mapping from each capture name to the list of nodes in the AST that match the query.
You can then read values off the captured nodes, or walk further down a captured node's subtree if you need to; in my case I wanted the arguments passed to the method being called.
# get the captures
captures = query.captures(source_code_tree.root_node)
selector_nodes = captures["selectors"]

# each captured node is the string_fragment itself, so its text is the value
selectors = []
for selector_node in selector_nodes:
    if selector_node.text:
        selectors.append(selector_node.text.decode("utf-8"))
I could then save the values into a set and compare that set with another in order to analyse which values were missing.
selectors_in_code = set(selectors)
selectors_in_external = set(selector_strings_from_external_resource)
missing_from_external = (selectors_in_code - selectors_in_external)
missing_from_source_code = (selectors_in_external - selectors_in_code)
Some gotchas
AST traversal isn't perfect. When analysing a file that imports constants or types from other files, you won't be able to access those definitions without additional static analysis or dynamic runtime processing.
// Tree Sitter won't know what SomeFancyType is, so it can't determine what
// properties are on "typedArg", but it can tell you that the code is calling
// cy.find with the selector property of typedArg
import SomeFancyType from './types'
const someUseOfCypress = (typedArg: SomeFancyType) => {
  cy.log("This is going to check something", typedArg)
  cy.find(typedArg.selector)
}
There are systems that allow for this; they parse your entire codebase and create "facts" about the code. To remain performant, these systems usually don't store the AST itself but instead store the most commonly assessed facts about the code.
So you'll need to assess your needs to decide whether the AST alone will do the job or you need something bigger.
Summary
Using Tree Sitter, I've been able to extract the values used in my code and check them against a spreadsheet maintained by my non-technical colleagues, providing valuable feedback that ensures all parties are aware of new analytics event names added to either the code or the spreadsheet.
I've also been able to compile this into a dev metric, so we can identify the incidents where things fall out of sync and find ways to reduce how often that happens.
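As a sketch of what compiling counts into a dev metric can look like in practice: append each measurement to a timestamped CSV so the trend can be plotted later. The file name and metric name here are hypothetical.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path


def record_metric(path: Path, metric_name: str, count: int) -> None:
    """Append one timestamped data point for a metric to a CSV file."""
    write_header = not path.exists()
    with path.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if write_header:
            writer.writerow(["timestamp", "metric", "count"])
        writer.writerow(
            [datetime.now(timezone.utc).isoformat(), metric_name, count]
        )


# e.g. record_metric(Path("dev_metrics.csv"), "selectors_out_of_sync",
#                    len(missing_from_external))
```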
By learning more about how Tree Sitter works and using it to extract "facts" from the code, I'm starting to build up a basic knowledge of how, as a developer, I can create searches that answer non-technical people's questions about what the code does.