Using GritQL to capture code conversion metrics
As a codebase ages, it’s not uncommon for new paradigms, frameworks, patterns and concepts to emerge, and for an architectural call to be made on whether or not to migrate to the shiny new thing.
On a large codebase with many developers working on it, such a migration is near impossible to do in one merge. The pull request would be hard to review: with so many files changed, there’s a risk the reviewer just glances over it instead of paying attention to each file.
You’d also need to avoid merge conflicts, which would most likely mean stopping others from merging until your change was in, after which they would have to deal with the changes you introduced.
To make this process easier, companies often create a continuous improvement initiative around the conversion. Developers are expected to update code that uses the old pattern to the new pattern whenever they come across it in files they’re changing as part of other development work.
This approach can be hard to sustain, however. When deadlines loom, teams often see the refactoring as something that could slow them down, so conversions get missed or negotiated away in order to get the promised delivery out.
One technique is to measure the code conversion effort: capture the number of instances of the old approach and the number of instances of the new approach, and use these figures when reviewing the health of the codebase. This gives the business a visual representation of the initiative’s progress and helps teams factor the remaining work into their estimates.
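As a minimal sketch of the metric itself, assuming you already have the two counts, progress can be expressed as the percentage of instances that already use the new approach. The function name conversion_progress is hypothetical and not part of any tool mentioned here.

```python
def conversion_progress(old_count: int, new_count: int) -> float:
    """Return the percentage of instances that already use the new approach.

    old_count: instances still using the old approach
    new_count: instances already converted to the new approach
    """
    if old_count < 0 or new_count < 0:
        raise ValueError("counts must be non-negative")
    total = old_count + new_count
    if total == 0:
        # Design choice: nothing left to convert counts as fully converted.
        return 100.0
    return 100.0 * new_count / total


# e.g. 30 old-style usages left and 90 already converted
print(conversion_progress(30, 90))  # 75.0
```

Recording this number at each health review gives you the data points for a simple progress graph.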
Measuring code
There are a couple of approaches you can take when measuring code. The most common I’ve seen is a “find in files” approach: run something like a regex against the contents of the source files to see how many times certain text patterns appear in them.
An example of this would be to use ripgrep to run a regex over the contents of all files in a directory and return only the names of files containing lines that match, giving you a count of compliant / non-compliant files.
rg --files-with-matches [regular_expression]
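The same count can be sketched in Python for a quick script. This assumes TypeScript sources (*.ts) under a root directory, and the function name and the cy. pattern in the demo are hypothetical. It reads each file whole and reports how many files match the regex and how many don’t.

```python
import re
import tempfile
from pathlib import Path


def count_matching_files(root: str, pattern: str, glob: str = "*.ts") -> tuple[int, int]:
    """Return (matching, non_matching) file counts for files under root.

    Roughly what rg --files-with-matches PATTERN piped into a line count gives you,
    plus the number of files that did not match. Each file is searched as a whole.
    """
    regex = re.compile(pattern)
    matching = non_matching = 0
    for path in Path(root).rglob(glob):  # rglob recurses into subdirectories
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if regex.search(text):
            matching += 1
        else:
            non_matching += 1
    return matching, non_matching


# Tiny self-contained demo: two Cypress-style files and one that isn't.
with tempfile.TemporaryDirectory() as root:
    Path(root, "login.ts").write_text("cy.visit('/login')\n", encoding="utf-8")
    Path(root, "nested").mkdir()
    Path(root, "nested", "home.ts").write_text("cy.get('#menu').click()\n", encoding="utf-8")
    Path(root, "page.ts").write_text("await page.goto('/')\n", encoding="utf-8")
    demo_counts = count_matching_files(root, r"\bcy\.")

print(demo_counts)  # (2, 1)
```

On the command line, piping rg --files-with-matches into wc -l gives the first number, and ripgrep’s --files-without-match flag lists the files behind the second. Unlike ripgrep, this sketch doesn’t honour .gitignore or skip hidden files.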
This approach works well if the code pattern you’re looking to detect is straightforward enough to be captured by a simple regex, but most developers will shudder at the thought of having to debug one.
It can also be inaccurate when detecting code patterns that are split over multiple lines, as something as simple as adding a comment to the code block can cause the regex to stop matching.
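To illustrate, here is a hypothetical regex that looks for a console.log call as the first statement of a try block. It matches the compact version of the code but stops matching as soon as a comment is added on its own line, even though the code behaves exactly the same.

```python
import re

# Hypothetical regex: a console.log call as the first statement inside a try block.
TRY_THEN_LOG = re.compile(r"try\s*\{\s*console\.log\(")

original_snippet = """try {
  console.log(value);
} catch (e) {}
"""

# Same behaviour, but with a comment added on its own line.
commented_snippet = """try {
  // log the value for debugging
  console.log(value);
} catch (e) {}
"""


def matches(source: str) -> bool:
    """True if the regex finds the try-then-console.log shape anywhere in source."""
    return TRY_THEN_LOG.search(source) is not None


print(matches(original_snippet))   # True
print(matches(commented_snippet))  # False
```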
A more accurate way of detecting code patterns is to use tools that let you query the Abstract Syntax Tree (AST) of the code. Because an AST query matches on code blocks containing a certain set of nodes rather than on raw text, it stays accurate when comments, spaces and new lines are added to the code block.
One such AST query tool I discovered recently is Grit. Grit has its own query language, GritQL, built on top of tree-sitter, which lets you write queries that match against code patterns.
Here’s an example GritQL query (shamelessly stolen from their documentation) that matches console.log statements in JavaScript that aren’t inside try-catch blocks. The => . part rewrites each match to nothing, so applying the pattern would remove those statements:
`console.log($log)` => . where {
  $log <: not within `try { $_ } catch { $_ }`
}
Doing something similar with a regex would mean matching several separate fragments of the code, such as try {, } catch {, console.log( and the closing bracket, which would be quite fragile and hard to maintain.
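You don’t need Grit to see why the AST approach holds up. As an analogue (Python’s built-in ast module over Python source, not GritQL over JavaScript), the sketch below counts print calls inside try blocks. Comments and line breaks never become part of the tree, so the compact and the heavily commented versions give the same answer. The function name is hypothetical.

```python
import ast


def count_calls_in_try(source: str, func_name: str) -> int:
    """Count calls to a plain function name (e.g. print) anywhere inside a try block body."""
    tree = ast.parse(source)
    found = set()  # AST nodes hash by identity, so nested try blocks aren't double counted
    for node in ast.walk(tree):
        if not isinstance(node, ast.Try):
            continue
        for statement in node.body:
            for inner in ast.walk(statement):
                if (
                    isinstance(inner, ast.Call)
                    and isinstance(inner.func, ast.Name)
                    and inner.func.id == func_name
                ):
                    found.add(inner)
    return len(found)


compact_source = "try:\n    print(value)\nexcept Exception:\n    pass\n"

noisy_source = """
try:
    # log the value for debugging
    print(
        value,  # the value we care about
    )
except Exception:
    pass
"""

print(count_calls_in_try(compact_source, "print"))  # 1
print(count_calls_in_try(noisy_source, "print"))    # 1
```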
Using Grit
The Grit setup consists of a few parts:
- The patterns which are run against the code
- The Grit CLI which runs the patterns against the code
- The Grit configuration file which tells Grit where to look for patterns (among other things)
The quickstart at https://docs.grit.io/cli/quickstart has details on how to install the Grit CLI and get it up and running for your project.
The Grit CLI has two commands you’ll likely be interested in:
- grit check, which acts as a linter; this is what I’ve been using for my code conversion measurements
- grit apply, which applies any AST rewriting that the patterns define to the code
The grit apply command is really powerful, as Grit has a standard library of code conversions it can carry out against your code to help you migrate. Example migrations include converting React codebases from class components to function components and converting a Cypress codebase to Playwright.
The grit check command is really useful for getting a count of the number of matches against a pattern, and if you pass it the --json flag it will give you a JSON representation of the output that you can then pipe into other commands such as jq. Grit writes this output to stderr, so you will need to redirect stderr into stdout (2>&1) for this to work.
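If you’d rather consume the report from a script than from jq, the same 2>&1 redirection can be done with Python’s subprocess module by sending stderr into the stdout pipe. This sketch assumes, based on the behaviour described above, that grit check --json writes only its JSON report to stderr; if it also prints other log lines, the parse will fail. The helper name is hypothetical, and the demo uses a stand-in child process instead of Grit.

```python
import json
import subprocess
import sys


def run_and_parse_json(cmd: list[str]) -> object:
    """Run cmd with stderr merged into stdout (the 2>&1 part) and parse the output as JSON."""
    completed = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # send stderr into the same pipe as stdout
        text=True,
        check=False,  # a linter may exit non-zero when it finds matches, so don't raise
    )
    return json.loads(completed.stdout)


# Hypothetical real usage: run_and_parse_json(["grit", "check", "--json"])
# Self-contained demo: a child process that writes its JSON to stderr only.
demo_cmd = [sys.executable, "-c", "import sys; sys.stderr.write('{\"results\": []}')"]
demo_report = run_and_parse_json(demo_cmd)
print(demo_report)  # {'results': []}
```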
Writing your own pattern
One thing I absolutely love about Grit is that it lets you use Markdown to define your own patterns. I’m a big fan of Markdown and of keeping important decisions about the code in the codebase, where they can be referenced and searched easily, and being able to link to a Grit pattern makes communication about code conversion initiatives so much easier.
To write your own pattern, create a new Markdown file under the .grit/patterns directory. Grit uses the name of the Markdown file as the name of the pattern, so be sure to name it something that makes sense when reading the report.
The formatting requirements for the Markdown file are pretty relaxed, but Grit does insist on a couple of things:
- The first code block in the file body will be used as the GritQL pattern that Grit will run
- Sub-headings with code blocks will be considered tests for the GritQL pattern
With that in mind, it’s probably best not to include code blocks used to render things like Mermaid diagrams in Grit pattern files; link to them from the pattern’s description instead.
To help you figure out the query for your pattern, Grit has a really nifty editor called Grit Studio, available at https://app.grit.io/studio.
Grit Studio has a couple of awesome features that help you explore the best GritQL query for the pattern you’re looking to define:
- The top editor is where you write the GritQL query to run, and it provides code completion and linting
- The bottom-left editor holds the code you want to query against, and the parts of it matched by the GritQL query are highlighted
- The bottom-left editor also has a debug option that shows you the AST of the code in that editor. This makes it super easy to identify the AST nodes to query on, as it highlights the corresponding parts of the code as you hover over them
- The bottom-right editor shows the resulting code if your pattern rewrites the AST
When I was writing a check for uses of Cypress outside of class methods, Grit Studio really helped me home in on the exact definitions for the query: I could write out a number of different function declarations (function declarations, arrow functions, etc.) alongside the class method and make sure the query only matched the ones that should be considered incorrect.
Using Grit to produce code conversion metrics
Grit provides a set of services around integrating its code migration functionality into CI, so I’d recommend looking at what they have to offer before writing your own. For small metrics like mine, though, I can’t justify the cost, so I have a little snippet that takes the output of grit check and queries it with jq:
grit check --json 2>&1 | jq '.results | map(select(.local_name == "name_of_check")) | length'
If you have multiple checks, it’ll be easier to save the Grit output to a file and query that instead, but either way this lets you build a basic graph of the different stats of your code conversion efforts.
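For the multiple-checks case, here’s a small sketch that reads a saved report (for example one written with grit check --json > grit-report.json 2>&1) and counts results per pattern. It assumes the same report shape the jq snippet relies on, a top-level results array whose entries name their pattern in local_name; that shape is carried over from the snippet rather than taken from a documented schema, so adjust the field names if your output differs. The file name, check names and helper name are hypothetical.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_matches_by_check(report_path: str) -> Counter:
    """Count results per pattern name in a saved grit check --json report.

    Assumes a top-level "results" list whose entries hold the pattern name in
    "local_name", i.e. the same fields the jq one-liner queries.
    """
    report = json.loads(Path(report_path).read_text(encoding="utf-8"))
    return Counter(result.get("local_name") for result in report.get("results", []))


# Self-contained demo with a made-up report; the check names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    report_file = Path(tmp, "grit-report.json")
    report_file.write_text(
        json.dumps(
            {
                "results": [
                    {"local_name": "old_cypress_usage"},
                    {"local_name": "old_cypress_usage"},
                    {"local_name": "new_cypress_usage"},
                ]
            }
        ),
        encoding="utf-8",
    )
    demo_counts = count_matches_by_check(str(report_file))

for check, total in demo_counts.most_common():
    print(f"{check}: {total}")
```

Appending these per-check totals, along with the date, to a file on each run gives you the series to plot.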