
Fuzzy Snapshot Testing With Jq and Diff

By: Ryan Bell, Lead Engineer at Measures for Justice

At Measures for Justice, we’ve been busy with some big updates to our systems to help us scale as we expand our work to more communities, including the criminal justice agencies that serve them.

When we were first developing our national Data Portal, our resident data scientists and criminologists created a many-tabbed spreadsheet to keep track of everything that should be presented there – our measures, filters, bibliography, etc. To get things up and running quickly, we built code to read this spreadsheet, enshrining it as the “configuration brain” for our core products (you can probably see where this is going). Among other things, information from the spreadsheet was fed into a GraphQL API for our front end that provided detailed instructions about what to display.

As we grew and introduced more offerings, such as Commons, it became clear that we needed something more robust than a spreadsheet to manage all this configuration. We had multiple instances of it for different dashboards, and it was becoming hard for a growing team to collaboratively maintain.

What we chose as a replacement for the spreadsheet is a topic for another post, but we knew that the migration needed to be both iterative and as transparent as possible to downstream systems so that we could keep moving forward on our mission. (A previous effort to replace it via a “big bang” cutover had fizzled.)

The challenge

In terms of our GraphQL API, this meant we needed a way to rapidly validate that each successive step in the migration hadn’t introduced any unexpected changes to query responses, so that our web apps would continue to function as before.

At the same time, we needed to allow some differences like:

  • Reordering of values in arrays (where only the presence or absence of a value is significant)
  • Comparison environments having slightly different sets of configuration due to unrelated work in progress that was making its way through testing and promotion
  • Unused, vestigial information being removed (GraphQL makes it easy to exclude object properties we no longer care about in a request, but we had other kinds of cleanup too)

To take a simplified example, we wanted comparison logic like this (applied across hundreds or even thousands of entities, with a lot more object nesting than this):

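The original post includes an image of the two responses here. As a stand-in, here is a hypothetical reconstruction of the kind of before/after pair involved – the field names line up with the filters discussed below, but the values are purely illustrative:

Before (old system):

{
  "measureId": 123,
  "compatibleFilters": ["sex", "race", "age"],
  "footnotes": [2, 7],
  "sources": [8, 9],
  "availability": [
    { "sourceType": "PROSECUTOR", "available": true },
    { "sourceType": "COURT", "available": false }
  ]
}

After (new system):

{
  "measureId": 123,
  "compatibleFilters": ["age", "race", "sex"],
  "footnotes": [2, 5, 7],
  "sources": [],
  "availability": [
    { "sourceType": "PROSECUTOR", "available": true }
  ]
}

Specifically, we wanted to: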
  1. Accept the difference in ordering of `compatibleFilters` (it doesn’t matter)
  2. Accept the addition of a new entry `5` in the `footnotes` list, because we know that our research team recently added it to this particular measure and it simply hasn’t made it to production yet
  3. Flag the disappearance of the two values in `sources` as an unexpected change (hard to see in all the other noise!)
  4. Accept the removal of objects in `availability` that indicate non-availability (turns out our frontend was filtering these out anyway, and eliminating them allowed us to avoid dragging some troublesome aspects of the old system forward)

Our first solution

We use Postman quite a bit – it’s a great tool for working on APIs! We initially tried writing JavaScript tests in Postman, something we’ve leveraged successfully for API response validation in the past. This worked, but had a few drawbacks:

  1. Piecemeal checks: since we didn’t expect exact alignment, a simple deep-equality check was off the table. The result was a laborious comparison of individual object properties, stepping carefully around places where sub-objects could be null. In effect, this meant lots of work to cover the 90%+ of the response that wasn’t supposed to have changed, and it was hard to tell whether the coverage was comprehensive without constantly cross-referencing the GraphQL schema.
  2. Boilerplate iteration: lots of nested loops to handle nested JSON objects
  3. Tedious failure analysis: we’d get an assertion failure (or an unexpected exception in the test code) and have to figure out where in a big nested collection it actually happened, e.g. by concatenating a bunch of IDs into every assertion description
  4. Noise in the review workflow: we wanted to use our normal GitHub PR workflow to review and track our tests, which required exporting from Postman and committing the result. Postman’s export format is JSON files, with lots of Postman configuration surrounding the JavaScript test code, escaped inside JSON strings. This made for diffs that were hard to read.
[Image: actual screenshot of Postman tests from a pull request]

A diff-erent approach

At some point we had a thought: “What if we compare this JSON data as text, instead of a big object tree?”

If we could somehow compare two formatted JSON responses from different versions of the service using text comparison tools, like diff, a single command would tell us if they matched, comprehensively. This seemed like an obvious solution if we expected no changes whatsoever.
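
As a minimal sketch (the file names are illustrative), pretty-printing both responses through jq gives diff a consistent, line-by-line format to compare:

# jq -S pretty-prints with object keys sorted, so both files are formatted identically
diff <(jq -S . old-response.json) <(jq -S . new-response.json)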

But how could we deal with the fact that we did expect some changes?

We’d been playing around a bit with jq, a command-line tool that bills itself as “sed for JSON data,” and an idea occurred: could we use jq to transform the API response data in a way that would surgically remove differences we wanted to accept?

If we did this iteratively, each transformation would be easy to review and understand as an isolated thing – accepting one type of difference at a time. We decided, as a best practice, to apply the same transformations to the responses from both the old and new system, to ensure we were treating them consistently.

Returning to our original example, let’s see if we can get jq to “factor out” the first difference, where the `compatibleFilters` array ordering has changed:

jq '.compatibleFilters |= sort'

This means: take the `compatibleFilters` property, run it through the built-in `sort` function, and assign the result back to `compatibleFilters`. That puts the array in a canonical order on both sides, factoring out the first difference.

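To make the effect concrete, here’s the same filter applied to a tiny illustrative input (`-c` requests compact output):

# illustrative input; sort puts both sides in the same order
echo '{"compatibleFilters": [3, 1, 2]}' | jq -c '.compatibleFilters |= sort'
# prints: {"compatibleFilters":[1,2,3]}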

Next, we can deal with the added footnote – recall that in this case, we only want to accept this change if we’re looking at one particular entity:

jq 'if .measureId == 123 then .footnotes |= map(select(. != 5)) else . end'

In English: if this is measure 123, filter the footnotes to values other than 5; otherwise, return the object unmodified. That factors out the second difference as well.

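As another illustrative sketch, note how the conditional leaves other measures untouched:

# illustrative input with two measures; only measure 123 is modified
echo '[{"measureId": 123, "footnotes": [2, 5, 7]}, {"measureId": 456, "footnotes": [5]}]' \
  | jq -c '.[] | if .measureId == 123 then .footnotes |= map(select(. != 5)) else . end'
# prints: {"measureId":123,"footnotes":[2,7]}
#         {"measureId":456,"footnotes":[5]}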

Finally, let’s accept the removal of non-availability information. This is a good illustration of why it’s helpful to apply all transformations to both the “before” and “after” sides:

jq '.availability |= map(select(.available))'

This uses syntax we’ve already seen to filter the objects in the `availability` array down to only those where `available` is true. Because the transformation runs against both responses, the “before” side loses its non-availability entries too, and they drop out of the comparison entirely.

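On a small illustrative input (the object shape here is hypothetical), the non-available entries drop out:

# illustrative input; only entries with available == true survive
echo '{"availability": [{"sourceType": "PROSECUTOR", "available": true}, {"sourceType": "COURT", "available": false}]}' \
  | jq -c '.availability |= map(select(.available))'
# prints: {"availability":[{"sourceType":"PROSECUTOR","available":true}]}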

The remaining difference now stands out – one we didn’t expect. We can now go investigate and, as needed, fix whatever bug in the migration is causing us to lose those source associations. When we’re done, we can re-run the same filters with an updated “after” result and know instantly whether we resolved the discrepancy and didn’t introduce any others.

Super-charging it

In all these examples we’ve been looking at a single object. What happens when, in reality, we have a GraphQL response with a big array of these objects? How do we loop over them?

Conveniently, jq has built-in iterator “filters” that make a traditional loop unnecessary. To update the last example, we only need one small change:

jq '.[].availability |= map(select(.available))'

That initial `.[]` construct will cause our filter expression to iterate over every entity in the list and apply the rest of the transformation to each one, outputting the updated list.

For objects nested within arrays inside other objects, we can plop the same iterator into the middle of our jq expression without breaking a sweat.
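
For instance, with an illustrative payload shape, an iterator buried mid-path still lets `|=` update every match in place:

# illustrative input; each measure's footnotes list is sorted independently
echo '{"data": {"measures": [{"footnotes": [5, 2]}, {"footnotes": [9, 1]}]}}' \
  | jq -c '.data.measures[].footnotes |= sort'
# prints: {"data":{"measures":[{"footnotes":[2,5]},{"footnotes":[1,9]}]}}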

To add the last piece of the puzzle, we can leverage diff tools in place of eyeball comparisons, especially as real API responses are much larger than the example. We used a graphical diff app as we went through each step, both to investigate differences found between old and new systems, and to verify the effects of each jq transformation.


Interactive comparisons handled, we wanted a way to automate the process so a script could tell us whether every difference was accounted for. By default, the `diff` command outputs a description of all the differences it finds:

diff before4.json after4.json
5c5
<   "sources": [8, 9],
---
>   "sources": [],

We can programmatically count the number of differences here by taking advantage of the fact that each hunk in diff’s default output starts with a change command (like `5c5`, `3a4`, or `7d6`), which always begins with a line number:

diff before4.json after4.json | grep -c '^[1-9]'
1

Packaging it up

From here, it was a short leap to packaging this up as bash scripts that would make the API response validation process for each entity type transparent and repeatable. Each script outlined a series of transforms that looked something like this (with some of the repetitive bits eventually moved to helper functions imported via the `source` command):

# ----------------------
# difference: measure IDs have changed
#
# Some measure IDs had 2000 added to them to avoid conflicts
# with similar but different measure definitions for another
# product
# ----------------------

# from prior step:
# baseFile="results/02-old-converted-year.json"
# testFile="results/02-new-converted-year.json"
echo "Step 3: translate 2xxx measure IDs to prior values"

baseFileNew="results/03-old-original-ids.json"
testFileNew="results/03-new-original-ids.json"

filter='.data.measures[].id |= if (. > 2000) then (. - 2000) else . end'

# preserve intermediate artifacts for debugging

jq "${filter}" "${baseFile}" > "${baseFileNew}"
jq "${filter}" "${testFile}" > "${testFileNew}"

diffs=$(diff -d "${baseFileNew}" "${testFileNew}" | grep -c '^[1-9]')

echo "${diffs} differences"

exit ${diffs}

We could add steps successively, and once this script returned 0, we were done! It was even easy to automate the entire process, right down to making the initial queries (via curl).
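
The query step looked roughly like this minimal sketch – the endpoint URL and GraphQL query here are placeholders, not our real API:

# hypothetical endpoint and query, shown only to illustrate the shape of the step
curl -s "https://api.example.org/graphql" \
  -H "Content-Type: application/json" \
  -d '{"query": "{ measures { id compatibleFilters footnotes sources } }"}' \
  | jq . > "results/00-new-raw.json"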

Evaluation

Advantages of this approach compared to writing test scripts in Postman:

  1. Check everything: by default, with very little effort, we were comparing absolutely everything about the response. We could subtract from there, meaning we only had to write logic against the small percentage of the config that had changed.
  2. Abstracts away most iteration: jq’s iteration constructs are quick to use and result in minimal boilerplate
  3. Failure analysis is visual: if the script reported there were > 0 differences, we just fired up a graphical diff tool to see exactly where the issue was
  4. Easy code review: Succinct, yet heavily-commented, shell scripts are easy to review as a GitHub pull request and allow us to focus on verifying that the higher-level change acceptance logic makes sense.

For our purposes, this worked really well. The `jq` filter language has a bit of a learning curve, but it’s well-documented and there’s a nice interactive playground.

A handful of other nifty jq filters we used:

# remove trailing whitespace to account for cleanup during migration
.data.measures[].sources[].text |= sub("\\s+$"; "")

# skip over a known issue in how SD counties were enumerated in the new system
.data.states |= map(if .id == "SD" then del(.counties) else . end)

# translate representations of time across a couple of properties
# to account for certain stubs being replaced with real values
.data.measures[].availability |= map(
  if (.cohort.id == 3 and .sourceType == "PROSECUTOR") then
    (. + {cohort: null, year: 2017})
  else
    .
  end
)

As a language, bash is less pleasant in most regards than something like JavaScript or Kotlin, but the ability to drive CLI tools with minimal code – and to let those tools do most of the work – outweighed this drawback for us.

Takeaways

A key to our second approach is the composition of text-processing tools like diff, which treat text generically, with format-specific tools like `jq`, which can interpret the content of text files – and the ability to move back and forth between them, leveraging each for its strengths.

This tooling has come in handy on several different occasions as we’ve iteratively deprecated parts of our spreadsheet. We’ve already been able to put much of the migration work into production and have achieved the ultimate measure of success for this kind of project – no one noticed!

Stay tuned to happenings at MFJ as we close in on migrating the last few pieces away from our spreadsheet. Our new, scalable multi-tenanted configuration system is just one part of our ongoing effort to bring Commons to communities across the country, including yours!