How Our Spring 2023 Interns Documented Our Data Pipeline
By: Senior Software Engineer William Stumbo and Software Engineering Interns Hana Ho, Daniel Rigoglioso, and Phil Ganem
Over the past couple of years, Measures for Justice has hosted several software engineering interns. Our interns get firsthand experience working on teams while contributing their own ideas and creativity. In return, we gain talented, enthusiastic team members who play key roles in helping us deliver software that meets our standards and stays true to our core mission.
This spring, our interns built a new internal tool that documents our data processing pipeline. The pipeline transforms agency-provided data into measures that depict how the local criminal justice system operates. The new tool, the Data Element Catalog, captures the relationships between agency data, our transformation steps, and the resulting measures. Our interns modeled these relationships in a Postgres database, built a REST API to expose them, and designed a user interface for exploring the catalog.
Each cohort of interns has inspired us, and their contributions have made Measures for Justice better. This month, we’re dedicating this space both to thanking our interns for their contributions and to giving them an opportunity to share their experiences. Read on to hear directly from our Spring 2023 interns!
Hana’s Internship
My name is Hana Ho, and I am a computer science student at the Rochester Institute of Technology. Before working at Measures for Justice, I worked as a data clerk at a health center. I joined MFJ to gain experience with different types of data and exposure to working in a smaller organization. I spent the beginning of my internship training for work on company projects. I started by learning Kotlin through JetBrains Academy project exercises and exploring the Measures for Justice GitHub site. My previous experience with SQL helped me quickly become familiar with our Postgres database.
The other interns and I focused on creating a website that lets users view descriptions of the data elements in the database and the dependencies between them. Most of my assignments involved backend work in Kotlin using JetBrains Exposed. Since we were building a completely new tool, we created all the API endpoints from scratch. I implemented the endpoints that return data elements and their dependencies. Every endpoint accepts lists of IDs, locations, and processing stages to filter by, so callers can retrieve just the elements they need. Once I had initial versions of these APIs working, I updated them to support pagination, added documentation, and hardened them against SQL injection. Originally, I put all the API routes in a single file, but it quickly became unwieldy to maintain. I reorganized the routes across multiple files, using Spring Boot to map HTTP requests to different classes and functions. In the future, I will be adding more API routes and features, such as user authentication and sorting.
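Filtering, pagination, and injection defense are easiest to see at the SQL level. The sketch below is a minimal, stdlib-only Kotlin illustration, not the catalog's Exposed code. It assumes a hypothetical `data_element` table with `id`, `name`, `location`, and `stage` columns, and the names `ElementFilter`, `Query`, and `buildElementQuery` are hypothetical too. Every caller-supplied value becomes a `?` placeholder that a JDBC `PreparedStatement` would bind separately, so input such as `x'; DROP TABLE ...` is treated as data rather than executed as SQL.

```kotlin
// Hypothetical filter parameters accepted by a data-element endpoint.
data class ElementFilter(
    val ids: List<Int> = emptyList(),
    val locations: List<String> = emptyList(),
    val stages: List<String> = emptyList(),
    val page: Int = 0,
    val pageSize: Int = 50,
)

// SQL text with `?` placeholders plus the values to bind, in order.
data class Query(val sql: String, val params: List<Any>)

fun buildElementQuery(filter: ElementFilter): Query {
    require(filter.page >= 0) { "page must be non-negative" }
    require(filter.pageSize in 1..500) { "pageSize must be between 1 and 500" }

    val clauses = mutableListOf<String>()
    val params = mutableListOf<Any>()

    // Adds "column IN (?, ?, ...)" only when the caller supplied values for that filter.
    fun addInClause(column: String, values: List<Any>) {
        if (values.isEmpty()) return
        clauses.add("$column IN (${values.joinToString(", ") { "?" }})")
        params.addAll(values)
    }

    addInClause("id", filter.ids)
    addInClause("location", filter.locations)
    addInClause("stage", filter.stages)

    val whereClause = if (clauses.isEmpty()) "" else " WHERE " + clauses.joinToString(" AND ")

    // Page size and offset are bound as parameters too, never concatenated into the SQL.
    params.add(filter.pageSize)
    params.add(filter.page * filter.pageSize)

    val sql = "SELECT id, name, location, stage FROM data_element$whereClause ORDER BY id LIMIT ? OFFSET ?"
    return Query(sql, params)
}
```

The stable `ORDER BY id` matters for paging: without it, consecutive pages are not guaranteed to be disjoint. `LIMIT`/`OFFSET` is the simplest scheme; keyset pagination (filtering on `id > lastSeenId`) tends to scale better for deep pages.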
Ultimately, my time at Measures for Justice has been extremely rewarding and enjoyable. From the beginning, everyone has been welcoming, and there have been many opportunities to interact with employees from different departments. Although adjusting to a work environment was difficult at first, I have always felt comfortable reaching out for help or asking questions. I look forward to the rest of my time here and to continuing to improve and learn new skills.
Hana Ho, Software Engineering Intern
Dan’s Internship
MFJ has helped me sharpen my coding and debugging skills by giving me the opportunity to work on the Data Element Catalog. Throughout this project, I’ve learned Kotlin, a JVM language that interoperates closely with the popular Java programming language, and improved my fluency in SQL.
My first major contribution was modeling the dependencies between data elements in the catalog. I worked from a CSV file drafted by another team that understood the data pipeline intimately. I wrote a function that read each row of this file and cataloged the relationships between variables at different pipeline stages, inserting the variables and their relationship types into a Data Element Dependencies table. I used Flyway migration scripts to manage the schema of this table and to maintain some static groupings of these variables.
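A minimal sketch of the CSV step follows. The real file's columns aren't described here, so this assumes a hypothetical layout with `source_stage`, `source_variable`, `target_stage`, `target_variable`, and `relationship` columns; `Dependency`, `REQUIRED_COLUMNS`, and `parseDependencies` are likewise hypothetical names. Each row becomes a `Dependency` record that could then be inserted into the dependencies table.

```kotlin
// Hypothetical record for one row of a data-element dependencies table.
data class Dependency(
    val sourceStage: String,
    val sourceVariable: String,
    val targetStage: String,
    val targetVariable: String,
    val relationship: String,
)

// Hypothetical column names; the real CSV layout may differ.
val REQUIRED_COLUMNS = listOf("source_stage", "source_variable", "target_stage", "target_variable", "relationship")

// Parses a header row plus data rows. Assumes no quoted fields or embedded commas.
fun parseDependencies(csv: String): List<Dependency> {
    val lines = csv.lines().filter { it.isNotBlank() }
    if (lines.isEmpty()) return emptyList()

    // Map each column name to its position so column order in the file doesn't matter.
    val header = lines.first().split(",").map { it.trim() }
    val columnIndex = header.withIndex().associate { (i, name) -> name to i }
    val missing = REQUIRED_COLUMNS.filter { it !in columnIndex }
    require(missing.isEmpty()) { "CSV is missing columns: $missing" }

    return lines.drop(1).map { line ->
        val cells = line.split(",").map { it.trim() }
        require(cells.size == header.size) { "Malformed row: $line" }
        fun cell(column: String) = cells[columnIndex.getValue(column)]
        Dependency(
            sourceStage = cell("source_stage"),
            sourceVariable = cell("source_variable"),
            targetStage = cell("target_stage"),
            targetVariable = cell("target_variable"),
            relationship = cell("relationship"),
        )
    }.distinct() // drop duplicate rows before they reach the table
}
```

The naive `split(",")` keeps the sketch short; a file with quoted fields would need a proper CSV parser.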
I also contributed to the Data Element Catalog API. I used Spring Boot to create endpoints that describe how variables are grouped and prioritized at each pipeline stage. I added the ability to filter results by agency, so users interested in a single project can focus their exploration. I also expanded the endpoints that describe dependencies between variables at different pipeline stages. Initially, our API returned only a variable’s dependencies in the immediately preceding stage. I created an endpoint that recursively traverses dependencies so that users can see them across all prior stages. With this information, the data engineering team can determine what data an agency needs to provide to support a variable calculated near the end of the pipeline.
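The recursive traversal amounts to a depth-first search over the dependency graph. This sketch assumes the direct, one-stage dependencies are already loaded into a map; `Element`, `allUpstream`, and `requiredRawInputs` are hypothetical names, not the catalog's API. A visited set ensures that a variable shared by several downstream variables is reported once and that an accidental cycle cannot cause infinite recursion.

```kotlin
// Hypothetical key identifying a variable at a specific pipeline stage.
data class Element(val stage: String, val name: String)

// Collects every upstream dependency of `start` across all prior stages.
// `directDeps` maps each element to the elements it is computed from one stage earlier.
fun allUpstream(start: Element, directDeps: Map<Element, List<Element>>): Set<Element> {
    val seen = linkedSetOf<Element>()
    fun visit(element: Element) {
        for (dep in directDeps[element].orEmpty()) {
            // `add` returns false for elements already visited, which skips shared
            // dependencies and stops the recursion if the data contains a cycle.
            if (seen.add(dep)) visit(dep)
        }
    }
    visit(start)
    return seen
}

// Assumption for illustration: upstream elements with no recorded dependencies of
// their own are the raw fields an agency must supply.
fun requiredRawInputs(start: Element, directDeps: Map<Element, List<Element>>): Set<Element> =
    allUpstream(start, directDeps).filter { directDeps[it].isNullOrEmpty() }.toSet()
```

The same traversal could also be pushed into Postgres with a `WITH RECURSIVE` common table expression, which avoids loading the whole dependency map into memory.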
To test our new endpoints, I wrote many JUnit tests that call them. I configured the tests to mock database calls and instead read query results from YAML files. These files were smaller and more stable than results from the real database, which made it easier to write assertions in the tests.
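One way to get this kind of isolation is to put database access behind an interface and substitute a fixture-backed fake in tests. The sketch below uses hypothetical names (`DependencyRow`, `DependencyRepository`, `DependencyService`, `FixtureDependencyRepository`) and holds the fixture rows in memory. In the interns’ setup the canned results came from YAML files and the tests ran under JUnit; neither of those pieces is shown here.

```kotlin
// Hypothetical shape of one query result row.
data class DependencyRow(val elementId: Int, val dependsOnId: Int, val stage: String)

// Seam between endpoint logic and the database; production code would query Postgres.
interface DependencyRepository {
    fun findByElement(elementId: Int): List<DependencyRow>
}

// Endpoint logic under test: it only sees the interface, never the database.
class DependencyService(private val repo: DependencyRepository) {
    // Groups an element's direct dependencies by pipeline stage, sorted for stable output.
    fun dependenciesByStage(elementId: Int): Map<String, List<Int>> =
        repo.findByElement(elementId)
            .groupBy({ it.stage }, { it.dependsOnId })
            .mapValues { (_, ids) -> ids.sorted() }
            .toSortedMap()
}

// Test double that serves canned rows instead of running SQL.
class FixtureDependencyRepository(private val rows: List<DependencyRow>) : DependencyRepository {
    override fun findByElement(elementId: Int): List<DependencyRow> =
        rows.filter { it.elementId == elementId }
}
```

Because the fixture rows are fixed, assertions on `dependenciesByStage` stay the same no matter what is in the real database.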
Daniel Rigoglioso, Software Engineering Intern
Phil’s Internship
When I started my internship with MFJ, I was tasked with making minor changes to existing systems to support development of the Data Element Catalog. My first few stories involved writing a series of Kotlin scripts that copied data into a Docker instance of the catalog. These scripts streamlined local development for the catalog team by letting us test with that data locally.
From there, my work shifted to creating a user interface for the catalog. Before I joined the organization, the full catalog could be accessed only through SQL queries. I wrote new API endpoints using Kotlin and JetBrains Exposed. These endpoints let us retrieve essential information from the catalog so that I could begin developing a basic web interface. They also let other teams access information in the catalog without querying the database directly.
With the new endpoints in place, I moved over to React to give the catalog some life. I created a landing page with a grid listing the elements in the catalog, a file-tree-style navigation system for reaching different parts of the catalog, and an interface for viewing details about individual elements or groups of elements. I styled the interface with Tailwind CSS to give it a polished look.
After developing an initial prototype of the web interface, I began researching deployment options for it and the API. This research led to several cross-team discussions about the best way to deploy within our organization’s infrastructure. Those discussions led me to adapt the API to serve a static build of the React code under a single domain. With that change, I could containerize both the web interface and the API in a single Docker container. Most recently, I deployed the container to Kubernetes so that researchers and other engineers can use it internally.
Phil Ganem, Software Engineering Intern