A Civic Digital Fellow Tackles State-of-the-Art Data Linkage Challenges

Civic Digital Fellows are hired every summer to work on specific projects being led by Census Bureau staff. The fellows are held to the same high standards expected of all Census Bureau employees and are sworn to protect confidentiality and privacy. Maria was hired to work on the Census Bureau’s 2020 Administrative Records Project. In this blog, she discusses her experience as a fellow and her work on that project.

Person record linkage refers to the process of identifying unique individuals and combining the information from multiple data sources. It’s a common but complicated task that underlies much of the statistical work done at the Census Bureau and is the foundation of quality statistics.

Optimally linking records is a mathematically and computationally complex problem with repercussions beyond theoretical matters. There is seldom a consistent identifier to merge records on, and partial information (like name or address) does not always succeed in uniquely determining units (whether people, businesses or other entities). Plus, data may be missing (nonresponse) or contain errors.  

Probabilistic models are necessary to link records, but we can’t determine the true match status of record pairs (whether two records refer to the same unique entity) without knowing how the data and its associated errors are generated. Additionally, larger data sets require intensive computing resources even in cases where there are few true matches across sources.
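To make the computing-cost point concrete, here is a minimal sketch of blocking, one common way to avoid comparing every record in one source against every record in another. It is an illustration under stated assumptions, not the Census Bureau's method: the toy records, the field names, and the helper names `candidate_pairs` and `zip_key` are all hypothetical.

```python
from collections import defaultdict


def candidate_pairs(records_a, records_b, block_key):
    """Return index pairs (i, j) whose records share the same blocking key."""
    # Group the second source by blocking key so each record in the first
    # source is compared only against records in the same block.
    blocks = defaultdict(list)
    for j, rec in enumerate(records_b):
        blocks[block_key(rec)].append(j)

    pairs = []
    for i, rec in enumerate(records_a):
        for j in blocks.get(block_key(rec), []):
            pairs.append((i, j))
    return pairs


# Hypothetical toy data; real sources would hold millions of records.
source_a = [
    {"last": "Lopez", "zip": "20233"},
    {"last": "Smith", "zip": "20233"},
    {"last": "Nguyen", "zip": "10001"},
]
source_b = [
    {"last": "Lopes", "zip": "20233"},
    {"last": "Smith", "zip": "10001"},
]


def zip_key(rec):
    """Hypothetical blocking schema: block on ZIP code only."""
    return rec["zip"]


pairs = candidate_pairs(source_a, source_b, zip_key)
all_pairs = len(source_a) * len(source_b)
print(f"{len(pairs)} candidate pairs instead of {all_pairs}")
```

The trade-off is that a true match whose blocking field is missing or wrong in one source never becomes a candidate pair, so the choice of blocking key affects which links can be found at all.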

The complexity of record linkage makes it an interesting topic of study from both a theoretical and applied perspective, and the Census Bureau has been a leading contributor to research and innovation on both fronts.

My Experience as a Civic Digital Fellow

How did a soon-to-be graduate student interested in studying record linkage get involved with the research at the Census Bureau? For me, it was Coding it Forward’s Civic Digital Fellowship (CDF). A first-of-its-kind program, CDF opens a pipeline into public sector work for students in technology-related fields. When CDF started in 2017, the Census Bureau was its founding agency partner, with 14 fellows from 11 universities; this summer, the Census Bureau welcomed its third and largest cohort of fellows — 20 students from 17 universities. 

All of us share an interest in the intersection of technology and governance, as well as a passion for public service. Fellows are making their mark at the Census Bureau on projects that range from applying machine-learning techniques to public data (to predict NAICS codes for the Economic Census) to implementing formally private algorithms for researchers to use in future disclosure avoidance practice.

For me, the CDF experience translates into studying response omissions by examining differences in individual nonresponse at the household level across multiple data sources. In particular, I link records and investigate how decisions made in the linkage phase — including choice of blocking schema, field agreement/disagreement probability parameters, and classification score threshold — impact source coverage distributions. 
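As a rough illustration of how agreement/disagreement probability parameters and a score threshold interact, here is a minimal sketch in the style of the Fellegi-Sunter model, a widely used formulation of probabilistic linkage. That choice of model is an assumption, since the post does not say which one the project uses, and the field names, the m and u values, and the thresholds are hypothetical.

```python
import math

# Hypothetical per-field parameters:
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
FIELD_PARAMS = {
    "first": {"m": 0.95, "u": 0.01},
    "last": {"m": 0.90, "u": 0.05},
    "birth_year": {"m": 0.98, "u": 0.02},
}


def match_score(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum log2 likelihood-ratio weights over the compared fields."""
    score = 0.0
    for field, p in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            # One simple convention: a missing value adds no evidence.
            continue
        if a == b:
            score += math.log2(p["m"] / p["u"])  # agreement weight (positive)
        else:
            score += math.log2((1 - p["m"]) / (1 - p["u"]))  # disagreement weight (negative)
    return score


def classify(score, threshold):
    """Declare a link when the score reaches the chosen threshold."""
    return "link" if score >= threshold else "non-link"


rec_a = {"first": "Maria", "last": "Lopez", "birth_year": 1995}
rec_exact = {"first": "Maria", "last": "Lopez", "birth_year": 1995}
rec_typo = {"first": "Maria", "last": "Lopes", "birth_year": 1995}

for threshold in (10.0, 5.0):
    print(
        threshold,
        classify(match_score(rec_a, rec_exact), threshold),
        classify(match_score(rec_a, rec_typo), threshold),
    )
```

With these made-up numbers, the pair with a one-letter surname difference is linked at the lower threshold but not at the higher one. This is the kind of linkage-phase decision that can change which individuals appear to be covered by each source.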

This work contributes to downstream household demographic and economic analyses, allowing for the comparison of households with a composition discrepancy across the data sources (i.e., an individual appears in one data source, but not the other) against those without such gaps. My linkage work enhances the Census Bureau’s probabilistic matching of record information by using a broader set of observations to inform conclusions and a more complete universe of data to generate household and person statistics. 

Working on this project gives me the opportunity to think and learn alongside some of the leading researchers in this field and to contribute to current discussions about the future of record linkage research and practice at the Census Bureau. It also gives me the chance to interact with researchers working both upstream and downstream of my project, which helps me understand the importance of this work and how record linkage fits into the Census Bureau’s broader statistical analysis pipeline. I have no doubt my fellow fellows have had similar opportunities in their respective project areas.

Lessons Learned From My Interview With the Deputy Director

The relationship between fellows and the Census Bureau is symbiotic — the Census Bureau also benefits from us being here.

I had a chance to talk with Deputy Director Ron Jarmin about the CDF program and its value to the Census Bureau. He said that past fellows have produced high-quality projects that allowed everyone involved to learn something and that the fellowship program provides an opportunity for future growth at the Census Bureau because fellows may continue their work beyond the duration of their fellowship.

I also asked how the program fits in with his vision for the Census Bureau in the coming decade. “The production function of statistical information is changing [and] the training and tools that [Civic Digital] Fellows bring in are more in that direction,” he said.

Reflecting upon my experiences as a Civic Digital Fellow, I always find myself returning to the people at the Census Bureau rather than the particulars of the research that first drew me here. Simply put, the people I have met at the Census Bureau are awesome. They have been generous with their time and energy, and have done things big and small to support both me and my research. 

At the end of our conversation, I asked Deputy Director Ron Jarmin for advice on how to make the most of my time here. He shared a number of helpful tips, but one really stood out to me. “Success in this, like in many other endeavors, is all about collaboration,” he said. Great research is made possible by the presence of great people, and the Census Bureau is full of them. It is a privilege to spend 10 weeks researching in their company.
