2020 Census Data Review

Many parts of the census are visible to the American people — the invitations and reminders to participate arriving in the mail, the multilingual advertisements, the outreach from community leaders, and census takers knocking on doors to collect responses.

Today, we’ll continue to talk about some of the less visible work that is no less essential to producing quality census results — the people doing the number crunching and poring over trends, maps and outliers to make sure that the raw data on our population become quality statistics.

For the 2020 Census, we are conducting one of the most comprehensive reviews in recent census history. Through this review, we have identified and fixed data processing issues, worked to correct data collection issues, and started to develop an understanding of how the nation’s population has changed over the last decade.

Data Processing Recap

Before we talk about the review, we’d like to provide a quick summary of how data processing works. (The recent Census Data Processing 101 blog describes this in more detail.)

We attempt to get information for every address, even if this means accepting multiple responses for the same people or location. We talk more about why this might happen in the blog Getting a Census Taker Visit When You’ve Already Responded.

Once data collection is complete, we:

  • Organize and standardize the data.
  • Make sure the data are linked to the correct geographic location.
  • Sort out duplicate responses.
  • Correct invalid responses, such as someone’s reported date of birth not adding up to their reported age.
  • Fill in missing information using a statistical method called imputation. (We discussed this technique more in a recent blog.)
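To make the "invalid responses" step concrete, here is a minimal sketch of a check that compares a reported age with the age implied by a reported date of birth as of April 1, 2020, the census reference day. The function names are hypothetical and chosen for illustration. This is not the Census Bureau's actual edit logic, which this post does not describe.

```python
from datetime import date

# Reference day for the 2020 Census.
CENSUS_DAY = date(2020, 4, 1)

def age_on(dob: date, ref: date = CENSUS_DAY) -> int:
    """Return completed years of age on the reference day."""
    # Has this person's birthday already occurred in the reference year?
    had_birthday = (ref.month, ref.day) >= (dob.month, dob.day)
    return ref.year - dob.year - (0 if had_birthday else 1)

def age_matches_dob(reported_age: int, dob: date) -> bool:
    """True when the reported age agrees with the reported date of birth."""
    return reported_age == age_on(dob)

# Illustrative record: born April 2, 1990, but reported as age 30.
# The implied age on census day is 29, so the record would be flagged.
needs_review = not age_matches_dob(30, date(1990, 4, 2))
```

A record flagged this way would still need a rule for deciding which value to keep. The sketch only detects the inconsistency.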

Each phase of data processing produces a data file. Our subject matter experts review the data to identify systematic or widespread issues in the data collection or processing. If we identify issues that need to be fixed, we develop fixes and test them. After we implement a fix, we review the data file again and continue the process until all issues identified have been addressed. We talk through this process more in the recent Finding ‘Anomalies’ Illustrates 2020 Census Quality Checks Are Working blog.

These processing and review steps are typical in every census, and the 2020 Census is no different.

Using Technology to Quickly Identify Potential Issues

While the size of each successive data file varies, they are all large, even by census standards, and include hundreds of millions of records with many variables.

Thanks to improvements in hardware and software this decade, we have been able to rapidly turn these large data files into the diagnostics (metrics that help us diagnose any potential issues) and tabulated results we need for review. By generating this output more quickly, we’ve been able to spend more time reviewing the results.

To help us review the massive amounts of data in each file, we load the files into visualization and analytical software. One of these tools, which we developed for the 2020 Census, is the Census Review, Analysis, and Visualization Application, or “CRAVA” for short. CRAVA has greatly enhanced our ability to conduct a thorough review.

We load tabulations from each new file into CRAVA. CRAVA then generates graphics and maps that display the census population totals alongside benchmarks, such as the 2010 Census, the 2020 population estimates, and the American Community Survey. This allows teams of reviewers to simultaneously search for outliers and patterns in the results.
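The idea of comparing counts with a benchmark can be sketched in a few lines. The example below is an illustration only, not how CRAVA works internally. The function names, the 10 percent threshold, and the town figures are all assumptions made up for the sketch.

```python
def percent_difference(count: int, benchmark: float) -> float:
    """Percent by which a count differs from its benchmark."""
    return 100.0 * (count - benchmark) / benchmark

def flag_outliers(counts, benchmarks, threshold_pct=10.0):
    """Return (area, percent difference) pairs that exceed the threshold.

    The threshold is a hypothetical value chosen for illustration.
    """
    flagged = []
    for area, count in counts.items():
        benchmark = benchmarks.get(area)
        if not benchmark:  # skip areas with a missing or zero benchmark
            continue
        diff = percent_difference(count, benchmark)
        if abs(diff) > threshold_pct:
            flagged.append((area, round(diff, 1)))
    # Largest absolute differences first, so reviewers see them first.
    return sorted(flagged, key=lambda item: -abs(item[1]))

# Made-up census counts and benchmark estimates for three towns.
counts = {"Town A": 8000, "Town B": 5050, "Town C": 1300}
benchmarks = {"Town A": 10000, "Town B": 5000, "Town C": 1000}
outliers = flag_outliers(counts, benchmarks)
```

A flagged area is only a candidate for review. As the examples later in this post show, an outlier can reflect a real change in the population rather than an error.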

Subject Matter Experts Resolve Potential Issues

While the technology provides the window into the data, it is the subject matter experts at the center of the review who must make sense of what they are seeing. They build the review systems, design the review processes, and possess the knowledge and experience that make the review possible.

We have a team of more than 70 subject matter experts systematically reviewing files from each stage of processing. These experts — many of whom have worked on two or more censuses — have deep experience and knowledge of census operations, geography, and the demographic characteristics and trends measured in the census.

  • They work throughout the decade in their subject matter area and possess the experience and knowledge to dig deep into specific topics, such as race, Hispanic origin, sex, and age, to verify that the data were processed correctly and to identify potential data collection issues.
  • They also possess the experience necessary to review data across the nation down to small geographic levels looking for outliers and patterns in the data.
  • Many of these same people helped design and build the procedures and systems used to conduct the 2020 Census.

Their expertise is what makes it possible to identify potential issues with data processing and collection and determine if they are true issues or if they are accurate measures of demographic change.

  • Example of an Issue: Early in data processing, we caught that a process designed to standardize age, which is used in later stages of processing to identify duplicate people, wasn’t working properly in some situations. Because subject matter experts design both the systems used to process the data and the diagnostics that determine whether the process works correctly, the expert who designed the process quickly identified the issue, and we fixed it.
  • Example of Demographic Change: Not every outlier in the data turns out to be an issue. Some reflect real changes in the population and do not require a fix. For example, when one town’s population appeared significantly lower than the estimates, a closer look showed that a decrease in the town’s prison population was behind the drop. Upon further research, we determined the decrease was consistent with the prison having moved part of its population before April 1, 2020 (the reference day for the census) because of the pandemic. No further action was taken.

The review focuses more on finding systematic or widespread issues than on issues with an individual group quarters or a single area. If counts are consistently higher or lower than the benchmarks, we dig deeper.

We look to see if the pattern can be connected with a census operation or process to determine the underlying issue, and then we develop a corrective action, if needed. If we find issues unique to a specific group quarters or area, such as when a group quarters is in the incorrect location, we also correct those. 

The recent blog Finding ‘Anomalies’ Illustrates 2020 Census Quality Checks Are Working describes the types of anomalies we’ve found and provides several examples, including how we resolved them. The 2020 Census Group Quarters blog describes a more widespread issue with group quarters that we identified during our review and addressed through additional data collection and processing efforts.

Finally, an upcoming blog will discuss additional efforts to remove duplicate people after our review revealed high levels of duplicate census responses that had not been removed through other processes.

The Review Continues

We have now completed the third phase of data processing — reviewing the data file known as the Census Unedited File (CUF). The CUF includes the final population count for each address in the census, which we add up to determine the population for each state. Those state population counts are used in apportionment to calculate the number of representatives each state has in Congress.
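As a rough sketch of the arithmetic described above, the example below sums address-level counts into state totals and then distributes seats using the method of equal proportions, the method used for congressional apportionment. The function names, state labels, populations, and the 10-seat total are made up for illustration. This is not the Census Bureau's production code.

```python
import heapq
from collections import defaultdict
from math import sqrt

def state_totals(address_counts):
    """Sum (state, population count) pairs into one total per state."""
    totals = defaultdict(int)
    for state, count in address_counts:
        totals[state] += count
    return dict(totals)

def apportion(populations, seats):
    """Distribute seats by the method of equal proportions."""
    if seats < len(populations):
        raise ValueError("need at least one seat per state")
    # Every state starts with one seat.
    allocation = {state: 1 for state in populations}
    # Priority for a state's next seat is pop / sqrt(n * (n + 1)),
    # where n is the number of seats it already holds.
    # heapq is a min-heap, so priorities are stored as negatives.
    heap = [(-pop / sqrt(1 * 2), state) for state, pop in populations.items()]
    heapq.heapify(heap)
    for _ in range(seats - len(populations)):
        _, state = heapq.heappop(heap)
        allocation[state] += 1
        n = allocation[state]
        heapq.heappush(heap, (-populations[state] / sqrt(n * (n + 1)), state))
    return allocation

# Toy data: three hypothetical states and a 10-seat chamber.
totals = state_totals([("A", 400), ("B", 300), ("A", 200), ("C", 100)])
seats_by_state = apportion(totals, 10)
```

With these toy numbers, state A receives six seats, B three, and C one.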

We call the CUF the “Census Unedited File” because we have not yet conducted the final steps of applying edits and imputations for filling in missing characteristics, such as age, sex, race and Hispanic origin. Those steps happen in the next phase as we create the CEF, or the Census Edited File.

The CEF will include complete demographic information for every census record and will provide the first opportunity to conduct an in-depth review of the characteristics information collected in the census. The CEF is a crucial step toward providing the redistricting data that the states await.

As we prepare for the April release of the first results from the 2020 Census, we do so with the confidence that we have thoroughly reviewed the population counts and have made every reasonable effort to resolve the issues we’ve identified to ensure the quality of these results.

By combining substantial subject matter expertise with powerful analytical tools, we are conducting one of the most comprehensive reviews in recent census history.
