Adapting Population Estimates to Address COVID-19 Impacts on Data Availability

Written by: Luke Rogers and Christine Hartley

By now, it is well known that the COVID-19 pandemic significantly delayed the availability of 2020 Census data. One program that depends on these data is the Population Estimates Program (PEP), for which the most recent census serves as the starting point for the next 10 years of our annual population estimates.

That starting point is known as the estimates base. Without complete 2020 Census data with full demographic detail to build it from, demographers in PEP devised a way to combine the data that were already available from the 2020 Census (namely, total population counts) with other data our program produces: older series of 2020 estimates and the 2020 Demographic Analysis (DA) estimates used as an official benchmark for the census.

By blending these data files, PEP demographers were able to create an estimates base with the high level of detail needed to produce information for the years beyond the 2020 Census. The blended base applies age and sex detail from the 2020 DA estimates to 2020 Census total population counts. From there, we adjusted Vintage 2020 (V2020) estimates for April 1, 2020, to be consistent with the DA/census mix while bringing in more demographic detail.
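The core idea of applying one source's age and sex detail to another source's total can be illustrated with a minimal sketch. It assumes simple proportional scaling, which may differ from the procedure PEP actually used. The function name `blend_base`, the age groups, and all of the numbers are hypothetical and chosen only for illustration.

```python
def blend_base(census_total, da_counts):
    """Scale DA counts by age/sex cell so they sum to the census total.

    Assumption: plain proportional allocation. The article does not
    spell out the exact method, so this is illustrative only.
    """
    da_total = sum(da_counts.values())
    if da_total <= 0:
        raise ValueError("DA counts must sum to a positive number")
    # Each cell keeps its DA share of the population but is rescaled
    # so that the cells add up to the census total.
    return {cell: census_total * count / da_total
            for cell, count in da_counts.items()}


# Hypothetical DA estimates (in millions) by sex and broad age group.
da_counts = {
    ("male", "0-17"): 40.0,
    ("female", "0-17"): 38.0,
    ("male", "18+"): 120.0,
    ("female", "18+"): 127.0,
}

# Hypothetical census total (in millions) that the blended base must match.
census_total = 330.0

blended = blend_base(census_total, da_counts)
```

In this sketch the blended cells sum to the census total while each cell keeps the same share of the population that it has in the DA estimates.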

One way to examine the impact of these data sources is by using a population pyramid like the one below. The left half of the graph represents the total number of U.S. males by single year of age on April 1, 2020, in each data source and the blended base; the right half of the graph displays the same information for females. A spike is evident for both males and females at age 85 because this represents the sum of all ages 85 and over.

While a schedule delay was the impetus for creating the blended base, using the DA estimates introduced some interesting changes to the age and sex distributions in the final blended base relative to the 2020 Census results. It’s important to remember that when PEP demographers created the blended base, the 2020 Census age and sex detail that appears in the pyramid was not available.

2020 Census data by single year of age and sex have not yet been released publicly. A special tabulation of 2020 Census data at the national level by age and sex was produced for the March 10 release of the 2020 DA net coverage error estimates. This special tabulation features confidentiality protections applied using the 2020 Census Disclosure Avoidance System. The 2020 Census data shown in the pyramid below are from that special tabulation. 


One interesting change is the smoother appearance of the blue V2021 blended base line relative to the 2020 Census data. The spikes evident in the red 2020 Census line are the result of “age heaping” in the 2020 Census data: higher values reported for ages ending in “0” or “5” (e.g., 30, 35, 40, 45) than would be expected to occur naturally. Age heaping is an artifact of the census that can occur when people report an age for someone else and, unsure of the exact value, are more likely to pick one of these round numbers.
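Age heaping can be quantified as well as seen in a chart. The sketch below uses Whipple's index, a standard demographic measure that the article itself does not mention, so its use here is an assumption made for illustration. The index compares the population reported at ages ending in 0 or 5 within ages 23 to 62 against one-fifth of the total population in that range; a value near 100 suggests no heaping. The populations below are made up.

```python
def whipple_index(pop_by_age):
    """Return Whipple's index for a mapping of single year of age -> count.

    The mapping must contain every age from 23 through 62.
    100 means no preference for ages ending in 0 or 5;
    500 means every reported age ends in 0 or 5.
    """
    total = sum(pop_by_age[age] for age in range(23, 63))
    if total <= 0:
        raise ValueError("population ages 23-62 must be positive")
    # Ages 25, 30, ..., 60: the ages ending in 0 or 5 inside the window.
    heaped = sum(pop_by_age[age] for age in range(25, 61, 5))
    return 100.0 * 5.0 * heaped / total


# Hypothetical population with the same count at every age (no heaping).
uniform = {age: 1000 for age in range(0, 86)}

# Hypothetical population with extra people reported at ages ending in 0 or 5.
heaped = {age: 1200 if age % 5 == 0 else 1000 for age in range(0, 86)}

uniform_index = whipple_index(uniform)
heaped_index = whipple_index(heaped)
```

Here the uniform population yields an index of 100, and the population with inflated counts at round ages yields a higher value.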

The DA estimates, on the other hand, were produced from current and historical birth and death records, data on international migration and Medicare enrollment records, and they cover everyone living in the United States on April 1, 2020. Because these administrative data, rather than self-reported ages, were the primary sources used to create the estimates, heaping does not shape the distribution of the age data.

Another notable difference can be seen for the population ages 0 to 9. Young children, especially those ages 0 to 4, are a historically undercounted population in the census. However, the vital records system in the United States is very robust, meaning that most young children are captured in the birth certificates from the National Center for Health Statistics used to develop both the DA and V2020 estimates. Because these two input sources rely on similar methods, the green V2020 and dashed orange DA lines for the younger ages in the population pyramid lie virtually on top of one another. Yet the red 2020 Census line falls below these sources for both males and females.

Because the blended base incorporates the 2020 DA age and sex distribution, however, it pulls the V2021 lines for ages 0 to 9 up to the level of the DA estimates, which essentially mitigates the known undercount for this population.

Interestingly, the impact of the undercount of children is not only evident for ages 0 to 9. The gap between the V2020 estimates and the other data sources for ages 10 to 19 reflects the undercount of children ages 0 to 9 in the 2010 Census, which was carried forward in the estimates base from 2010 to 2020 as those children aged. Because the V2021 blended base uses the DA distribution, it does not retain this undercount.

Although the V2021 blended base population by age and sex differs from the 2020 Census population for April 1, 2020, the population pyramid illustrates that the differences are minimal for many ages. This is due, in part, to the use of 2020 Census totals in the blended base: the total population in the V2021 blended base matches the total population in the 2020 Census for counties and all higher geography levels.

The next release using the V2021 blended base is scheduled to take place in late June when the Census Bureau publishes the July 1, 2021, population estimates for the nation, states and counties by age, sex, race and Hispanic origin.

Moving forward, research is underway to test and implement possible improvements to the blended base for the Vintage 2022 estimates series, which will be released beginning in December 2022. In the meantime, as more 2020 Census data and coverage measures become available to PEP and additional 2020 Census demographic detail is potentially incorporated into the estimates base, we are also researching the feasibility of applying adjustments for known coverage issues.

Luke Rogers is chief of the Census Bureau’s Population Estimates Branch.

Christine Hartley is assistant division chief for the Estimates and Projections Area in the Population Division.




