Statistical Language Performance at the U.S. Census Bureau

Written by:
Working Paper Number ADEP-WP-2021-01


Large-scale data processing and analysis is not a new challenge for the U.S. Census Bureau, but the number of statistical programming languages and tools available for such work has expanded in recent years. We evaluate how statistical programming languages perform on a common data management task within the Census Bureau’s high-performance computing cluster. Specifically, we develop Python, SAS, Stata, and R scripts that merge the person, household, and geographic microdata from the full-count 1990 Census microdata files. We then use these merged data to perform basic analyses, such as counting the number of individuals per household and calculating the average household size for every county in the U.S. We compare the different language implementations of these scripts based on the runtime of each task. We find wide variation in runtime between languages, and the speed of each language depends most heavily on the file format of the input data.
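The merge-and-aggregate task summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's benchmark scripts: the record layouts and field names (`hh_id`, `geo_id`, `county_fips`) and the helper functions are hypothetical, and a few toy records stand in for the full-count 1990 files. Persons are counted per household, households are linked to counties through a geography lookup, and the mean household size is computed for each county.

```python
from collections import Counter, defaultdict

# Hypothetical toy records; field names are illustrative, not the 1990 file layout.
# Household A has 3 persons, B has 1, C has 4.
persons = [{"person_id": i, "hh_id": hh} for i, hh in enumerate("AAABCCCC", start=1)]
households = [
    {"hh_id": "A", "geo_id": "g1"},
    {"hh_id": "B", "geo_id": "g1"},
    {"hh_id": "C", "geo_id": "g2"},
]
geography = [
    {"geo_id": "g1", "county_fips": "01001"},
    {"geo_id": "g2", "county_fips": "01003"},
]


def persons_per_household(persons):
    """Count person records for each household ID."""
    return Counter(p["hh_id"] for p in persons)


def average_household_size_by_county(persons, households, geography):
    """Join persons -> households -> geography and average household size per county."""
    geo_to_county = {g["geo_id"]: g["county_fips"] for g in geography}
    counts = persons_per_household(persons)
    sizes_by_county = defaultdict(list)
    for hh in households:
        county = geo_to_county.get(hh["geo_id"])
        size = counts.get(hh["hh_id"], 0)
        # Skip households with no geography match or no persons (inner-join behavior).
        if county is None or size == 0:
            continue
        sizes_by_county[county].append(size)
    return {county: sum(s) / len(s) for county, s in sizes_by_county.items()}


print(average_household_size_by_county(persons, households, geography))
# → {'01001': 2.0, '01003': 4.0}
```

Households without a matching geography record or without any person records are dropped here, which mimics an inner join; the paper's scripts may treat such cases differently, and at full-count scale the choice of input file format and reader matters far more than this in-memory logic.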

