Technology Transformation at the Census Bureau: Building a Modern, Data-Centric Ecosystem

Change Is Here

The U.S. Census Bureau has historically helped answer both simple questions like “What’s the population of Utah?” and more complex questions like “How are declining business start-up rates related to living standards?” We have done this by conducting censuses and surveys and publishing the results. But in these challenging times, U.S. residents hesitate to respond to surveys. Many are reluctant to open the front door or answer a phone call or text from an unfamiliar number. These are bread-and-butter issues for a survey-taking agency. Our changing culture and rapid changes in data and technology tell us that censuses and surveys alone, while still critical, can no longer answer society’s questions completely or quickly enough to satisfy the modern appetite for information. In this blog, I describe modernization efforts underway at the Census Bureau that combine data science with traditional survey methods to diversify our data products and place data at the center of our approach. For more information, refer to the Integration Phase 1 Plan.

Building a Data-Centric Ecosystem to Handle the Change

We need to adjust our focus from managing surveys and censuses to managing an ecosystem of data collection, processing and dissemination designed to deliver the data products that best address the questions our data users have — both simple and complex. To build the foundation for this approach, the Census Bureau has created four integrated enterprise initiatives.

The Enterprise Data Lake

The Enterprise Data Lake (EDL) is the central hub of our modernization efforts from a data processing and computational perspective. Built in the cloud to allow for scalability and the use of a cloud-native software stack and modern processing tools, the EDL is the Census Bureau’s primary location for collected and ingested data. It also provides both analytical and operational processing capabilities to allow for a better flow between ongoing research and current operations. From the EDL, products can be created and published to our dissemination platform, CEDSCI (Census Enterprise Dissemination Services and Consumer Innovation).

Frames

The Frames program is a growing variety of linked datasets within the EDL. While many of these datasets already exist as standalone entities at the Census Bureau, the Frames approach will collocate these and any number of curated datasets and provide an easy and efficient way to link them for purposes both familiar (e.g., providing a tailored survey frame) and unanticipated (e.g., answering a new question about jobs and COVID-19 vaccination rates). These linked, augmented and continuously updated datasets will provide a more comprehensive means for maintaining and updating the inventory of our nation’s addresses, jobs, businesses, people and other linked data. Centralization and “linkability” will increase efficiency, reduce duplicative efforts to maintain and manage data and greatly expand our capacity to answer critical questions about the nation’s population and economy at multiple geographic scales.
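
As a rough illustration of what “linkability” means in practice, the sketch below joins two tiny, made-up datasets on a shared identifier and then answers a question neither dataset could answer alone. The record layouts, the `address_id` key and the function names are hypothetical and chosen only for illustration; they do not describe the actual structure of the Frames.

```python
from collections import defaultdict

# Hypothetical, made-up records. The real frames and their keys are not
# described in this post; "address_id" is an assumed linking key.
address_frame = [
    {"address_id": "A1", "state": "UT"},
    {"address_id": "A2", "state": "UT"},
    {"address_id": "A3", "state": "NV"},
]
business_frame = [
    {"business_id": "B1", "address_id": "A1", "employees": 12},
    {"business_id": "B2", "address_id": "A3", "employees": 40},
    {"business_id": "B3", "address_id": "A1", "employees": 3},
]


def link_frames(left, right, key):
    """Inner-join two lists of records on a shared key."""
    # Index the right-hand records by key so each lookup is constant time.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    linked = []
    for row in left:
        for match in index.get(row[key], []):
            linked.append({**row, **match})
    return linked


def employees_by_state(linked):
    """Sum employee counts per state across the linked records."""
    totals = defaultdict(int)
    for row in linked:
        totals[row["state"]] += row["employees"]
    return dict(totals)


linked = link_frames(address_frame, business_frame, "address_id")
totals = employees_by_state(linked)
```

Here the address records carry geography and the business records carry employment, so only the linked result can report employment by state. Keeping the datasets in one place with a common key is what makes this kind of join cheap to repeat for new questions.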

Data Ingest and Collection for the Enterprise (DICE)

Providing a modern platform for both data collection and ingest, DICE will be the key entry point for data into the Census Bureau for subsequent transfer, storage and use in the EDL. DICE will refresh the legacy field, online and paper data collection technology with updated, flexible capabilities that reinforce the new operations and data ecosystem approach. DICE will also provide much-needed functionality to interact with external data-ingest, frames and other modern data processing capabilities. DICE will leverage both operations research and data science techniques to enable more efficient operations and adaptive survey design. Finally, DICE will enable flexible scaling to support the diversity of the Census Bureau’s data collection operations, from rapid, lightweight surveys to the decennial census, without the need for costly updates or system rebuilds. Many of the key functions provided by DICE were developed and successfully deployed in the 2020 Census, providing a strong foundation for further development and use by the entire Census Bureau.

Census Enterprise Dissemination Services and Consumer Innovation (CEDSCI)

As the Census Bureau’s primary platform for data dissemination, CEDSCI will provide public access to our information. As new data products are produced in the EDL with collected, ingested and linked data, CEDSCI’s standardized platform will allow the Census Bureau to provide them quickly. Allowing for discovery of data products and new visualizations and renderings of data, CEDSCI will provide a scalable solution for long-term data dissemination and a better user experience.

The Census Operations and Data Ecosystem

Integrating the four pillars described above into a unified enterprise approach to doing business makes it possible for the Census Bureau to provide easily discoverable and linkable data to accurately answer more questions faster than ever before. We have named this integrated system of systems the Census Operations and Data Ecosystem (or CODE). The building of CODE represents a key element in the Census Bureau’s strategy to anticipate and prepare for a world driven by and dependent on accurate, timely, relevant data.

For example, during the COVID-19 pandemic, data users of all types were hungry for information on the impact it was having on the people, systems, and economy of the United States. In this case, the Census Bureau was able to respond quickly after the pandemic hit by creating and fielding the Household Pulse Survey (HPS). This survey and others developed since continue to provide a much-needed view of key aspects of the pandemic. Despite the groundbreaking approach that delivered highly relevant data quickly, the HPS had some common limitations shared by many surveys, including low response rates and significant margins of error. In a future with CODE in place, the Census Bureau’s options for modernized data collection, storage, advanced linking of survey, census, and third-party data and modernized data processing will be greatly expanded. If a similar crisis were to occur, CODE would enable more robust, rapid and accurate data products – possibly without the need for a survey.

Similarly, CODE will provide myriad data linking capabilities using secure and confidential data sources for evidence-building, answering questions like:

  • Was poverty reduced where a government business incentive program was offered?
  • Which businesses relocated to the neighborhood or opened new establishments where incentives were offered?
  • Did the business(es) hire workers from within the neighborhood? When were they hired?
  • Did existing businesses hire additional workers? How have businesses’ revenues changed?
  • How many new businesses were established that address the needs of workers?
  • Where do individual residents of the community spend their money? Are dollars circulating within the community or are residents spending their money elsewhere?
  • If the poverty rate declined and median income increased, how is that related to gains by longer-term households or to higher-income households moving into a community (i.e., gentrification)?

Challenges

Building a modern, integrated processing ecosystem via the four pillars presents many challenges. In our planning for this transformational effort, we have identified seven key challenges.

Silos of Excellence

The Census Bureau has long been a production agency with success tied to individualized methods in our separate directorates. The continued protection of segregated, specialized processing within our silos, while routine, familiar and safe, represents the single most challenging barrier to the vision and innovation that a unified, integrated operations and data ecosystem can make possible.

Management Vision and Commitment

Reflecting on the Silos of Excellence described above, the incentive structures at the Census Bureau have often provided a mission-focused argument for leaders in different areas to “wait on the sidelines” looking for signs of success or failure before committing to a new direction. If leaders take this approach, there is little chance of success for the four initiatives and a modernized operations and data ecosystem. 

Proprietary Technology

Over time, the Census Bureau has developed a portfolio of IT systems based largely on closed-source, proprietary technologies. These systems have been successful but often come at a heavy cost – both for licensing and for the limitations they can impose on internal Census Bureau innovation.

Technology and Data Science Skills Deficit

The types and levels of skills and expertise needed to operate and maintain our current-state systems are not easily transferable to the new, cloud computing approach critical to a data-centric ecosystem. It will be critical to adopt new cloud technologies, data science capabilities and skills to implement them at scale for high-end computing, processing, data storage and analytics. 

Rising Costs

As noted, Census Bureau systems are largely siloed by project or directorate, often resulting in duplicative systems supporting the same or similar capabilities, leading to increased overall IT costs.

Cybersecurity

The Census Bureau is committed to protecting the security of public data we collect. The trust that our respondents and data providers have in us is central to our core mission. The success of transitioning to a new data-centric ecosystem with the four key initiatives is highly dependent on an effective, efficient and fully integrated approach to cybersecurity.

Perceptions of History

Like other organizations working through modernization challenges, we have attempted similar enterprise-wide initiatives before with nominal success. Ambitious, often disruptive change efforts can test the resolve required for success. The most recent example was the outcome of the Census Enterprise Data Collection and Processing (CEDCaP) effort. The original vision of CEDCaP was never realized because of a combination of schedule misalignment, inflated specifications and undisciplined management. It is important that the current effort take advantage of existing strengths and follow the lessons learned in previous efforts like CEDCaP.

Tackling our Challenges

Owning our challenges is the first step to dealing with them. But as the societal need for more timely, high-quality products intensifies, this transformation effort also provides the Census Bureau with a distinct opportunity to break down the barriers between the program directorates and to streamline processes across the entire organization. In this section, we briefly describe how we’re tackling each of the challenges above.

Silos of Excellence: “One Census Bureau”

The Census Bureau director, deputy director and associate directors are providing the focused and unified leadership necessary to guide the significant internal change described in this plan to completion. With the urgency called for by a changing world, they model an enterprise mindset and consistently demonstrate the communication, collaboration and commitment that supports a “One Census Bureau” approach. The deliberate strategy of this approach builds enterprise-wide functionality, competence and confidence from the beginning of the decade by successfully onboarding progressively larger demographic and economic surveys, such as the American Community Survey and the 2027 Economic Census, while continually planning for the next decennial census in 2030.

Management Vision and Commitment

There are four components to this response.

  • Establish a Portfolio Executive

This new office was established in early 2022. The portfolio executive works to broaden the Census Bureau’s existing incentive structures to include short-term, directorate priorities as well as concrete actions to enable and accelerate positive change for the enterprise.

  • Establish a Bureau Leadership Team (BLT)

In early 2022, the Census Bureau established a leadership team that includes the portfolio executive, chief information officer (CIO), and chief financial officer (CFO). The strategic combination of these three positions on the BLT ensures a coordinated, multidisciplinary, unified management approach to address the programmatic, technological and financial considerations of each pillar in the ecosystem.

  • Implement “True North” Guidelines

True North helps align the paths that lead to new and innovative methods and products based on a “One Census Bureau” approach. True North is a set of principles that works as a compass to guide an organization from current conditions to where it wants to go. These principles provide clear criteria for how the Census Bureau will prioritize work, make decisions and manage complex challenges. Specifically, True North provides:

  • A consistent focus – Ensure all our work contributes to the estimate/product.
  • A cohesive Ecosystem – Build and use the suite of foundational systems to develop a cohesive ecosystem that enables product creation while strengthening and integrating consistent cybersecurity defense and rapid incident response.
  • The pushing of boundaries – Embrace productive discomfort by adopting a modern (i.e., cloud-based) computing approach, plan to move all data to and perform processing in the cloud and develop staff skills and capabilities to use modern computing tools and methods.
  • One Census Bureau – Be a single team rather than separate groups of “customers” and “service providers,” valuing diverse perspectives in creating innovation, collaborating widely and often with internal and external partners to acquire new data sources, create new products, and advance data and computer science.

Further, True North is not merely aspirational. From the top of the organization to all levels of management and staff, the Census Bureau holds firmly to these principles.

  • Streamline Governance

As work on the ecosystem moves forward, the four pillars must be governed as one to ensure seamless management and technical integration. The “One Census Bureau” approach translates here to “One Integrated Governance Approach.”

Proprietary Technology: Migrate to Open-Source Software

We will work to transition legacy systems to open-source where possible (and as quickly as possible) and require open-source for establishing and developing new IT systems.

Technology and Data Science Skills Deficit: Increase Skill Sets Internally/Contract When Required

To provide the expertise necessary for this integration effort, the Census Bureau CIO is employing new strategies to attract staff with appropriate expertise. The CIO will do this by identifying existing federal and contractor staff from across the Census Bureau with relevant skills and abilities and reassigning them to work directly as part of a new Secure Cloud Team (SCT). The CIO will also open additional training opportunities for federal staff and work closely with existing contract Project Managers and Contract Officer Representatives (CORs) to ensure new contract staff possess the necessary skillsets and training to support Census Bureau cloud initiatives.

Rising Costs: Implement New Cost Strategies

Working from examples and lessons learned in other government organizations that have transitioned significant operations to modern computing, the BLT will develop a transition funding approach over the next 18 months.

Cybersecurity: Implement New Security Strategies

The public trusts us to protect the data behind high-quality federal statistics, so security is built into every part of the initiatives’ ecosystem to form an enhanced set of safeguards for our systems.

  • Data Access and Governance – The initiatives require a streamlined data access and governance approach to safeguard our data successfully. We have established a new data governance group to set up that framework.
  • Putting the ‘SEC’ in DevSecOps – Under the guidance of the Office of Information Security (OIS), the security of our continuous integration/continuous delivery (CI/CD) pipeline will be enhanced to take advantage of our modern development architecture in the cloud.

Perceptions of History: Don’t Lose Sight of Recent History and Lessons Learned

When considering recent history, there are fundamental differences in timing and approach. Right now, we’re very early in the 2030 Census cycle and do not anticipate schedule challenges similar to those that upended previous Census Bureau efforts. Further, unlike previous efforts, we are relying largely on systems and solutions that have already succeeded in fulfilling the Census Bureau’s mission. Finally, we are very mindful of past lessons learned.

What Does This All Look Like?

As depicted in the high-level diagram below, the EDL is at the center of the architecture. DICE forms the “inputs” to the lake – data collected from respondents and data ingested from third-party sources. The operational control system in DICE replicates its data to the EDL, ensuring that near-real-time data are always available to users within the EDL. Meanwhile, ingested raw data are made available within the EDL, allowing users to perform research and to create products with the latest data available from our providers. In our end-state architecture, the Frames program is wholly contained within the EDL, including the foundational frames. This allows the frames to be equally accessible to other data and to be combined with collected data in new ways to form new products. While CEDSCI is the external data-discovery platform, we’re investigating uses for internal data-discovery within the EDL. In addition, as CEDSCI migrates to the cloud, seamless integration between the EDL and CEDSCI will enable easier product dissemination.
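
The flow described in this section can also be written down as a small directed graph. The sketch below is only one reading of the text above: the component names come from this post, while the dictionary layout, the `CONTAINED_IN` mapping and the `downstream` helper are illustrative assumptions and do not represent any Census Bureau system.

```python
# Edges follow the flow described in the text: inputs enter through DICE,
# land in the EDL, and are published to the public through CEDSCI.
FLOWS = {
    "respondents": ["DICE"],
    "third-party sources": ["DICE"],
    "DICE": ["EDL"],
    "EDL": ["CEDSCI"],
    "CEDSCI": ["public"],
}

# In the end-state architecture, Frames lives inside the EDL rather than
# sitting beside it as a separate hop.
CONTAINED_IN = {"Frames": "EDL"}


def downstream(start, flows):
    """Return every component reachable from start, in breadth-first order."""
    seen, queue = [], [start]
    while queue:
        node = queue.pop(0)
        for nxt in flows.get(node, []):
            if nxt not in seen:
                seen.append(nxt)
                queue.append(nxt)
    return seen


path_from_respondents = downstream("respondents", FLOWS)
path_from_providers = downstream("third-party sources", FLOWS)
```

Both collected and ingested data reach the public by the same route, which is the point of placing the EDL at the center: every product passes through one shared store before dissemination.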

Looking Forward

Transition and System Decommissioning

A key part of the challenge of transitioning to modern computing is the decommissioning of legacy systems. For the purposes of the ecosystem, the BLT will recruit and assign a Legacy System Decommission group to develop a criteria-based decommissioning process integrated with the development, testing and production milestones in the four initiatives.

The 2030 Census

As the Census Bureau’s flagship operation, the decennial census always influences the planning and execution of enterprise-level efforts. This is certainly the case with the new ecosystem. In fact, the excellent performance of many of the applications and processes built within the four pillars of the ecosystem for the 2020 Census greatly strengthened our confidence that they were the best tools to build upon for the future.

Taking the lessons learned from the 2020 Census and maintaining an extensive awareness of 2030 Census planning, this first phase of the ecosystem will use the early years of this decade to address foundational requirements that will support the decennial census as well as the rest of the Census Bureau enterprise. However, the focus for the next two years (FY 2023 and 2024) will be on demographic and economic survey and census capabilities.

Decennial planning milestones will be included in initiative schedules from the beginning. As 2030 planning ramps up through the decade, ecosystem planning will follow suit with increasing decennial milestones.

_____

Michael Thieme is the senior advisor to the deputy director for IT and Operations, U.S. Census Bureau.
