The Census Bureau has proposed a major revision to the data that it will release from the 2020 decennial census. I think that the proposal would harm demographic research and affect policies. The Bureau is accepting feedback through October 22 (details at the link above). I encourage you to look through the proposal and provide feedback. I have copied my response below for those who may be interested.

A huge shout-out to the NHGIS team who notified users of these important changes and who have been tireless advocates for ensuring continued access to useful data.

Dear Dr. Jarmin:

I write to strongly oppose the changes proposed for the geographic release of 2020 Census data. The proposed changes will substantially limit the ability of researchers outside the government to create population projections and will hamper the substantial progress we have made toward understanding how neighborhoods affect well-being. The proposed changes could catastrophically limit research on neighborhood racial and ethnic change and geographic inequality. Confidential Research Data Centers are not a feasible solution with the current structure and staffing levels and would increase disparities across institutions. I explain all of these concerns in more detail below, including specific research questions that would be impossible to study under the proposed release guidance.

The Bureau proposes such sweeping changes that it is impossible to address them all. I will focus my comments here on racial and ethnic disparities research that would be prevented by the proposed changes, since that is my area of expertise. One particularly egregious example is the elimination of single-year age by sex by race (tables PCT12A-O in 2010). Population projections by race and ethnicity depend on detailed data by age and sex. Entirely eliminating this series will increase errors in population projections and prevent researchers from verifying population projections used for planning purposes at the federal, state, and local levels.

While the Census Bureau proposes releasing data by selected age categories (Proposed 2020 Tables P12A-P12AH), these data do not provide sufficient granularity for projections. In particular, they do not include a count of residents under 1 year of age, a key denominator for properly conducting demographic projections, especially across racial and ethnic groups given the large racial disparities in infant mortality in the United States.1 The coarser age categories will also limit the ability of researchers to study the consequences of COVID-19 in the coming years. Given the steep age gradient of severe cases and deaths related to COVID-19, the five-year age bins in the proposed tables will lead to noisy estimates and stymie research that will be vital for future national pandemic and epidemic preparedness.

In my own research, I have relied on these one-year age categories to project populations across neighborhoods in the United States. My work shows that, by not accounting for births and deaths, estimates of neighborhood mobility can be wrong, sometimes drastically so. My work applies, among other areas, to school districts that want to develop catchments that eliminate or reduce racial segregation in schools.

I appreciate the necessity to avoid disclosures, especially for groups in the minority (e.g., a small number of Black residents in a nearly all-White tract or a small number of White residents in a nearly all-Black tract). But such a blunt instrument does not, in my view, seem necessary. The Census Bureau could follow the lead of the Centers for Disease Control and Prevention and use thresholds for disclosure. In its online WONDER database, the CDC avoids disclosures because the database reports “do not present or publish death or birth counts of 9 or fewer or rates based on counts of nine or fewer.”2 The value of the threshold may differ from that used by the CDC in WONDER, but the same principle could be applied.
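To make the threshold principle concrete, here is a minimal sketch in Python. It assumes a cutoff of 9, borrowed from the CDC example (the Bureau could choose a different value), and the function name, cell labels, and counts are all hypothetical illustrations rather than any agency's actual procedure.

```python
from typing import Dict, Optional

# Assumed cutoff, taken from the CDC WONDER example; not a Census Bureau value.
SUPPRESSION_THRESHOLD = 9


def suppress_small_counts(
    counts: Dict[str, int], threshold: int = SUPPRESSION_THRESHOLD
) -> Dict[str, Optional[int]]:
    """Return a copy of `counts` with every cell at or below `threshold` set to None."""
    return {cell: (n if n > threshold else None) for cell, n in counts.items()}


# Hypothetical single-year age counts for one group in one tract.
tract_counts = {"under_1": 3, "age_1": 12, "age_2": 9, "age_3": 25}
published = suppress_small_counts(tract_counts)

for cell, n in published.items():
    print(cell, "suppressed" if n is None else n)
```

This sketch covers only primary suppression. If row or column totals were published alongside the cells, additional cells would need to be withheld so that a suppressed value could not be recovered by subtraction.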

In addition to entirely eliminating these tables, the Census Bureau also proposes reducing the geographic granularity of data for . To be quite honest, this is shocking to me. Since 1990, the Census Bureau has fully tracted the country to provide data on neighborhood conditions. Other than states, tracts are the only geographic unit defined consistently across the entire country. Counties vary in their scope and creation under state law (and even further back to land titles in former colonies that preceded the existence of the United States), as do places, given the different requirements to form municipalities across states. Losing data at the tract level means that we lose these consistent definitions and, with them, the ability to use data from the decennial census to measure neighborhood inequality. Counties vary in population by orders of magnitude across the country, ranging from only 64 residents (Loving County, Texas) to over 10 million residents (Los Angeles County, California). Under no conceivable system are those two units equivalent for any type of analysis to support research or program policies.

More importantly, tract data have been a boon for measuring and addressing economic, educational, and health disparities. Without tract-level data that measure the conditions of neighborhoods, we cannot measure the profound disparities across neighborhoods. Measuring disparities at these levels has been a priority across the federal government. One example comes from the National Institute on Aging’s priorities for 2020-2025, which include exploring “new ways to improve safety in the home and community through studies of ergonomics and the built environment.”3(p16) Preventing research on the age, race, and family status of people at any level smaller than a county will hinder such efforts and, even worse, increase inequality in data access. The more racially and ethnically diverse populations of large counties will receive much less community-level data than smaller and whiter counties. And, again, while I am focusing on racial and ethnic disparities given my expertise, I want to emphasize that gender and economic research will be similarly affected.

I believe that limiting the geographic specificity of tables has to do with efforts to protect privacy. The Census Bureau has proposed Research Data Centers as a solution to protect privacy, but the RDC network is already buckling under the weight of requests and cannot handle the tidal wave of requests that will ensue.4 It currently takes weeks to months for research to pass through disclosure review. Further restricting public release data to the county level will increase the number of researchers needing to use the Research Data Centers. Without plans to increase staffing by an order of magnitude, disclosure review will choke off any timely peer review. Consider a paper that undergoes two rounds of peer review, each taking two months on average and each requiring at least a month of disclosure review: it would take six months to be published, assuming the authors turn the revisions around immediately. Of course, reviews usually take longer, the increased load on Research Data Centers will increase the time needed for disclosure review, and the researcher needs time to revise the paper in accordance with peer review. Census Bureau research policy DS001a prioritizes peer review, but not releasing neighborhood-level data forecloses the possibility of timely, peer-reviewed research using the decennial census data.5
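The timeline arithmetic in this paragraph can be written out as a short Python sketch. The baseline durations come from the example above; the "slower" scenario uses purely hypothetical numbers chosen only to show how quickly delays compound, not measured review times.

```python
def months_to_publication(
    rounds: int,
    peer_review_months: float,
    disclosure_review_months: float,
    revision_months: float = 0,
) -> float:
    """Total months when every round needs peer review, disclosure review, and revision."""
    return rounds * (peer_review_months + disclosure_review_months + revision_months)


# Baseline from the letter: two rounds, two months of peer review and
# one month of disclosure review per round, with immediate revisions.
baseline = months_to_publication(rounds=2, peer_review_months=2, disclosure_review_months=1)

# Hypothetical slower scenario (assumed values): disclosure review stretches
# to three months and authors need one month to revise in each round.
slower = months_to_publication(
    rounds=2, peer_review_months=2, disclosure_review_months=3, revision_months=1
)

print(baseline, slower)
```

Because every round pays the disclosure-review cost again, any slowdown at the Research Data Centers is multiplied by the number of review rounds.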

Finally, relying on the Research Data Centers will increase inequality across institutions. Only researchers affiliated with institutions that have paid for access to Research Data Centers will be able to access the data. This removes the possibility of independent researchers, as well as researchers from institutions without large research budgets, using Census data for their analysis. The proposed table release schedule will renege on the democratization of data access that has been a notable accomplishment of the Census Bureau.

In sum, I urge the Census Bureau to reconsider the proposed release schedule. The proposed schedule will prevent independent validation of demographic analyses, stymie important work on community disparities by race and ethnicity, create unequal accuracy levels across geographic regions, slow the progress of peer-reviewed research using the decennial census, and exacerbate institutional inequality.

Sincerely,

Michael Bader
Associate Professor of Sociology and Policy
Associate Director, Metropolitan Policy Center
American University

References

1. Ely DM, Driscoll AK. Infant Mortality in the United States, 2018: Data From the Period Linked Birth/Infant Death File. National Vital Statistics Reports. 2020;69(7). Accessed October 8, 2021. https://www.cdc.gov/nchs/data/nvsr/nvsr69/NVSR-69-7-508.pdf
2. Centers for Disease Control and Prevention. Data Use Restrictions. Published February 10, 2020. Accessed October 12, 2021. https://wonder.cdc.gov/DataUse.html#
3. National Institute on Aging. Strategic Directions for Research, 2020-2025. 2020. https://www.nia.nih.gov/about/aging-strategic-directions-research
4. Abowd JM. Research Data Centers, Reproducible Science, and Confidentiality Protection: The Role of the 21st Century Statistical Agency. Presented at: Summer DemSem; June 5, 2017; Madison, Wisconsin. https://www2.census.gov/cac/sac/meetings/2017-09/role-statistical-agency.pdf
5. Data Stewardship Executive Policy Committee, US Census Bureau. DS001a - Administrative Data Acquisition, Access, and Use Policy. 2016. https://www2.census.gov/foia/ds_policies/ds001.pdf
