A Large Scale, High Quality U.S. Occupational Database: Results from Merged IRS and ACS Write-Ins
Thursday, 10 July 2025
Location: SJES007 (Faculty of Legal, Economic, and Social Sciences (JES))
Distributed Paper
The measurement of worker occupations is a key component in understanding long-term changes in the U.S. social and economic structure. However, practical problems in measuring occupation at scale, such as non-universe data coverage, noisy survey responses, and the difficulty of working with raw text strings, continue to make occupation data hard to use for statistical agencies and researchers alike. In this project, we develop and validate an occupation database that addresses these problems by combining the best available occupation measures from the U.S. Census Bureau and the Internal Revenue Service (IRS) to create a large, high-quality, and linkable database of individual worker occupations. The Census Bureau collects occupation only in the American Community Survey (ACS) and the Annual Social and Economic Supplement (ASEC), so sample size is limited. Conversely, the IRS solicits near-universe write-in occupation information on Form 1040, but does not use this information for statistical reporting of any kind. Compared with the ACS occupation write-ins, IRS write-ins are significantly shorter, unstandardized, and subject to minimal validation protocols. In this study, we seek to overcome these source limitations by combining the Census Bureau’s ACS data with the IRS’s tax return data to develop a large volume of accurate occupational records.
To begin, we match IRS data available for all electronically filed tax returns in 2019 to ACS data. Here, we present the similarities and differences in individual occupation responses between the two sources, with the goal of understanding the strengths and weaknesses of each. We use fuzzy matching (token set ratios) to compare write-in responses between Form 1040 and the ACS. We investigate the quality of the ACS/IRS write-in match as a function of age, gender, earnings, and other covariates. Additionally, we investigate the relative sensitivity of each write-in to year-over-year occupational transitions. Current results are in disclosure review.
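For readers unfamiliar with token set ratios, the following is a minimal sketch of how such a score can be computed for a pair of write-ins. It is an illustration only, not the project's implementation: the function names, the tokenization rule, and the use of Python's standard-library `difflib` as the underlying string similarity are all assumptions, and the example strings are invented.

```python
import re
from difflib import SequenceMatcher


def _tokens(text):
    # Assumed normalization: lowercase, keep alphanumeric runs, drop duplicates.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def _ratio(a, b):
    # Character-level similarity on a 0-100 scale (difflib's matching-block ratio).
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def token_set_ratio(left, right):
    """Hypothetical helper: token set ratio between two occupation write-ins."""
    a, b = _tokens(left), _tokens(right)
    if not a or not b:
        return 0.0  # assumed convention: an empty write-in matches nothing
    common = " ".join(sorted(a & b))
    only_a = " ".join(sorted(a - b))
    only_b = " ".join(sorted(b - a))
    # Rebuild each side as "shared tokens + its own leftover tokens".
    combined_a = (common + " " + only_a).strip()
    combined_b = (common + " " + only_b).strip()
    # Score is the best of the three pairwise comparisons, so a write-in whose
    # tokens are a subset of the other's scores 100 regardless of word order.
    return max(
        _ratio(common, combined_a),
        _ratio(common, combined_b),
        _ratio(combined_a, combined_b),
    )


if __name__ == "__main__":
    # Invented example pairs: a short tax-form style entry vs. a longer survey entry.
    print(token_set_ratio("registered nurse", "Nurse, registered (RN)"))
    print(token_set_ratio("teacher", "plumber"))
```

Because the score ignores word order and repeated tokens, it suits a comparison in which one write-in is much shorter than the other, as the abstract describes for IRS versus ACS entries. Any cutoff for calling a pair a match would need to be chosen and validated separately.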