Squeezing the Data? The Effect of Data Handling Practices in Stratification Research
In the case of longitudinal household panel data, the problem of missing data becomes even more complex. At the same time, longitudinal household data offer more information for addressing item nonresponse (INR) and unit nonresponse (UNR). Earlier data points can be used to extrapolate missing items, and other household members' data can serve as a proxy in the case of UNR. This advantage, however, can easily become a pitfall if the assumptions about the process that generated the missing data are wrong and thereby bias the estimators. The opposite strategy of simply discarding incomplete observations (i.e. list-wise deletion) may also bias results by reducing the representativeness of the analysis sample.
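To make the contrast between list-wise deletion and household proxies concrete, here is a minimal Python sketch on a hypothetical toy panel. The field names (`hid`, `pid`, `wave`, `father_edu`) and both helper functions are illustrative assumptions, not the data or procedures used in this study. The proxy rule simply borrows the first observed value reported by another member of the same household in the same wave.

```python
def listwise_delete(rows, var):
    """Keep only records in which `var` is observed (list-wise deletion)."""
    return [r for r in rows if r[var] is not None]

def household_proxy(rows, var):
    """Hypothetical proxy rule: fill a missing `var` with the first observed
    value reported by any member of the same household in the same wave."""
    donors = {}
    for r in rows:
        key = (r["hid"], r["wave"])
        if r[var] is not None and key not in donors:
            donors[key] = r[var]
    filled = []
    for r in rows:
        new = dict(r)  # copy so the original records stay untouched
        if new[var] is None:
            new[var] = donors.get((r["hid"], r["wave"]))  # None if no donor
        filled.append(new)
    return filled

# Toy panel: household 1 has two siblings, household 2 a single respondent.
panel = [
    {"hid": 1, "pid": 11, "wave": 1, "father_edu": 12},
    {"hid": 1, "pid": 12, "wave": 1, "father_edu": None},
    {"hid": 2, "pid": 21, "wave": 1, "father_edu": None},
    {"hid": 2, "pid": 21, "wave": 2, "father_edu": 10},
]

print(len(listwise_delete(panel, "father_edu")))  # 2
print([r["father_edu"] for r in household_proxy(panel, "father_edu")])  # [12, 12, None, 10]
```

List-wise deletion keeps only two of the four records. The proxy rule recovers the second sibling's missing value but cannot help the single-person household in wave 1, a gap that would instead require information from other waves.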
Drawing on the large body of literature on imputation techniques, we study how different strategies for handling missing information in panel data affect substantive results. We compare the results of stratification analyses that use social origin as a predictor across several specifications, obtained by (1) applying the “persistence” approach (i.e. carrying older information forward or backward), (2) chained-regression imputation based on information from the same time point and, additionally, on prior information, (3) using proxy information from other household members, and (4) employing retrospective versus prospective information. These results are compared with those obtained by restricting the analysis sample to observed values. As a litmus test for the effect of data handling practices, we use three applications from educational research, social mobility research, and labor market research.
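A minimal sketch of the “persistence” approach, assuming person-wave records held as plain Python dicts: within each person, the last observed value is carried forward to later waves, and any gaps before the first observation are then filled backward. The function name `persistence_fill` and the field `father_isei` are hypothetical and serve only as illustration.

```python
from itertools import groupby

def persistence_fill(rows, var, id_key="pid", time_key="wave"):
    """Hypothetical helper: within each person, carry the last observed value
    of `var` forward; leading gaps are then filled backward from the first
    observed wave. Returns new dicts and leaves `rows` unchanged."""
    ordered = sorted(rows, key=lambda r: (r[id_key], r[time_key]))
    out = []
    for _, group in groupby(ordered, key=lambda r: r[id_key]):
        person = [dict(r) for r in group]
        last = None
        for r in person:  # forward pass: carry older information forward
            if r[var] is None:
                r[var] = last
            else:
                last = r[var]
        nxt = None
        for r in reversed(person):  # backward pass: fill leading gaps
            if r[var] is None:
                r[var] = nxt
            else:
                nxt = r[var]
        out.extend(person)
    return out

# Toy panel: person 1 is observed only in wave 2; person 2 is never observed.
panel = [
    {"pid": 1, "wave": 1, "father_isei": None},
    {"pid": 1, "wave": 2, "father_isei": 45},
    {"pid": 1, "wave": 3, "father_isei": None},
    {"pid": 2, "wave": 1, "father_isei": None},
]

filled = persistence_fill(panel, "father_isei")
print([r["father_isei"] for r in filled])  # [45, 45, 45, None]
```

Person 2, who is never observed, stays missing, so persistence alone cannot replace imputation or deletion for such cases. The approach also implicitly assumes that the variable is stable over time, which is more defensible for social origin than for time-varying labor market outcomes.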