Multilevel Data: What to Do? Comparing Random Intercept and Slope Models, Cluster-Robust Standard Errors, and Two-Step Approaches Using Monte-Carlo Simulations

Thursday, July 17, 2014: 3:30 PM
Room: 416
Oral Presentation
Johannes GIESECKE, Institute of Social Sciences, Humboldt University Berlin, Germany
Jan HEISIG, Research Unit "Skill Formation and Labor Markets", WZB Berlin Social Science Center, Berlin, Germany
Merlin SCHAEFFER, Migration Integration Transnationalization, WZB Berlin Social Science Center, Berlin, Germany
Social scientists generally rely on three broad modelling strategies to test hypotheses about contextual effects: random intercept and slope models (often simply called “multilevel” models), pooled OLS with cluster-robust standard errors, and two-step approaches. Econometric textbooks tell us that random intercept and slope models are the most efficient of the three, that two-step approaches trade efficiency for robustness, and that pooled OLS with cluster-robust standard errors lies somewhere in between. But how do these trade-offs play out in actual research settings? To address this question, we go beyond previous Monte Carlo studies by focusing on more realistic setups with complex data-generating processes. The leading scenario that we investigate is the cross-national comparison, which is characterized by a small number of contexts, many observations per context, and high complexity in the form of marked differences across contexts.
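
To make two of the three strategies concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single level-one predictor `x`, a single context-level predictor `z`, and a CR1-type small-sample correction for the cluster-robust covariance; the function names, parameter values, and toy data are hypothetical. The random intercept and slope model is omitted because it requires a mixed-model library.

```python
import numpy as np

def pooled_ols_cluster_se(y, X, groups):
    """Pooled OLS with a cluster-robust (CR1 sandwich) covariance matrix."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    labels = np.unique(groups)
    G = len(labels)
    meat = np.zeros((k, k))
    for g in labels:
        idx = groups == g
        score = X[idx].T @ resid[idx]  # summed score vector of cluster g
        meat += np.outer(score, score)
    # Small-sample correction (an assumption; other variants exist).
    correction = (G / (G - 1)) * ((n - 1) / (n - k))
    cov = correction * (XtX_inv @ meat @ XtX_inv)
    return beta, np.sqrt(np.diag(cov))

def two_step(y, x, z, groups):
    """Step 1: OLS of y on x within each context.
    Step 2: OLS of the estimated context intercepts on z.
    `z` must be ordered like np.unique(groups)."""
    labels = np.unique(groups)
    intercepts = np.empty(len(labels))
    for j, g in enumerate(labels):
        idx = groups == g
        X1 = np.column_stack([np.ones(idx.sum()), x[idx]])
        intercepts[j] = np.linalg.lstsq(X1, y[idx], rcond=None)[0][0]
    X2 = np.column_stack([np.ones(len(labels)), z])
    XtX_inv = np.linalg.inv(X2.T @ X2)
    gamma = XtX_inv @ X2.T @ intercepts
    resid = intercepts - X2 @ gamma
    sigma2 = resid @ resid / (len(labels) - X2.shape[1])
    return gamma, np.sqrt(np.diag(sigma2 * XtX_inv))

# Toy data: 25 contexts with 200 observations each (hypothetical values).
rng = np.random.default_rng(0)
G, n_per = 25, 200
groups = np.repeat(np.arange(G), n_per)
z_ctx = rng.normal(size=G)                 # context-level predictor
u = rng.normal(scale=0.5, size=G)          # random intercepts
x = rng.normal(size=G * n_per)             # level-one predictor
y = 1.0 + 0.5 * x + 0.3 * z_ctx[groups] + u[groups] + rng.normal(size=G * n_per)

X = np.column_stack([np.ones_like(x), x, z_ctx[groups]])
beta, se_cluster = pooled_ols_cluster_se(y, X, groups)
gamma, se_two_step = two_step(y, x, z_ctx, groups)
print("pooled OLS, effect of z:", beta[2], "cluster-robust SE:", se_cluster[2])
print("two-step,   effect of z:", gamma[1], "SE:", se_two_step[1])
```

In the two-step function the effective sample size for the contextual effect is the number of contexts, which is why this approach is expected to be robust but inefficient when contexts are few.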

We focus on four types of complexity. First, we investigate whether the different approaches are robust to violations of equality assumptions; specifically, we examine the case where the correlations between level-one variables vary across contexts. Second, we show the impact of specifying “simplistic” models that ignore context-specific heterogeneity: how well do the different approaches handle unspecified (random) slopes that vary across level-two units? Third, we explore the consequences of normally distributed and gamma-distributed errors at both level one and level two. Finally, we vary the number of level-two units, as any simulation study on hierarchical data should.
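
A data-generating process with these four kinds of complexity might look like the sketch below. It is an illustration under stated assumptions, not the simulation design used in the study: the function names, the gamma shape parameter, the coefficient values, and the range of context-specific correlations are all hypothetical.

```python
import numpy as np

def draw_errors(rng, size, sd, dist):
    """Mean-zero errors with standard deviation `sd`: normal or centred gamma."""
    if dist == "normal":
        return rng.normal(0.0, sd, size)
    if dist == "gamma":
        shape = 2.0  # hypothetical shape; smaller values give more skew
        raw = rng.gamma(shape, 1.0, size)  # mean = shape, variance = shape
        return (raw - shape) / np.sqrt(shape) * sd
    raise ValueError(f"unknown distribution: {dist}")

def simulate(n_contexts=25, n_per=500, slope_sd=0.2,
             rho_range=(0.0, 0.6), dist="normal", seed=0):
    """Simulate two-level data with optional random slopes, context-specific
    correlation between the level-one predictors, and non-normal errors."""
    rng = np.random.default_rng(seed)
    n = n_contexts * n_per
    ctx = np.repeat(np.arange(n_contexts), n_per)

    z = rng.normal(size=n_contexts)                     # context-level predictor
    u0 = draw_errors(rng, n_contexts, 0.5, dist)        # random intercepts
    u1 = draw_errors(rng, n_contexts, slope_sd, dist)   # random slopes on x1
    rho = rng.uniform(*rho_range, size=n_contexts)      # corr(x1, x2) per context

    x1 = rng.normal(size=n)
    # x2 has unit variance and correlation rho[c] with x1 inside context c.
    x2 = rho[ctx] * x1 + np.sqrt(1.0 - rho[ctx] ** 2) * rng.normal(size=n)
    e = draw_errors(rng, n, 1.0, dist)                  # level-one errors

    y = 1.0 + (0.5 + u1[ctx]) * x1 + 0.3 * x2 + 0.3 * z[ctx] + u0[ctx] + e
    return {"y": y, "x1": x1, "x2": x2, "z": z[ctx], "ctx": ctx}

# Example: few contexts, gamma-distributed errors at both levels.
data = simulate(n_contexts=10, n_per=100, dist="gamma")
print(data["y"].shape, len(np.unique(data["ctx"])))
```

Setting `slope_sd=0` removes the random slopes, and passing a degenerate `rho_range` such as `(0.3, 0.3)` restores equal correlations across contexts, so each complexity can be switched on separately.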

We focus on linear models with continuous outcomes and on standard setups as they are typically implemented in applied research papers. However, we also plan to investigate whether and when more refined versions of the three modelling approaches, such as OLS with bootstrapping or multilevel SEM, improve performance.