Methodological Pitfalls in Sociolinguistics, Exemplified By Statistical Analyses of Associations Between Stuttering and German Preschoolers' Sociolinguistic Characteristics

Tuesday, 12 July 2016: 15:00
Location: Hörsaal 24 (Main Building)
Oral Presentation
Eugen ZARETSKY, University Hospital of Frankfurt/Main, Germany
Benjamin P. LANGE, University of Wuerzburg, Germany
Empirical evidence generated by both uni- and multivariate statistical methods can vary considerably depending on whether a number of semi-obligatory rules of statistical analysis are followed. Among other things, it is up to the researcher whether a Bonferroni, Bonferroni-Holm, or some other correction is applied to the probability values, whether missing data are imputed, whether metric data are z-transformed for better comparability across scales, and whether exact or asymptotic probability values are reported, with or without the corresponding effect sizes. Any such manipulation of the data can change the results considerably, including the p-values and effect sizes, and this can be exploited for so-called “p-hacking”. In addition, the results of classification trees, regressions, and some other multivariate methods are highly inconsistent, which poses a methodological challenge to researchers, especially in retrospective studies where the most relevant factors must sometimes be chosen from a wide range of available variables. These and some other issues are exemplified here with a sample collected in the course of a large language assessment study in the German state of Hesse during the school enrolment examination (N = 746; 40% monolingual Germans, 60% bi/multilingual immigrants; 52% boys, 48% girls; age range 46-99 months, median 70). All children were tested with validated, well-known language tests such as AWST-R, S-ENS, and ETS 4-8. Parent questionnaires identified 36 stutterers and 712 non-stutterers.
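To illustrate how the choice of correction alone can change the number of "significant" findings, the following minimal sketch applies the Bonferroni and the Bonferroni-Holm adjustment to the same set of p-values. The four raw p-values are hypothetical and are not taken from the study; the function names are likewise chosen only for this illustration.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply every p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Bonferroni-Holm step-down adjustment.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def count_significant(p_values, alpha=0.05):
    """Number of p-values below the significance threshold."""
    return sum(p < alpha for p in p_values)


# Hypothetical raw p-values from four tests (not data from the study).
raw = [0.010, 0.015, 0.030, 0.040]

n_raw = count_significant(raw)
n_bonferroni = count_significant(bonferroni(raw))
n_holm = count_significant(holm(raw))

print("uncorrected:", n_raw)
print("Bonferroni: ", n_bonferroni)
print("Holm:       ", n_holm)
```

With these assumed inputs, all four tests are significant without correction, two remain significant after the Holm adjustment, and only one after the Bonferroni adjustment, so the reported conclusion depends on a decision that is left to the researcher.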
Depending on the statistical method chosen (logit-loglinear analysis, linear-by-linear association, regressions, classification trees, correlations, chi-square tests, or discriminant analysis) and on manipulations of the data such as missing-data imputation, the link between stuttering and (a) language skills as well as (b) some sociolinguistic and demographic variables can be made to appear very close or non-existent.
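The dependence on the chosen method can be sketched with a single 2x2 table analysed twice: once with an asymptotic Pearson chi-square test (no continuity correction) and once with a two-sided Fisher exact test. The cell counts below are hypothetical and deliberately small; they are not the study's data, and the helper functions are written only for this illustration.

```python
from math import comb, erfc, sqrt


def pearson_chi2_p(a, b, c, d):
    """Asymptotic p-value of the Pearson chi-square test for a 2x2 table.

    For one degree of freedom the chi-square survival function equals
    erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return erfc(sqrt(chi2 / 2))


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact p-value for a 2x2 table.

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )


# Hypothetical 2x2 table (rows: group A / group B, columns: trait yes / no).
table = (5, 1, 1, 4)

p_asymptotic = pearson_chi2_p(*table)
p_exact = fisher_exact_p(*table)

print(f"asymptotic chi-square p = {p_asymptotic:.4f}")
print(f"Fisher exact p          = {p_exact:.4f}")
```

For these assumed counts the asymptotic p-value falls below .05 while the exact p-value does not, so the same table supports either an association or no association depending on which value is reported.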