ML-Based Annotation Outperforms Human Coders for Annotation Tasks: Not So Fast. An Analysis of Race Annotation for YouTube Using ML-Based, Standardized Human-Coded and Qualitative Data

Wednesday, 9 July 2025: 00:00
Location: FSE036 (Faculty of Education Sciences (FSE))
Oral Presentation
Claudia BUDER, Université Paris 1 Panthéon-Sorbonne, France
Chiara OSARIO KRAUTER, University of Potsdam, Germany
Aaron PHILIPP, University of Potsdam, Germany
Sarah WEISSMANN ANNA, University of Potsdam, Germany
Roland VERWIEBE, University of Potsdam, Germany
The advent of AI-based tools offers new opportunities and challenges for sociological methods. Recent studies have pointed to the capacity of machine learning (ML) models to take over repetitive tasks such as the classification of data (Belal et al. 2023; Whang et al. 2023). Comparing different AI applications and human annotators against a gold standard corpus (GSC) (Wissler et al. 2024), some studies have found that AI can surpass humans in various annotation tasks (Aldeen et al. 2023; Gilardi et al. 2023), although this does not hold for more complex tasks (Labruna et al. 2023).

This contribution is based on a larger case study on algorithmically introduced racial inequalities among German content creators on YouTube and aims to critically examine the use of GSCs in comparisons between machine and human annotations. We ask the following question: What challenges does the use of a GSC pose when comparing different annotation methods with regard to sensitive categories such as race? We proceeded in three steps: (1) We created our own GSC with the help of human annotators using a standardized classification survey (Liang et al. 2022). (2) We then invited the annotators to a focus group discussion on the challenges and possibilities of classifying race for online profiles. (3) We used the GSC to compare three ML-based annotation applications (Skybiometry, Kairos, ChatGPT) to a different group of human annotators.

First results show that the creation of a GSC for race annotation involves significant ambiguities, resulting in critical ethical challenges for its use. When evaluated against the GSC, the ML-based tools do not reach its quality, posing further questions with regard to the reproduction of racial biases in automated annotation.