ML-Based Annotation Outperforms Human Coders for Annotation Tasks: Not So Fast. An Analysis of Race Annotation for YouTube Using ML-Based, Standardized Human-Coded, and Qualitative Data
This contribution is based on a larger case study of algorithmically introduced racial inequalities among German content creators on YouTube and aims to critically examine the use of gold standard corpora (GSC) in comparisons between machine and human annotations. We ask the following question: What challenges does the use of a GSC pose when comparing different annotation methods with regard to sensitive categories such as race? We proceeded in three steps: (1) We created our own GSC with the help of human annotators using a standardized classification survey (Liang et al. 2022). (2) We then invited the annotators to a focus group discussion on the challenges and possibilities of classifying race for online profiles. (3) We used the GSC to compare three ML-based annotation applications (Skybiometry, Kairos, ChatGPT) to a different group of human annotators.
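Purely as an illustration of how step (3) could be operationalized (the abstract does not specify the comparison metric), agreement between each tool's labels and the GSC might be quantified with a chance-corrected coefficient such as Cohen's kappa. The following minimal Python sketch assumes labels have already been collected; all label values and data are hypothetical placeholders.

# Minimal sketch: comparing ML-based annotations against a GSC (illustrative only).
from sklearn.metrics import cohen_kappa_score

# Hypothetical annotations for the same set of YouTube profiles (placeholder data).
gsc_labels = ["Black", "white", "Asian", "Black", "white"]  # gold standard corpus
tool_labels = {
    "Skybiometry": ["Black", "white", "white", "Black", "white"],
    "Kairos":      ["Black", "Asian", "Asian", "Black", "white"],
    "ChatGPT":     ["Black", "white", "Asian", "white", "white"],
}

# Chance-corrected agreement between each tool and the GSC.
for tool, labels in tool_labels.items():
    kappa = cohen_kappa_score(gsc_labels, labels)
    print(f"{tool}: Cohen's kappa vs. GSC = {kappa:.2f}")

A coefficient of this kind corrects for agreement expected by chance, which matters when label distributions are skewed, as is often the case with racial categories in platform data.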
First results show that the creation of a GSC for race annotations involves significant ambiguities, resulting in critical ethical challenges for its use. Measured against this GSC, the ML-based tools fall short of its quality, raising further questions about the reproduction of racial biases in automated annotation.