Assessing the Role of Generative AI in Protest Event Analysis: A Comparison with Manually Created Data

Monday, 7 July 2025: 15:12
Location: FSE036 (Faculty of Education Sciences (FSE))
Oral Presentation
Takeshi WADA, The University of Tokyo, Japan
Emilia CUADROS, Universidad Diego Portales, Chile
Néstor ÁLVARO, Independent Researcher, Spain
Nicolás SOMMA, Associate Professor of Sociology, Pontificia Universidad Católica de Chile, Chile
This study examines the feasibility of applying generative AI technologies, such as OpenAI's ChatGPT, to event analysis and data development. Event analysis is a widely used method among sociologists and political scientists who study popular protests and social movements. Traditionally, researchers build event data in three steps: (1) gather newspaper articles, (2) extract the core components of each protest event (who, did what, to whom, when, where, and why), and (3) assign theoretical/analytical codes to these components for analysis (e.g., assigning the code “president” to “Joe Biden” as an expression of the whom component). This process is typically manual: research assistants read the articles, select the relevant “codes” from a “codebook” (a list of theoretical/analytical codes), and enter them into a database. However, this approach is costly and difficult to keep up to date, and it is often undesirable because it is the assistants, not the researchers, who decide in practice which theoretical codes are applied.
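To make step (3) concrete, the following is a minimal sketch of how a single coding decision might be posed to a generative model. It is an illustration under stated assumptions, not the study's procedure: the record fields mirror the six components listed above, while the codebook entries, the prompt wording, the example event values, and the names `ProtestEvent`, `WHOM_CODEBOOK`, `build_prompt`, and `parse_code` are all hypothetical. No model call is shown; the sketch only builds the prompt and checks that a reply names a code that actually exists in the codebook.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProtestEvent:
    """One protest event, holding the six core components extracted in step (2)."""
    who: str
    did_what: str
    to_whom: str
    when: str
    where: str
    why: str


# Hypothetical codebook for the "whom" component: code -> short definition.
WHOM_CODEBOOK = {
    "president": "The head of state",
    "congress": "The national legislature",
    "private_firm": "A private company",
}


def build_prompt(event: ProtestEvent, codebook: dict[str, str]) -> str:
    """Ask a generative model to choose exactly one codebook code for the whom component."""
    options = "\n".join(f"- {code}: {desc}" for code, desc in codebook.items())
    return (
        "Assign exactly one code from the codebook to the target of this protest event.\n"
        f"Target expression: {event.to_whom}\n"
        f"Codebook:\n{options}\n"
        "Answer with the code only."
    )


def parse_code(reply: str, codebook: dict[str, str]) -> str | None:
    """Keep the model's reply only if it names a code that exists in the codebook."""
    code = reply.strip().lower()
    return code if code in codebook else None


# Illustrative (hypothetical) values; the call to a model is deliberately omitted.
event = ProtestEvent("protesters", "demonstrated against", "Joe Biden", "2019", "Santiago", "tuition costs")
prompt = build_prompt(event, WHOM_CODEBOOK)
print(parse_code(" President\n", WHOM_CODEBOOK))  # president
print(parse_code("minister", WHOM_CODEBOOK))      # None
```

Validating the reply against the codebook keeps the researcher's code list, rather than the model's free-text output, as the final authority over which codes enter the dataset.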

To address these challenges, this study focuses on the third step (assigning theoretical/analytical codes) and assesses AI's performance on this task, which is complex for humans and AI alike. We then compare the AI-generated protest event data with manually developed data. For this purpose, we use the protest event dataset of the Observatory of Conflicts of the Centre for Social Conflict and Cohesion Studies (COES) in Santiago, Chile, which covers 23,398 protest events in Chile from 2009 to 2019. Because COES digitized the original newspaper articles, we can apply generative AI processing to exactly the same set of articles. The COES data therefore offer a rare opportunity to compare human- and AI-generated data. By analyzing historical trends and patterns in both datasets, this paper explores the benefits and limitations of AI-based data methods relative to traditional manual approaches.
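The abstract does not say how agreement between the human-coded and AI-coded versions of the same events is measured. One common option, assumed here rather than taken from the study, is to report raw percent agreement alongside Cohen's kappa, which corrects for the agreement two coders would reach by chance given how often each uses each code. The sketch below computes both for paired code lists; the function names and the example codes are hypothetical and are not COES data.

```python
from __future__ import annotations

from collections import Counter


def percent_agreement(human: list[str], ai: list[str]) -> float:
    """Share of events to which the human coder and the AI assigned the same code."""
    if len(human) != len(ai) or not human:
        raise ValueError("expected two non-empty code lists of equal length")
    return sum(h == a for h, a in zip(human, ai)) / len(human)


def cohens_kappa(human: list[str], ai: list[str]) -> float:
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(human)
    p_observed = percent_agreement(human, ai)
    human_counts, ai_counts = Counter(human), Counter(ai)
    # Chance agreement: for each code, the product of the two coders' marginal proportions.
    # Codes the human never used contribute zero, so summing over human_counts suffices.
    p_expected = sum(human_counts[c] * ai_counts[c] for c in human_counts) / (n * n)
    if p_expected == 1.0:
        # Both coders used one identical code throughout; kappa is undefined, so return 1.0.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes for the whom component of four events.
human = ["president", "congress", "president", "private_firm"]
ai = ["president", "congress", "congress", "private_firm"]
print(percent_agreement(human, ai))       # 0.75
print(round(cohens_kappa(human, ai), 3))  # 0.636
```

Here the coders agree on three of four events (0.75), but because chance agreement is 5/16, kappa is lower (7/11, about 0.636). Event-level agreement of this kind complements, rather than replaces, the comparison of historical trends across the two datasets.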