Possibilities and Challenges of Constructing a Comparative Political Event Database from Multilingual News Sources Using Generative AI

Wednesday, 9 July 2025: 10:00
Location: ASJE032 (Annex of the Faculty of Legal, Economic, and Social Sciences)
Oral Presentation
Takeshi WADA, The University of Tokyo, Japan
Néstor ÁLVARO, Independent Researcher, Spain
Yoshiyuki AOKI, Dokkyo University, Japan
Yoojin KOO, International Christian University, Japan
By leveraging generative AI technologies and multilingual newspapers, our project aims to create a new type of political event database encompassing a variety of political interactions involving state officials, political parties, and civil society actors such as social movements. Over the past decades, event analysis has gained popularity among scholars studying social movements, protests, and civil wars. Traditionally, researchers have relied on human coders who review newspapers in a single language from a single country and extract six key elements of each event: Who (actor), What (action), Whom (target), When (time), Where (location), and Why (claim), a framework commonly referred to as the "6Ws."
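To make the 6Ws concrete, the following is a minimal sketch of how one coded event might be represented. The class name `EventRecord`, its field names, and the sample values are illustrative assumptions, not the project's actual schema or data.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class EventRecord:
    """One coded event. Field names are hypothetical, chosen to mirror the 6Ws."""
    who: Optional[str] = None    # actor
    what: Optional[str] = None   # action
    whom: Optional[str] = None   # target
    when: Optional[str] = None   # time, stored here as an ISO date string
    where: Optional[str] = None  # location
    why: Optional[str] = None    # claim

    def missing_fields(self) -> list[str]:
        """Return the names of the 6W fields the coder left empty."""
        return [name for name, value in asdict(self).items() if value is None]


# Invented example values, for illustration only.
event = EventRecord(
    who="teachers' union",
    what="march",
    whom="ministry of education",
    when="2024-05-01",
    where="Mexico City",
)
print(event.missing_fields())  # ['why']
```

Keeping every field optional reflects that a news article may not report all six elements; a human or AI coder can leave a field empty rather than guess.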

While this human-coding approach has uncovered historical and geographical trends in political activities that other methods have not, it faces major challenges. These include the high cost of continuously updating the data and the difficulty of working with news sources written in different languages.

To address these challenges, our project employs generative AI, which shows promise in extracting the 6Ws as accurately and reliably as human coders but at a fraction of the cost and time. Crucially, AI also appears to handle multilingual sources well. But is the AI approach truly more effective than the human approach in terms of data extraction accuracy? Is it equally effective in all languages? To date, no comprehensive evaluation has been conducted.

This study fills that gap by comparing the 6Ws extraction capabilities of AI with those of human experts using newspapers in four languages: The New York Times (English) from the United States, La Jornada (Spanish) from Mexico, The Hankyoreh (Korean) from South Korea, and Asahi Shimbun (Japanese) from Japan. By assessing the accuracy in each language, this study will highlight AI's strengths and limitations and contribute to the broader conversation on how AI can revolutionize event analysis globally.
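One simple way such a comparison could be scored is per-field agreement between the human and AI codings, broken down by language. The sketch below is only an assumption about how this might be done, not the study's evaluation procedure: the function `field_agreement`, the exact-match rule after light normalization, and the toy records are all hypothetical.

```python
from collections import defaultdict

FIELDS = ("who", "what", "whom", "when", "where", "why")


def normalize(value):
    """Lowercase and collapse whitespace; a missing value becomes an empty string."""
    return " ".join(str(value).lower().split()) if value is not None else ""


def field_agreement(pairs):
    """Compute the share of events on which human and AI codings match, per field.

    pairs: iterable of (language, human_coding, ai_coding), where each coding
    is a dict keyed by 6W field name. A field missing from both codings counts
    as agreement (an assumption of this sketch).
    Returns {language: {field: agreement rate between 0.0 and 1.0}}.
    """
    matches = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for language, human, ai in pairs:
        totals[language] += 1
        for field in FIELDS:
            if normalize(human.get(field)) == normalize(ai.get(field)):
                matches[language][field] += 1
    return {
        lang: {field: matches[lang][field] / totals[lang] for field in FIELDS}
        for lang in totals
    }


# Toy data invented for illustration; not results from the study.
pairs = [
    ("en", {"who": "Students", "what": "protest"}, {"who": "students", "what": "protest"}),
    ("en", {"who": "Union", "what": "strike"}, {"who": "union", "what": "march"}),
    ("es", {"who": "Maestros", "where": "Oaxaca"}, {"who": "maestros", "where": "Ciudad de México"}),
]
scores = field_agreement(pairs)
print(scores["en"]["what"])   # 0.5
print(scores["es"]["where"])  # 0.0
```

Exact string matching is a crude proxy: free-text fields such as Why (claim) would likely need a softer similarity measure or expert adjudication before agreement rates are meaningful.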