Possibilities and Challenges of Constructing a Comparative Political Event Database from Multilingual News Sources Using Generative AI
While this human-coding approach has uncovered historical and geographical trends in political activity that other methods have missed, it faces two major challenges: the high cost of continuously updating the data and the difficulty of working with news sources written in multiple languages.
To address these challenges, our project employs generative AI, which shows promise in extracting the 6Ws as accurately and reliably as human coders but at a fraction of the cost and time. Crucially, AI also appears to handle multilingual sources well. But is the AI approach truly more effective than the human approach in terms of extraction accuracy? Is it equally effective in all languages? To date, no comprehensive evaluation has been conducted.
This study fills that gap by comparing the 6Ws extraction capabilities of AI with those of human experts using newspapers in four languages: The New York Times (English) from the United States, La Jornada (Spanish) from Mexico, The Hankyoreh (Korean) from South Korea, and Asahi Shimbun (Japanese) from Japan. By assessing the accuracy in each language, this study will highlight AI's strengths and limitations and contribute to the broader conversation on how AI can revolutionize event analysis globally.
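As a rough illustration of the comparison described above, the following Python sketch scores AI-extracted fields against human-coded fields, broken down by language and by field. It is not the study's actual procedure: the record layout, the function name `field_accuracy`, the field names `who` and `where`, the exact-match criterion, and the toy records are all assumptions made for illustration only.

```python
from collections import defaultdict


def _norm(value):
    """Normalize a field value for comparison (assumed rule: trim + casefold)."""
    return None if value is None else str(value).strip().casefold()


def field_accuracy(human, ai):
    """Return {language: {field: share of events where AI matches the human code}}.

    `human` maps event id -> {"language": str, "fields": {name: value}}.
    `ai` maps event id -> {"fields": {name: value}}.
    Both layouts are hypothetical; exact match is a simplifying assumption.
    """
    hits = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(lambda: defaultdict(int))
    for event_id, gold in human.items():
        predicted = ai.get(event_id, {}).get("fields", {})
        lang = gold["language"]
        for field, value in gold["fields"].items():
            totals[lang][field] += 1
            if _norm(predicted.get(field)) == _norm(value):
                hits[lang][field] += 1
    return {
        lang: {field: hits[lang][field] / n for field, n in fields.items()}
        for lang, fields in totals.items()
    }


# Toy records, invented for illustration (not data from the study).
human = {
    "en-001": {"language": "en", "fields": {"who": "Teachers union", "where": "Chicago"}},
    "es-001": {"language": "es", "fields": {"who": "Estudiantes", "where": "Ciudad de México"}},
}
ai = {
    "en-001": {"fields": {"who": "teachers union ", "where": "Chicago"}},
    "es-001": {"fields": {"who": "Estudiantes", "where": "Oaxaca"}},
}

scores = field_accuracy(human, ai)
for lang, per_field in scores.items():
    print(lang, per_field)
```

Exact matching is the simplest possible criterion and will undercount agreement when the AI and the human coder phrase the same answer differently; a real evaluation would likely need a looser matching rule or expert adjudication for free-text fields.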