473.2
Understanding Social Media. Use of Machine Learning (ML) in Qualitative Data Analysis.

Thursday, 19 July 2018: 10:45
Location: 717B (MTCC SOUTH BUILDING)
Oral Presentation
Marek TROSZYNSKI, Collegium Civitas, Poland
Since the beginning of the 21st century we have seen rapid development of computer-mediated communication, especially so-called social media. A body of texts written by "users" in the traditional sense (user-generated content, UGC) has appeared in the public space. Researchers face a major problem: how to analyze texts created by non-professionals, which are characterized by diversity of language, styles of expression, conventions, sociolinguistic features, dialects and colloquialisms.

The purpose of this article is to present the process of automating the coding of texts from social media. Implementing this process allows qualitative methods to be applied on a quantitative scale: corpora of hundreds of thousands of texts can be analyzed on the basis of their meaning. The process is made possible by machine learning (ML) algorithms.

As an example, we present a project on identifying hate speech in texts from Polish online forums. The first step is to gather as large a database of texts as possible using keywords. This step was carried out with commercial text-collection tools.

The key issue is the precise conceptualization and operationalization of the individual research categories. This makes it possible to prepare specific coding instructions and to train the team of coders. As a result, we obtain higher rates of inter-coder agreement. The labeled texts are then used as training data for automated categorization methods based on ML algorithms.
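
Inter-coder agreement can be checked with a chance-corrected statistic before the labeled texts are used as training data. The abstract does not say which measure was used in the project, so the sketch below assumes Cohen's kappa for two coders; the function name, the label values and the sample annotations are hypothetical and serve only as an illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same texts.

    Assumption: two coders and nominal categories. The project's
    actual agreement measure is not stated in the abstract.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both coders must label the same non-empty set of texts.")

    n = len(labels_a)

    # Observed agreement: share of texts given the same label by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each coder's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both coders used a single identical label for every text.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of six forum posts by two coders.
coder_1 = ["hate", "neutral", "neutral", "hate", "neutral", "neutral"]
coder_2 = ["hate", "neutral", "hate", "hate", "neutral", "neutral"]

kappa = cohens_kappa(coder_1, coder_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

A low value would suggest that the category definitions or the coding instructions need to be refined before the labels are passed on to an ML algorithm.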

We then describe the course of machine coding. The article also seeks to identify problems associated with the automatic coding of hate speech and to propose solutions. In summary, we point out the factors that are crucial to a research process that uses machine learning.