Quantitative Characteristics of Quality Social Texts

Thursday, 19 July 2018: 11:15
Oral Presentation
Alexandre IVANEC, St.-Petersburg State University, Russian Federation
Elena YAGUNOVA, St.-Petersburg State University, Russian Federation
Danil BLIZNUK, St.-Petersburg State University, Russian Federation
Information plays a leading role in the age of linguistic technology. Confidence that other people will understand information correctly is now extremely important for everyone, from primary school teachers to CEOs. Readability is the property of a text that lets us assess its difficulty for different groups of people. According to G. McLaughlin, readability is “the degree to which a given class of people find certain reading matter compelling and comprehensible”. Various methods of predicting the difficulty level of a text have been incorporated into readability formulas, which have been widely used in research. These formulas can now be improved and tuned for a particular language. Our research shows that different formulas are better suited to different genres of text. Every method therefore has to be tested on different text clusters, and the results should be comparable so that they can be interpreted.
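As an illustration of what a readability formula computes, the sketch below implements the classic Flesch Reading Ease score for English. The abstract does not name the formulas that were compared, so this choice is an assumption made only for the example. The syllable counter is a rough vowel-group heuristic, and the function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough English syllable estimate: number of vowel groups, at least 1.

    This is a heuristic assumed for the sketch, not a linguistic syllabifier.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text.

    score = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    """
    # Naive sentence split on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Naive word tokenisation: runs of ASCII letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


if __name__ == "__main__":
    easy = "The cat sat. The dog ran."
    hard = "Comprehensibility evaluation necessitates sophisticated methodology."
    print(round(flesch_reading_ease(easy), 2))
    print(round(flesch_reading_ease(hard), 2))
```

The constants in this formula were fitted to English, which is one reason such formulas need to be re-tuned before they are applied to other languages such as Polish or Russian.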

Several formulas were compared by applying them to the same text clusters. Three types of corpora (after processing, clusters) were used: fiction, newspapers and scientific articles. Some of the clusters contained similar texts translated into different languages (English, Polish, Russian). The purpose of the research is to determine the field of application not only of each formula, but also of the different types of methods it uses. The weak point was a lack of precision in assessing the scientific text clusters, owing to their especially complicated syntactic structure.

While analysing the readability formulas, we obtained additional information about the difficulty of various texts in different languages. This makes it possible to draw conclusions about the complexity of different languages (and genres) and to evaluate different translations of similar texts. Cloze tests are one of the basic ways of evaluating texts with informants. Thus, our paper concerns basic areas of Natural Language Processing and Cognitive Science.
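For readers unfamiliar with cloze tests, the following minimal sketch shows one common way of building them: fixed-ratio deletion, where every n-th word is replaced by a blank and informants are asked to restore it. The abstract does not describe how its cloze tests were constructed, so the deletion scheme, the function name `make_cloze` and its parameters are assumptions made for illustration.

```python
from typing import List, Tuple


def make_cloze(text: str, n: int = 5, blank: str = "_____") -> Tuple[str, List[str]]:
    """Replace every n-th whitespace-separated word with a blank.

    Returns the gapped text and the list of removed words (the answer key).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    gapped: List[str] = []
    answers: List[str] = []
    for position, word in enumerate(text.split(), start=1):
        if position % n == 0:
            answers.append(word)   # keep the original word as the answer
            gapped.append(blank)   # show a gap to the informant
        else:
            gapped.append(word)
    return " ".join(gapped), answers


if __name__ == "__main__":
    cloze_text, key = make_cloze(
        "one two three four five six seven eight nine ten", n=5
    )
    print(cloze_text)
    print(key)
```

The share of gaps that informants restore correctly can then be compared with the score a readability formula assigns to the same text.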