438.1
Methods of Information Extraction in Job Advertisements
We used information extraction methods supported by rule-based machine learning. In this procedure, specific expressions or sentences are first sorted into predefined categories. The raw text is classified into four blocks: a) self-definition of the company, b) definition of the task, c) definition of the competences required of the potential applicant, d) other, for example contact information.
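The block classification can be illustrated with a minimal sketch. It assumes simple cue phrases per block; the cue lists, the block labels, and the function name `classify_sentence` are hypothetical and are not the rules used in the project.

```python
# Hypothetical cue phrases for three of the four blocks (illustrative only).
# Anything that matches no cue falls into the fourth block, "other".
BLOCK_CUES = {
    "company": ("we are", "our company", "about us"),
    "task": ("your tasks", "you will", "responsibilities"),
    "competences": ("you have", "you are familiar", "requirements", "experience in"),
}


def classify_sentence(sentence):
    """Return the first block whose cue phrase occurs in the sentence."""
    lowered = sentence.lower()
    for block, cues in BLOCK_CUES.items():
        if any(cue in lowered for cue in cues):
            return block
    return "other"


if __name__ == "__main__":
    for s in (
        "We are a leading logistics provider.",
        "You will maintain the warehouse database.",
        "You are familiar in using SAP.",
        "Contact: jobs@example.com",
    ):
        print(classify_sentence(s), "|", s)
```

A first-match rule keeps the sketch easy to follow; a real system would presumably need a way to resolve sentences that match cues from more than one block.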
After that, we formulate specific rules for extracting information, for example, that in every expression of the form “You are familiar in using [XY]”, XY is a work tool. We then have to confirm whether the extracted term is in fact a work tool. In this way the application is able to extract new tools.
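The extraction rule and the confirmation step can be sketched as follows. This is a minimal illustration under stated assumptions: the regular expression, the function names, and the list of already known tools are hypothetical, and the actual rule set is not given here.

```python
import re

# Hypothetical trigger pattern for the example rule; the expression after the
# trigger phrase, up to the next punctuation mark, is a work-tool candidate.
TRIGGER = re.compile(
    r"you are familiar (?:in|with) using\s+(?P<tool>[^.,;\n]+)",
    re.IGNORECASE,
)


def extract_candidates(text):
    """Return every expression that follows the trigger phrase."""
    return [m.group("tool").strip() for m in TRIGGER.finditer(text)]


def split_by_confirmation(candidates, known_tools):
    """Separate candidates already known as tools from those awaiting review."""
    known = {t.lower() for t in known_tools}
    confirmed = [c for c in candidates if c.lower() in known]
    pending = [c for c in candidates if c.lower() not in known]
    return confirmed, pending


if __name__ == "__main__":
    ad = (
        "You are familiar in using SAP. "
        "You are familiar in using AutoCAD, and you work well in teams."
    )
    candidates = extract_candidates(ad)
    confirmed, pending = split_by_confirmation(candidates, {"sap"})
    print("confirmed:", confirmed)
    print("to be reviewed:", pending)
```

Candidates in the pending list correspond to the terms that still have to be confirmed as work tools; once confirmed, they would extend the set of known tools.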
In the session, I will present our workflow and explain the classification and information extraction method. Additionally, I will describe the distribution of work tools in our taxonomy and show the results of the analyses of the advertisements: in which industries and occupations (digitalized) work tools can be found, and how their use differs by company size and required qualification.
In the near future, we will analyze all online job advertisements in Germany. The project is therefore a good example of the advantages that the analysis of mass data offers for sociology.