Methods of Information Extraction in Job Advertisements

Friday, 20 July 2018: 08:30
Oral Presentation
Betül GÜNTÜRK-KUHL, Federal Institute for Vocational Education and Training, Germany
Philipp MARTIN, Federal Institute for Vocational Education and Training (BIBB), Germany
We track developments in qualification requirements, skills or occupational profiles by analyzing job advertisements. Our current dataset contains the raw texts of all job ads that have been registered in the job pool of the Federal Employment Agency since 2011. This gives us a broad database of nearly two and a half million advertisements, which supports extensive research into the structure of and changes in aggregate labor demand by occupation, and into changes in the relevance of specific qualifications or competences.

We used methods of information extraction with the aid of rule-based machine learning. In this procedure, specific expressions or sentences are first sorted into predefined categories. The raw text of each advertisement is classified into four blocks: a) self-definition of the company, b) definition of the task, c) definition of the required competences of the potential applicant, d) other, for example contact information.
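As an illustration of sorting text into the four blocks, the following is a minimal sketch, assuming a simple cue-word count per paragraph. The cue words, the English wording, and the function name classify_paragraph are hypothetical and chosen for readability; they are not the rules used in the project.

```python
# Hypothetical cue words per block (illustrative assumptions only).
BLOCK_CUES = {
    "company": ["we are", "our company", "founded"],
    "task": ["your tasks", "you will", "responsibilities"],
    "competences": ["you have", "you are familiar", "experience in"],
}


def classify_paragraph(text: str) -> str:
    """Assign a paragraph to the block whose cue words occur most often.

    Paragraphs without any cue word fall into the residual block "other".
    """
    lowered = text.lower()
    scores = {
        block: sum(lowered.count(cue) for cue in cues)
        for block, cues in BLOCK_CUES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


if __name__ == "__main__":
    ad = [
        "We are a family-owned company founded in 1950.",
        "Your tasks include maintaining machines.",
        "You have experience in welding.",
        "Contact: jobs@example.org",
    ]
    for paragraph in ad:
        print(classify_paragraph(paragraph), "|", paragraph)
```

A real system would need far richer rules and German-language cues; the sketch only shows the shape of the block assignment step.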

After that, we formulate specific rules for extracting information, for example, that in every expression of the form “You are familiar in using [XY]”, XY is a work tool. We then have to confirm whether the extracted term is in fact a work tool. In this way the application is able to discover new tools.
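The extraction rule and the confirmation step described here could look roughly like the sketch below. It assumes a single regular expression for the trigger phrase and a small set of already confirmed tools; the names TOOL_RULE, extract_candidates, and split_known_new are hypothetical, and the actual rule set is not given in this abstract.

```python
import re

# Hypothetical rule: whatever follows the trigger phrase, up to the next
# punctuation mark, is treated as a candidate work tool.
TOOL_RULE = re.compile(
    r"You are familiar (?:in|with) using\s+([^.,;\n]+)",
    re.IGNORECASE,
)


def extract_candidates(text: str) -> list[str]:
    """Return all candidate work tools matched by the rule."""
    return [m.group(1).strip() for m in TOOL_RULE.finditer(text)]


def split_known_new(candidates: list[str], known_tools: set[str]):
    """Separate confirmed tools from new candidates awaiting confirmation."""
    known = [c for c in candidates if c.lower() in known_tools]
    new = [c for c in candidates if c.lower() not in known_tools]
    return known, new


if __name__ == "__main__":
    text = (
        "You are familiar in using SAP. "
        "You are familiar in using a torque wrench, and you work carefully."
    )
    known, new = split_known_new(extract_candidates(text), {"sap"})
    print("confirmed:", known)
    print("to review:", new)
```

Candidates in the "to review" list correspond to the terms that still have to be confirmed as work tools before they are added to the known set.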

In the session, I will present our workflow and explain the classification and information extraction method. Additionally, I will describe the distribution in our taxonomy and show the results of the analyses of the advertisements: in which industries and occupations (digitalized) work tools can be found, and how their use differs by company size or qualification.

In the near future, we will analyze all online advertisements in Germany. This work is therefore a good example of the advantages that the analysis of mass data offers for sociology.