Studia Informatica Pomerania
Author: Joanna Matusiak
Pages: 59-73

Extraction and Aggregation of the Job Market Web Sites Content

The article presents an overview and a practical exploration of a scraping tool for extracting data from web site content. As an example data source, the author has chosen job market portals that advertise new vacancies.
The results can be used in further detailed analysis as input data for complex analytical systems based on data exploration, which display search results according to chosen criteria. The data extraction tool lets the user store output results and exchange data with other systems through XML, XSL and CSV files. The web scraping mechanism built into the tool offers graphical, action-based, interactive processes. Data extraction is based on web macro recordings as well as on the generation of data and page patterns.
Keywords: data extraction, data aggregation, job portal offers, job offer analysis