Data Acquisition and Information Extraction for Scientific Knowledge Base Building
Piotr Andruszkiewicz, Henryk Rybiński
We present the process of data acquisition and information extraction for building a comprehensive and accurate scientific knowledge base covering conferences, publications and scientists. We use two kinds of data sources. First, we gather data from structured and reliable sources, such as digital libraries, which are nevertheless incomplete and not always up to date. We then enrich the information extracted from those sources with unstructured data obtained from the Internet, filtering websites with an SVM classifier to identify potentially useful web pages. There are two potential sources of errors in the process of information enrichment: the unstructured origin of the data, and the limited accuracy of the machine learning methods used for data acquisition and information extraction. We address both problems by proposing a new information extraction method and by using crowdsourcing to correct the information. Our methods are currently used in a scientific platform, namely the Omega-Psir university knowledge base, which contains lists of researchers, publications, events, etc.
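The web page filtering step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words features, the toy training pages, the hyperparameters and all function names are assumptions made for illustration, and the classifier is a plain linear SVM without a bias term, trained by subgradient descent on the regularized hinge loss.

```python
from collections import Counter

# Hypothetical toy training set: +1 = potentially useful page, -1 = irrelevant page.
PAGES = [
    "call for papers conference submission deadline program committee",
    "proceedings of the workshop accepted papers keynote speakers",
    "buy cheap shoes online discount sale free shipping",
    "weekly recipes pasta dinner cooking tips",
]
LABELS = [1, 1, -1, -1]


def featurize(text):
    """Bag-of-words term counts (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def score(weights, text):
    """Signed decision value of the linear model for one page."""
    return sum(weights.get(term, 0.0) * count
               for term, count in featurize(text).items())


def train_linear_svm(pages, labels, epochs=20, eta=0.1, lam=0.01):
    """Subgradient descent on the L2-regularized hinge loss (no bias term)."""
    weights = {}
    examples = [(featurize(page), label) for page, label in zip(pages, labels)]
    for _ in range(epochs):
        for feats, y in examples:
            margin = y * sum(weights.get(t, 0.0) * c for t, c in feats.items())
            # Regularization step: shrink every weight slightly.
            for term in weights:
                weights[term] *= 1.0 - eta * lam
            # Hinge-loss step: update only when the margin is violated.
            if margin < 1.0:
                for term, count in feats.items():
                    weights[term] = weights.get(term, 0.0) + eta * y * count
    return weights


def is_useful(weights, text):
    """Classify a page as potentially useful when its score is positive."""
    return score(weights, text) > 0.0
```

A call such as `is_useful(train_linear_svm(PAGES, LABELS), page_text)` would then decide whether a crawled page is passed on to information extraction. A production system would more likely rely on an established SVM library and richer features.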
|Publication size in sheets||0.5|
|Book||O’Conner Lisa (eds.): 12th IEEE International Conference (ICSC). Proceedings, 2018, Institute of Electrical and Electronics Engineers, ISBN 978-1-5386-4409-6, [978-1-5386-4408-9], 419 p.|
|Project||Development of new algorithms in the areas of software and computer architecture, artificial intelligence and information systems and computer graphics. Project leader: Arabas Jarosław, Phone: +48 22 234 7432, start date 01-08-2018, planned end date 30-09-2019, II/2018/DS/1, Implemented|
|Score||= 15.0, 21-05-2019, ChapterFromConference; = 15.0, BookChapterMatConf|
|Publication indicators||= 0; = 0|
|Citation count*||1 (2019-06-05)|
* The presented citation count is obtained through Internet information analysis and is close to the number calculated by the Publish or Perish system.