Data Acquisition and Information Extraction for Scientific Knowledge Base Building

Piotr Andruszkiewicz, Henryk Rybiński


We present the process of data acquisition and information extraction for building a comprehensive and accurate scientific knowledge base covering conferences, publications, and scientists. We use two kinds of data sources. First, we gather data from structured and reliable sources, such as digital libraries, which are nevertheless incomplete and not always up to date. Second, we enrich the information extracted from those sources with unstructured data obtained from the Internet, filtering websites with an SVM classifier to identify potentially useful web pages. There are two potential sources of errors in the enrichment process: the unstructured origin of the data and the limited accuracy of the machine learning methods used for data acquisition and information extraction. We address both problems by proposing a new information extraction method and by using crowdsourcing to correct information. Our methods are currently used in a scientific platform, namely the Omega-Psir university knowledge base, which contains lists of researchers, publications, events, etc.

Authors:
- Piotr Andruszkiewicz (FEIT / IN), The Institute of Computer Science
- Henryk Rybiński (FEIT / IN), The Institute of Computer Science
Publication size in sheets: 0.5
Book: O’Conner Lisa (eds.): 12th IEEE International Conference (ICSC). Proceedings, 2018, Institute of Electrical and Electronics Engineers, ISBN 978-1-5386-4409-6, [978-1-5386-4408-9], 419 p.
Project: Development of new algorithms in the areas of software and computer architecture, artificial intelligence and information systems and computer graphics. Project leader: Arabas Jarosław, Phone: +48 22 234 7432, start date 01-08-2018, planned end date 30-09-2019, II/2018/DS/1, Implemented
WEiTI Statutory activity
Language: English (en)
Score (nominal): 15
Score: Ministerial score = 15.0, 21-05-2019, ChapterFromConference
Ministerial score (2013-2016) = 15.0, BookChapterMatConf
Publication indicators: Scopus Citations = 0; WoS Citations = 0
Citation count*: 1 (2019-06-05)

* The presented citation count is obtained through analysis of information available on the Internet and is close to the number calculated by the Publish or Perish system.