Person Name Disambiguation for Building University Knowledge Base

Piotr Andruszkiewicz, Szymon Szepietowski


In this paper we propose a new algorithm for person name disambiguation among authors of scientific publications. The algorithm is effective, flexible, and tailored to a scientific knowledge base. Besides the common properties of a publication, namely title, venue, and author and co-author names, it also exploits references. One reason is that we decided to enrich the University Knowledge Base with connections between publications, rather than storing references only as textual descriptions (i.e. author’s name, title, etc.). Our algorithm takes an unsupervised approach, which does not require creating a training set, a time- and resource-consuming task. However, we want to leverage additional information available from crowdsourcing or authorised users that confirms authorship and citation relations between papers. Using this information, the default parameters of the unsupervised algorithm can be optimised for a given case by means of a genetic algorithm in order to increase accuracy. The proposed method can be applied to three tasks: assigning a publication to a specific researcher, indicating that a new author is not yet known to the database, and clustering a set of publications into groups that each contain the papers of one researcher. Validation results confirm the high accuracy of the new algorithm and its usefulness in populating a scientific knowledge base.
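To make the first two tasks concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes that publications are compared with a weighted sum of simple overlap scores over title words, venue, co-authors, and references, and that a threshold decides whether the author is already known. The names `Publication`, `WEIGHTS`, `THRESHOLD`, `similarity`, and `assign` are hypothetical, and the numeric values are arbitrary placeholders for the kind of default parameters the abstract says a genetic algorithm could tune.

```python
from dataclasses import dataclass, field


@dataclass
class Publication:
    title_words: set
    venue: str
    coauthors: set
    references: set = field(default_factory=set)


# Hypothetical default parameters (placeholders, not values from the paper).
WEIGHTS = {"title": 0.2, "venue": 0.2, "coauthors": 0.3, "references": 0.3}
THRESHOLD = 0.25


def jaccard(a, b):
    """Overlap of two sets, 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(p, q, weights=WEIGHTS):
    """Weighted sum of per-feature similarities between two publications."""
    return (
        weights["title"] * jaccard(p.title_words, q.title_words)
        + weights["venue"] * (1.0 if p.venue == q.venue else 0.0)
        + weights["coauthors"] * jaccard(p.coauthors, q.coauthors)
        + weights["references"] * jaccard(p.references, q.references)
    )


def assign(new_pub, known, weights=WEIGHTS, threshold=THRESHOLD):
    """Return the best-matching researcher id, or None if the author looks new."""
    best_id, best_score = None, 0.0
    for researcher_id, pubs in known.items():
        score = max((similarity(new_pub, p, weights) for p in pubs), default=0.0)
        if score > best_score:
            best_id, best_score = researcher_id, score
    return best_id if best_score >= threshold else None
```

Under these assumptions, `assign` returns a researcher identifier when some known publication is similar enough, and `None` when the author appears to be unknown to the database. Confirmed authorship data could then serve as the fitness signal for tuning `WEIGHTS` and `THRESHOLD`.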
Authors:
- Piotr Andruszkiewicz (II), The Institute of Computer Science
- Szymon Szepietowski (II), The Institute of Computer Science
Publication size in sheets: 0.5
Book: Nguyen Ngoc Thanh, Trawiński Bogdan, Fujita Hamido, Hong Tzung-pei (eds.): Intelligent Information and Database Systems, 8th Asian Conference, ACIIDS 2016, Proceedings, Part I, Lecture Notes in Artificial Intelligence, vol. 9621, 2016, Springer Berlin Heidelberg, ISBN 978-3-662-49380-9, [978-3-662-49381-6], 815 p., DOI:10.1007/978-3-662-49381-6
Keywords in English: Person name disambiguation – Unsupervised approach – Genetic algorithm – Scientific knowledge base
Project: New perspectives in dialogue: a model of deliberation and IT tools for social inclusion in decision-making. Project leader: Andruszkiewicz Piotr, Phone: +48 22 234 7715, start date 16-07-2014, end date 31-08-2016, II/2014/NCBiR/1, Completed
WEiTI Projects financed by NCRD [Projekty finansowane przez NCBiR (NCBR)]
Project: Development of new algorithms in the areas of software and computer architecture, artificial intelligence and information systems and computer graphics. Project leader: Rybiński Henryk, Phone: +48 22 234 7731, start date 18-05-2015, end date 30-11-2016, II/2015/DS/1, Completed
WEiTI Statutory activity [Działalność statutowa]
Language: en (English)
disamb-2.pdf 247.43 KB
Score (nominal): 15
Score: Ministerial score = 15.0, 27-03-2017, BookChapterSeriesAndMatConf
Ministerial score (2013-2016) = 15.0, 27-03-2017, BookChapterSeriesAndMatConf
Citation count*: 0

* The presented citation count is obtained through Internet information analysis and is close to the number calculated by the Publish or Perish system.