Person Name Disambiguation for Building University Knowledge Base

Piotr Andruszkiewicz, Szymon Szepietowski


In this paper we propose a new algorithm for person name disambiguation among authors of scientific publications. The algorithm is effective, flexible, and tailored to a scientific knowledge base. Besides the common properties of a publication, namely title, venue, and author and co-author names, it also exploits references. One reason is that we decided to enrich the University Knowledge Base with connections between publications, rather than only with references represented by their descriptions (i.e. author's name, title, etc.). Our algorithm uses an unsupervised approach, which does not require creating a training set, a time- and resource-consuming task. However, we want to leverage additional information available from crowdsourcing or authorised users that confirms authorship and citation relations between papers. Using this information, the default parameters of the unsupervised algorithm can be optimised for a given case by means of a genetic algorithm in order to increase accuracy. The proposed method can be applied to three tasks: assigning a publication to a specific researcher, indicating that a new author is not yet known to the database, and clustering a set of publications into clusters that each contain the papers of one researcher. Validation results confirm the high accuracy of the new algorithm and its usefulness in the process of populating a scientific knowledge base.
Book: Nguyen Ngoc Thanh, Trawiński Bogdan, Fujita Hamido, Hong Tzung-pei (eds.): Intelligent Information and Database Systems, 8th Asian Conference, ACIIDS 2016, Proceedings, Part I, Lecture Notes in Artificial Intelligence, vol. 9621, 2016, Springer Berlin Heidelberg, ISBN 978-3-662-49380-9, [978-3-662-49381-6], 815 p., DOI:10.1007/978-3-662-49381-6
Keywords in English: Person name disambiguation – Unsupervised approach – Genetic algorithm – Scientific knowledge base
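
The three tasks named in the abstract (assigning a publication to a known researcher, flagging an unknown author, and clustering publications) can be illustrated with a minimal sketch. This is not the authors' algorithm: the Jaccard-based similarity, the weight values, the threshold, and every name below (`Publication`, `similarity`, `assign`, `cluster`, `WEIGHTS`, `THRESHOLD`) are assumptions made for illustration. The abstract states only that the method uses title, venue, co-author names and references, and that its default parameters can be tuned by a genetic algorithm; the tuning step is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    title: str
    venue: str
    coauthors: frozenset   # names of co-authors
    references: frozenset  # identifiers of cited publications


# Hypothetical default parameters; the abstract gives no concrete values.
WEIGHTS = {"title": 0.2, "venue": 0.2, "coauthors": 0.3, "references": 0.3}
THRESHOLD = 0.25


def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(p, q, weights=WEIGHTS):
    """Weighted sum of per-property similarities (an assumed scoring scheme)."""
    return (
        weights["title"] * jaccard(p.title.lower().split(), q.title.lower().split())
        + weights["venue"] * (1.0 if p.venue == q.venue else 0.0)
        + weights["coauthors"] * jaccard(p.coauthors, q.coauthors)
        + weights["references"] * jaccard(p.references, q.references)
    )


def assign(pub, known, weights=WEIGHTS, threshold=THRESHOLD):
    """Return the best-matching researcher id, or None for a new author.

    `known` maps a researcher id to that researcher's list of publications.
    """
    best_id, best_score = None, 0.0
    for researcher_id, pubs in known.items():
        score = max((similarity(pub, q, weights) for q in pubs), default=0.0)
        if score > best_score:
            best_id, best_score = researcher_id, score
    return best_id if best_score >= threshold else None


def cluster(pubs, weights=WEIGHTS, threshold=THRESHOLD):
    """Greedy single pass: join the best-matching cluster or start a new one."""
    clusters = []
    for pub in pubs:
        index = assign(pub, dict(enumerate(clusters)), weights, threshold)
        if index is None:
            clusters.append([pub])
        else:
            clusters[index].append(pub)
    return clusters


a = Publication("Name disambiguation in digital libraries", "ACIIDS",
                frozenset({"X"}), frozenset({"r1", "r2"}))
b = Publication("Author name disambiguation methods", "ACIIDS",
                frozenset({"X"}), frozenset({"r1", "r2"}))
c = Publication("Protein folding dynamics", "BioConf",
                frozenset({"Y"}), frozenset({"r9"}))

known = {"researcher-1": [a]}
```

In this toy data, `assign(b, known)` matches the known researcher because venue, co-authors and references coincide, while `assign(c, known)` returns `None`, which corresponds to the "new author" case. Confirmed authorship data of the kind the abstract mentions could, under this sketch, serve as the fitness signal for tuning `WEIGHTS` and `THRESHOLD`.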
