Person Name Disambiguation for Building University Knowledge Base
Piotr Andruszkiewicz, Szymon Szepietowski
Abstract
In this paper we propose a new algorithm for person name disambiguation among authors of scientific publications. The algorithm is effective, flexible, and tailored to a scientific knowledge base. Besides the common properties of a publication, namely the title, the venue, and the names of the author and co-authors, it also exploits references. One reason is that we decided to enrich the University Knowledge Base with connections between publications, rather than storing each reference only as a textual description (i.e. author’s name, title, etc.). Our algorithm takes an unsupervised approach, which does not require creating a training set, a time- and resource-consuming task. However, we want to leverage additional information available from crowdsourcing or authorised users that confirms authorship and citation relations between papers. Using this information, the default parameters of the unsupervised algorithm can be optimised for a given case by means of a genetic algorithm in order to increase accuracy. The proposed method can be applied to three tasks: assigning a publication to a specific researcher, indicating that a new author is not yet known to the database, and clustering a set of publications into clusters that each contain the papers of one researcher. Validation results confirm the high accuracy of the new algorithm and its usefulness in the process of populating a scientific knowledge base.
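To make the first two tasks concrete, the following is a minimal sketch, not the algorithm proposed in this paper. It assumes a weighted sum of per-feature similarities (title, venue, co-authors, references) and a single acceptance threshold. The names `Publication`, `WEIGHTS`, `similarity`, and `assign`, as well as the weight and threshold values, are hypothetical and chosen only for illustration; in the spirit of the abstract, such parameters are what a genetic algorithm could tune from confirmed authorship data.

```python
from dataclasses import dataclass, field


@dataclass
class Publication:
    title: str
    venue: str
    coauthors: set = field(default_factory=set)
    references: set = field(default_factory=set)


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical default weights (assumption, not taken from the paper).
WEIGHTS = {"title": 0.2, "venue": 0.2, "coauthors": 0.3, "references": 0.3}


def similarity(p, q, weights=WEIGHTS):
    """Weighted sum of per-feature similarities between two publications."""
    title_sim = jaccard(set(p.title.lower().split()), set(q.title.lower().split()))
    venue_sim = 1.0 if p.venue.lower() == q.venue.lower() else 0.0
    return (weights["title"] * title_sim
            + weights["venue"] * venue_sim
            + weights["coauthors"] * jaccard(p.coauthors, q.coauthors)
            + weights["references"] * jaccard(p.references, q.references))


def assign(pub, known, threshold=0.3):
    """Return the best-matching researcher id, or None for an unknown author.

    `known` maps a researcher id to that researcher's confirmed publications.
    """
    best_id, best_score = None, 0.0
    for researcher_id, pubs in known.items():
        score = max((similarity(pub, q) for q in pubs), default=0.0)
        if score > best_score:
            best_id, best_score = researcher_id, score
    return best_id if best_score >= threshold else None
```

Returning `None` when no researcher reaches the threshold corresponds to flagging the author as not yet known to the database. The third task, clustering, would need a grouping step over pairwise similarities and is not covered by this sketch.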
* The presented citation count is obtained by analysing information available on the Internet, and it is close to the number calculated by the Publish or Perish system.