Creating Knowledge Base from Automatically Extracted Information
In this article we present a self-learning method for discovering the domain-specific knowledge contained in a set of text documents. The method assumes that domain-relevant information in the input documents has already been tagged with labels from a prespecified set. It counts the co-occurrences of label sequences within sentences and stores them in a data structure called a Prefix Label Tree. To extract knowledge from a given document, we group the labels found in its content using hierarchical clustering. To compute the similarity of clusters during clustering, we also propose a measure called the Relation Possibility (RP).
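
To make the counting step concrete, the following is a minimal sketch of how a prefix tree over label sequences might be built. It is an illustration under stated assumptions, not the construction used in the article: we assume each sentence has already been reduced to its ordered list of labels, and that every contiguous label sequence in a sentence is counted once per occurrence. The class names `Node` and `PrefixLabelTree`, the methods `add_sentence` and `count`, and the labels `PERSON`, `ORG` and `LOCATION` are hypothetical. The RP measure is not defined in this abstract, so the sketch stops at counting.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the tree: a label sequence ending here and how often it was seen."""
    count: int = 0
    children: dict = field(default_factory=dict)


class PrefixLabelTree:
    """Hypothetical trie that counts contiguous label sequences (an assumption)."""

    def __init__(self):
        self.root = Node()

    def add_sentence(self, labels):
        # Insert every suffix of the sentence's label list, so that each
        # contiguous sequence is counted once for each place it occurs.
        for start in range(len(labels)):
            node = self.root
            for label in labels[start:]:
                node = node.children.setdefault(label, Node())
                node.count += 1

    def count(self, sequence):
        # Walk down the tree along the sequence; a missing branch means zero.
        node = self.root
        for label in sequence:
            node = node.children.get(label)
            if node is None:
                return 0
        return node.count


# Two sentences, each given as its ordered list of (hypothetical) labels.
tree = PrefixLabelTree()
tree.add_sentence(["PERSON", "ORG", "LOCATION"])
tree.add_sentence(["PERSON", "ORG"])

pair_count = tree.count(["PERSON", "ORG"])
```

In this sketch, looking up the count of a label sequence costs time proportional to its length, and sequences that share a prefix share storage. Any similarity measure between clusters of labels could then be computed from such counts.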