Hierarchical Clustering via Penalty-Based Aggregation and the Genie Approach
Maciej Bartoszuk , Anna Cena , Marek Gągolewski
Abstract
The paper discusses a generalization of the nearest centroid hierarchical clustering algorithm. A first extension deals with the incorporation of generic distance-based penalty minimizers instead of the classical aggregation by means of centroids. Due to that the presented algorithm can be applied in spaces equipped with an arbitrary dissimilarity measure (images, DNA sequences, etc.). Secondly, a correction preventing the formation of clusters of too highly unbalanced sizes is applied: just like in the recently introduced Genie approach, which extends the single linkage scheme, the new method averts a chosen inequity measure (e.g., the Gini-, de Vergottini-, or Bonferroni-index) of cluster sizes from raising above a predefined threshold. Numerous benchmarks indicate that the introduction of such a correction increases the quality of the resulting clusterings significantly.Author | |
Pages | 191-202 |
Publication size in sheets | 0.55 |
Book | Torra Vicenç, Narukawa Yasuo, Navarro-arribas Guillermo, Yañez Cristina (eds.): MODELING DECISIONS FOR ARTIFICIAL INTELLIGENCE, (MDAI 2016), Lecture Notes in Artificial Intelligence, vol. 9880, 2016, SPRINGER-VERLAG BERLIN, ISBN 978-3-319-45655-3, [978-3-319-45656-0] |
Keywords in English | Hierarchical clustering, Aggregation, Centroid, Gini-index, Genie algorithm |
Abstract in Polish | W artykule zaproponowaliśmy uogólnienie odmienności najbliższego centroidu, stosowanego w algorytmach hierarchicznej analizy skupień. Po pierwsze, zamiast centroidów w przestrzeni R^d, algorytm bazuje na dowolnych funkcjach agregujących minimalizujących funkcję straty opartą na mierze odległości. Po drugie, podczas złączania stosowana jest korekta znana z algorytmu *Genie*, oparta na indeksie nierówności ekonomicznej. Wyniki empiryczne wskazują na znaczącą poprawe jakości działania w stosunku do wyjściowego algorytmu. |
DOI | DOI:10.1007/978-3-319-45656-0_16 |
URL | http://link.springer.com/chapter/10.1007%2F978-3-319-45656-0_16 |
Language | en angielski |
Score (nominal) | 15 |
Score | = 15.0, 04-09-2019, BookChapterSeriesAndMatConfByConferenceseries = 15.0, 04-09-2019, BookChapterSeriesAndMatConfByConferenceseries |
Publication indicators | = 3; = 3 |
Citation count* |
* presented citation count is obtained through Internet information analysis and it is close to the number calculated by the Publish or Perish system.
Back