Hierarchical Clustering via Penalty-Based Aggregation and the Genie Approach

Maciej Bartoszuk , Anna Cena , Marek Gągolewski


The paper discusses a generalization of the nearest centroid hierarchical clustering algorithm. A first extension deals with the incorporation of generic distance-based penalty minimizers instead of the classical aggregation by means of centroids. Due to that the presented algorithm can be applied in spaces equipped with an arbitrary dissimilarity measure (images, DNA sequences, etc.). Secondly, a correction preventing the formation of clusters of too highly unbalanced sizes is applied: just like in the recently introduced Genie approach, which extends the single linkage scheme, the new method averts a chosen inequity measure (e.g., the Gini-, de Vergottini-, or Bonferroni-index) of cluster sizes from raising above a predefined threshold. Numerous benchmarks indicate that the introduction of such a correction increases the quality of the resulting clusterings significantly.
Author Maciej Bartoszuk (FMIS / DAICM)
Maciej Bartoszuk,,
- Department of Artificial Intelligence and Computational Methods
, Anna Cena (FMIS / DIE) - [Instytut Badań Systemowych Polskiej Akademii Nauk]
Anna Cena,,
- Department of Integral Equations
- Instytut Badań Systemowych Polskiej Akademii Nauk
, Marek Gągolewski (FMIS / DIE) - [Systems Research Institute (IBS PAN) [Polish Academy of Sciences (PAN)]]
Marek Gągolewski,,
- Department of Integral Equations
- Instytucie Badań Systemowych
Publication size in sheets0.55
Book Torra Vicenç, Narukawa Yasuo, Navarro-arribas Guillermo, Yañez Cristina (eds.): MODELING DECISIONS FOR ARTIFICIAL INTELLIGENCE, (MDAI 2016), Lecture Notes in Artificial Intelligence, vol. 9880, 2016, SPRINGER-VERLAG BERLIN, ISBN 978-3-319-45655-3, [978-3-319-45656-0]
Keywords in EnglishHierarchical clustering, Aggregation, Centroid, Gini-index, Genie algorithm
Abstract in PolishW artykule zaproponowaliśmy uogólnienie odmienności najbliższego centroidu, stosowanego w algorytmach hierarchicznej analizy skupień. Po pierwsze, zamiast centroidów w przestrzeni R^d, algorytm bazuje na dowolnych funkcjach agregujących minimalizujących funkcję straty opartą na mierze odległości. Po drugie, podczas złączania stosowana jest korekta znana z algorytmu *Genie*, oparta na indeksie nierówności ekonomicznej. Wyniki empiryczne wskazują na znaczącą poprawe jakości działania w stosunku do wyjściowego algorytmu.
URL http://link.springer.com/chapter/10.1007%2F978-3-319-45656-0_16
Languageen angielski
Score (nominal)15
Score sourceconferenceIndex
ScoreMinisterial score = 15.0, 26-01-2020, BookChapterSeriesAndMatConfByConferenceseries
Ministerial score (2013-2016) = 15.0, 26-01-2020, BookChapterSeriesAndMatConfByConferenceseries
Publication indicators Scopus Citations = 3; WoS Citations = 3
Citation count*
Share Share

Get link to the record

* presented citation count is obtained through Internet information analysis and it is close to the number calculated by the Publish or Perish system.
Are you sure?