Clustering Techniques of Leaf-Labelled Trees and Their Applications

Jakub Janusz Koperwas

Abstract

This thesis is devoted to the postprocessing of leaf-labelled trees, i.e. measuring distances, extracting common information and clustering. A comprehensive overview of known techniques is provided, and new approaches based on the concepts of z-restriction, frequent subsplit and edit operations are introduced. All the presented operations can be applied to trees on a free leafset (i.e. trees whose leaf sets may differ), which is rarely the case for existing methods in the literature. The new methods are described and compared with the state of the art in experiments on synthetic and real-life datasets. Finally, building on the previously introduced concepts, clustering approaches and clustering quality estimates are considered. The k-best clustering problem is formulated, i.e. finding a clustering that forms k groups while maximizing a clustering quality measure. Possible applications of the presented methods in phylogenetic analysis are also discussed.
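As a minimal illustration of the k-best clustering problem as stated above, the Python sketch below enumerates every partition of a small set of trees into exactly k non-empty groups and returns the one that maximizes a quality function. It is not the method from the thesis: the function names (`partitions_into_k`, `within_group_cost`, `k_best_clustering`), the quality measure (negative sum of within-group pairwise distances) and the toy distances are all hypothetical placeholders, and any tree distance could be plugged in as `dist`.

```python
from itertools import combinations
from typing import Callable, Hashable, Iterator, List, Sequence

Clustering = List[List[Hashable]]


def partitions_into_k(items: Sequence[Hashable], k: int) -> Iterator[Clustering]:
    """Yield every partition of `items` into exactly k non-empty groups."""
    n = len(items)
    if k == 0:
        if n == 0:
            yield []
        return
    if n < k:
        return
    first, rest = items[0], items[1:]
    # Case 1: `first` is a singleton group; split the remaining items into k-1 groups.
    for p in partitions_into_k(rest, k - 1):
        yield [[first]] + p
    # Case 2: `first` joins one of the k groups formed by the remaining items.
    for p in partitions_into_k(rest, k):
        for i in range(len(p)):
            yield [[first] + g if j == i else g for j, g in enumerate(p)]


def within_group_cost(clustering: Clustering,
                      dist: Callable[[Hashable, Hashable], float]) -> float:
    """Hypothetical cost: sum of pairwise distances inside each group."""
    return sum(dist(a, b) for g in clustering for a, b in combinations(g, 2))


def k_best_clustering(items: Sequence[Hashable], k: int,
                      quality: Callable[[Clustering], float]) -> Clustering:
    """Exhaustively search all k-group clusterings for the highest quality.

    Raises ValueError if no partition into k non-empty groups exists.
    """
    return max(partitions_into_k(list(items), k), key=quality)


# Illustrative toy distances between four hypothetical trees (not thesis data).
TOY = {frozenset(("T1", "T2")): 1.0, frozenset(("T3", "T4")): 1.0}


def toy_dist(a: Hashable, b: Hashable) -> float:
    return TOY.get(frozenset((a, b)), 5.0)


best = k_best_clustering(["T1", "T2", "T3", "T4"], 2,
                         lambda c: -within_group_cost(c, toy_dist))
print(best)  # prints [['T1', 'T2'], ['T3', 'T4']]
```

The number of candidate partitions equals the Stirling number of the second kind S(n, k), so exhaustive search only serves as a reference point for very small inputs; realistic collections of trees would require heuristic search.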
Diploma type: Doctor of Philosophy
Author: Jakub Janusz Koperwas (FEIT / IN), The Institute of Computer Science
Title in English: Clustering Techniques of Leaf-Labelled Trees and Their Applications
Language: English
Certifying unit: Faculty of Electronics and Information Technology (FEIT)
Discipline: information science (technology domain / technological sciences)
Start date: 05-09-2007
Defense date: 15-06-2010
End date: 22-06-2010
Supervisor: Krzysztof Walczak (FEIT / IN), The Institute of Computer Science

Internal reviewers: Jarosław Arabas (FEIT / PE), The Institute of Electronic Systems
External reviewers: Anna Gambin
Pages: 127
Keywords in English: phylogenetics, leaf-labelled trees, data mining, distance measures, clustering, bioinformatics
PKT classification: 410000
KBN classification: 28 - computer science
EU classification: 8030
Thesis file: koperwas.pdf (916.95 KB)
Citation count: 1 (as of 2020-09-05)