Bounds on Lengths of Real Valued Vectors Similar with Regard to the Tanimoto Similarity

Marzena Kryszkiewicz


The Tanimoto similarity measure finds numerous applications in chemistry, bio-informatics, information retrieval and text mining. A typical task in these applications is finding the most similar vectors. The task is very time consuming in the case of very large data sets. Thus, methods that efficiently restrict the number of vectors that have a chance to be sufficiently similar to a given vector are of high importance. To this end, we have recently derived bounds on lengths of vectors similar with respect to the Tanimoto similarity. In this paper, we recall those results and derive new bounds on lengths of real valued vectors that have a chance to be Tanimoto similar to a given vector to a required degree. Finally, we compare the previous and current results and illustrate their usefulness.
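As an illustration of the kind of length-based pruning the abstract describes, the following is a minimal Python sketch. It assumes the standard definition of the Tanimoto similarity for real valued vectors, T(u, v) = u·v / (|u|² + |v|² − u·v), and uses a length interval that follows from the Cauchy-Schwarz inequality; it is not necessarily the exact form of the bounds derived in the paper. The function names `tanimoto` and `length_bounds` are hypothetical.

```python
import math

def tanimoto(u, v):
    """Tanimoto similarity of two real valued vectors of equal dimension."""
    dot = sum(a * b for a, b in zip(u, v))
    denom = sum(a * a for a in u) + sum(b * b for b in v) - dot
    return dot / denom  # raises ZeroDivisionError if both vectors are zero

def length_bounds(u, eps):
    """Interval of lengths |v| for which tanimoto(u, v) >= eps is possible.

    Assumes 0 < eps <= 1. Since u.v <= |u||v|, the ratio r = |v|/|u| must
    satisfy r^2 - (1 + 1/eps) r + 1 <= 0, whose roots are alpha and 1/alpha.
    """
    norm_u = math.sqrt(sum(a * a for a in u))
    s = 1.0 + 1.0 / eps
    alpha = 0.5 * (s + math.sqrt(s * s - 4.0))
    return norm_u / alpha, norm_u * alpha

# Example: candidates whose length falls outside the interval can be skipped
# without computing their similarity to u.
u = [3.0, 4.0]
lo, hi = length_bounds(u, 0.5)
candidates = [[6.0, 8.0], [30.0, 40.0]]
kept = [v for v in candidates if lo <= math.hypot(*v) <= hi]
```

A vector that passes the length filter still has to be checked with `tanimoto`, because the interval is only a necessary condition for reaching the threshold.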
Author: Marzena Kryszkiewicz (FEIT / IN)
- The Institute of Computer Science
Book: Selamat Ali, Nguyen Ngoc Thanh, Haron Habibollah (eds.): Intelligent Information and Database Systems, Proceedings, Part I, Lecture Notes in Computer Science, vol. 7802, 2013, Heidelberg New York Dordrecht London, Springer Berlin Heidelberg, ISBN 978-3-642-36545-4, 520 p., DOI:10.1007/978-3-642-36546-1
Keywords in English: the Tanimoto similarity, chemical substructure discovery, text mining, information filtering
Project: Establishment of the universal, open, hosting and communication, repository platform for network resources of knowledge to be used by science, education and open knowledge society. Project leader: Kryszkiewicz Marzena, Phone: +48 22 234 7701, start date 16-08-2010, planned end date 16-08-2013, end date 31-10-2013, WEiTI/2012/PS/1, Completed
