Bounds on Lengths of Real Valued Vectors Similar with Regard to the Tanimoto Similarity
The Tanimoto similarity measure finds numerous applications in chemistry, bioinformatics, information retrieval, and text mining. A typical task in these applications is finding the vectors most similar to a given vector. This task is very time-consuming for very large data sets. Methods that efficiently restrict the number of vectors that have a chance of being sufficiently similar to a given vector are therefore of high importance. To this end, we recently derived bounds on the lengths of vectors that are similar with respect to the Tanimoto similarity. In this paper, we recall those results and derive new bounds on the lengths of real-valued vectors that have a chance of being Tanimoto-similar to a given vector to a required degree. Finally, we compare the previous and current results and illustrate their usefulness.
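To make the idea of length-based pruning concrete, the following minimal sketch computes the Tanimoto similarity of real-valued vectors, T(u, v) = u·v / (|u|² + |v|² − u·v), and filters candidates by length before computing any similarity. The interval used here is an elementary bound that follows from the Cauchy-Schwarz inequality (u·v ≤ |u||v|); it is an illustrative assumption and not necessarily the bound derived in this paper. The function names `tanimoto`, `length_interval`, and `similar_vectors` are hypothetical.

```python
import math

def tanimoto(u, v):
    """Tanimoto similarity of two real-valued vectors (not both zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    denom = sum(a * a for a in u) + sum(b * b for b in v) - dot
    if denom == 0.0:
        raise ValueError("Tanimoto similarity is undefined for two zero vectors")
    return dot / denom

def length_interval(u_len, eps):
    """Lengths |v| for which T(u, v) >= eps is still possible, 0 < eps <= 1.

    Assumption: elementary bound from u.v <= |u||v|, which gives
    |u| / alpha <= |v| <= alpha * |u| with alpha as computed below.
    """
    if not 0.0 < eps <= 1.0:
        raise ValueError("eps must lie in (0, 1]")
    s = 1.0 + 1.0 / eps
    alpha = (s + math.sqrt(s * s - 4.0)) / 2.0
    return u_len / alpha, u_len * alpha

def similar_vectors(u, data, eps):
    """Return vectors in data with T(u, v) >= eps, pruning by length first."""
    u_len = math.sqrt(sum(a * a for a in u))
    lo, hi = length_interval(u_len, eps)
    result = []
    for v in data:
        v_len = math.sqrt(sum(b * b for b in v))
        if v_len < lo or v_len > hi:
            continue  # cannot reach the threshold, so skip the similarity computation
        if tanimoto(u, v) >= eps:
            result.append(v)
    return result
```

In practice the data set would be sorted by vector length once, so that the candidates falling in the interval can be located by binary search instead of a full scan.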