On Cosine and Tanimoto Near Duplicates Search among Vectors with Domains Consisting of Zero, a Positive Number and a Negative Number

Marzena Kryszkiewicz


The cosine and Tanimoto similarity measures are widely applied in information retrieval, text and Web mining, data cleaning, chemistry and bioinformatics for finding, clustering and classifying similar objects. Recently, a few very efficient methods have been proposed for the lossless determination of such objects, especially in large and very high-dimensional data sets. They typically apply to objects that can be represented by (weighted) binary vectors. In this paper, we offer methods suitable for searching vectors whose domains consist of zero, a positive number and a negative number; such vectors generalize weighted binary vectors. Our results are not worse than their existing analogs offered for (weighted) binary vectors.
Author: Marzena Kryszkiewicz (FEIT / IN)
- The Institute of Computer Science
Book: Larsen Henrik Legind, Martin-Bautista Maria J., Vila María Amparo, Andreasen Troels, Christiansen Henning (eds.): Flexible Query Answering Systems. 10th International Conference, FQAS 2013, Proceedings, Lecture Notes in Computer Science, vol. 8132, 2013, Heidelberg New York Dordrecht London, Springer Berlin Heidelberg, ISBN 978-3-642-40768-0, [978-3-642-40769-7], 693 p., DOI:10.1007/978-3-642-40769-7
Project: Development of new methods and algorithms in the following areas: computer graphics, artificial intelligence, information systems, and distributed systems. Project leader: Rybiński Henryk, Phone: +48 22 234 7731, start date 29-05-2012, planned end date 31-12-2012, end date 30-11-2013, II/2012/DS/1, Completed
WEiTI statutory activity
Language: en (English)
Score (nominal): 0
Score source: journalList
Score: Ministerial score = 0.0, 01-02-2020, BookChapterSeriesAndMatConfByIndicator
Ministerial score (2013-2016) = 0.0, 01-02-2020, BookChapterSeriesAndMatConfByIndicator
Publication indicators: GS Citations = 6.0
Citation count*: 6 (2020-09-06)

* presented citation count is obtained through Internet information analysis and it is close to the number calculated by the Publish or Perish system.