Random Binary Search Trees for approximate nearest neighbour search in binary spaces

Michał Paweł Komorowski , Tomasz Trzciński

Abstract

Approximate nearest neighbour (ANN) search is one of the most important problems in numerous computer science applications, including data mining, machine learning and computer vision. In this paper, we address the problem of ANN for high-dimensional binary vectors and we propose a simple yet powerful search method that is based on Random Binary Search Trees (RBST). Our method is validated on a dataset of 1.25M binary local feature descriptors obtained from a real-life image-based localisation system provided by Google as a part of Project Tango (now known as ARCore). The results of an extensive evaluation of our method performed on this dataset against the state-of-the-art ANN algorithms, such as Locality Sensitive Hashing, Uniform LSH and Multi-probe LSH, show the superiority of our method in terms of retrieval precision with a performance boost of over 20%.
Author Michał Paweł Komorowski (FEIT / IN)
Michał Paweł Komorowski,,
- The Institute of Computer Science
, Tomasz Trzciński (FEIT / IN)
Tomasz Trzciński,,
- The Institute of Computer Science
Journal seriesApplied Soft Computing, ISSN 1568-4946, (A 40 pkt)
Issue year2019
Pages1-21
Publication size in sheets1
Keywords in EnglishApproximate nearest neighbour searchBinary vectorsRandom Binary Search TreesLocality sensitive hashing
ASJC Classification1712 Software
DOIDOI:10.1016/j.asoc.2019.03.031
URL https://doi.org/10.1016/j.asoc.2019.03.031
Languageen angielski
File
1-s2.0-S1568494619301553-main (1).pdf 1.34 MB
Score (nominal)40
ScoreMinisterial score = 40.0, 10-07-2019, ArticleFromJournal
Publication indicators WoS Citations = 0; Scopus Citations = 0; Scopus SNIP (Source Normalised Impact per Paper): 2016 = 2.037; WoS Impact Factor: 2017 = 3.907 (2) - 2017=4.004 (5)
Citation count*2 (2019-07-11)
Cite
Share Share

Get link to the record


* presented citation count is obtained through Internet information analysis and it is close to the number calculated by the Publish or Perish system.
Back