A general framework for distributed approximate similarity search with arbitrary distances

Day - Time: 26 September 2024, h.11:00
Place: Area della Ricerca CNR di Pisa - Room: C-29
Speakers
  • Elena García-Morato Piñán (Universidad Rey Juan Carlos - Madrid)
Referent
  • Lucia Vadicamo

Abstract
While many similarity search algorithms are specifically adapted to metric distances, they are unsuitable for alternatives such as the cosine distance, which has gained popularity, particularly with embeddings and text mining. To address this issue, we propose GDASC (General Distributed Approximate Similarity Search with Clustering), a general framework for distributed approximate similarity search that overcomes the restriction to metric distances and can handle cosine similarity or other non-standard similarity measures. The proposed algorithm builds a multilevel index structure, which is then used in the search process. It can be combined with any clustering algorithm that generates representatives summarizing the underlying dataset and accepts the chosen distance.
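
As a rough illustration of the general idea only (this is not the GDASC implementation, and it runs on a single machine rather than in a distributed setting), the sketch below builds a small multilevel index by repeatedly clustering with a user-supplied distance and then searches it top-down. The choice of a naive k-medoids as the clustering step, and all names and parameters (`k_medoids`, `build_index`, `search`, `ks`, `n_probe`), are assumptions made for this example.

```python
import math
import random


def cosine_distance(a, b):
    """1 - cosine similarity. Not a metric: the triangle inequality can fail."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def k_medoids(points, k, dist, iters=10, seed=0):
    """Naive k-medoids (assumed clustering step for this sketch).

    Representatives are actual data points, so only dist() is required.
    Returns {medoid_index: [member_indices]}.
    """
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    clusters = {}
    for _ in range(iters):
        # Assignment step: each point goes to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: dist(p, points[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid minimises total distance to its cluster.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # keep the old medoid if its cluster is empty
                new_medoids.append(m)
                continue
            best = min(
                members,
                key=lambda c: sum(dist(points[c], points[j]) for j in members),
            )
            new_medoids.append(best)
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return clusters


def build_index(points, ks, dist):
    """Build levels bottom-up; ks[i] is the number of clusters at level i.

    Each level maps a representative's point id to its child ids one level below.
    """
    levels = []
    ids = list(range(len(points)))
    for k in ks:
        local = k_medoids([points[i] for i in ids], k, dist)
        level = {ids[m]: [ids[j] for j in members] for m, members in local.items()}
        levels.append(level)
        ids = list(level.keys())
    return levels


def search(query, points, levels, dist, n_neighbors=3, n_probe=2):
    """Approximate search: at each level follow only the n_probe closest representatives."""
    candidates = list(levels[-1].keys())
    for level in reversed(levels):
        nearest = sorted(candidates, key=lambda i: dist(query, points[i]))[:n_probe]
        candidates = [child for rep in nearest for child in level[rep]]
    return sorted(candidates, key=lambda i: dist(query, points[i]))[:n_neighbors]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.uniform(0.1, 1.0) for _ in range(8)] for _ in range(200)]
    index = build_index(data, ks=[20, 4], dist=cosine_distance)
    print(search(data[0], data, index, cosine_distance))
```

Because the index only ever calls `dist`, swapping `cosine_distance` for another measure requires no other change in this sketch. Raising `n_probe` trades speed for recall; when it covers every representative, the search degenerates to an exhaustive scan.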