Conference Paper (published)

Evaluation of Jensen-Shannon distance over sparse data

Details

Citation

Connor R, Cardillo FA, Moss R & Rabitti F (2013) Evaluation of Jensen-Shannon distance over sparse data. In: Brisaboa N, Pedreira O & Zezula P (eds.) Similarity Search and Applications: 6th International Conference, SISAP 2013, A Coruña, Spain, October 2-4, 2013, Proceedings. Lecture Notes in Computer Science, 8199. Similarity Search and Applications: 6th International Conference, SISAP 2013, Coruna, Spain, 02.10.2013-04.10.2013. Berlin, Heidelberg: Springer Verlag, pp. 163-168. https://doi.org/10.1007/978-3-642-41062-8_16

Abstract
Jensen-Shannon divergence is a symmetrised, smoothed version of Küllback-Leibler. It has been shown to be the square of a proper distance metric, and has other properties which make it an excellent choice for many high-dimensional spaces in ℝ*. The metric as defined is however expensive to evaluate. In sparse spaces over many dimensions the Intrinsic Dimensionality of the metric space is typically very high, making similarity-based indexing ineffectual. Exhaustive searching over large data collections may be infeasible. Using a property that allows the distance to be evaluated from only those dimensions which are non-zero in both arguments, and through the identification of a threshold function, we show that the cost of the function can be dramatically reduced.

Keywords
Sparse data; inverted index; sparse vector; intrinsic dimensionality; large data collection;

Journal
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

StatusPublished
Title of seriesLecture Notes in Computer Science
Number in series8199
Publication date31/12/2013
URLhttp://hdl.handle.net/1893/27707
PublisherSpringer Verlag
Place of publicationBerlin, Heidelberg
ISSN of series0302-9743
ISBN978-3-642-41061-1
ConferenceSimilarity Search and Applications: 6th International Conference, SISAP 2013
Conference locationCoruna, Spain
Dates