A new probabilistic model for IR

Day - Time: 07 May 2013, h.11:00
Place: Area della Ricerca CNR di Pisa - Room: C-29
  • Richard Connor (University of Strathclyde - Scotland, United Kingdom)

Fausto Rabitti


Over the years, a number of competing models have been introduced in an attempt to solve the central IR problem of ranking documents given textual queries. These models, however, tend to require the inclusion of heuristics and the estimation of collection-specific parameter values in order to be effective. We define a new model that, to our knowledge, has not yet been explored. In terms of the categorisation of IR models, it is a probabilistic model with no term inter-dependencies, which allows scores to be calculated from inverted indices. It is based upon a simple core hypothesis, calculates a ranking score directly in terms of probability theory, and does not require the estimation of any parameters. We present initial tests against a number of standard baseline IR models and show that the new model is at least credible, often outperforming the Language Model with Dirichlet smoothing.

Our contributions are twofold: first, we believe the new model is worthy of further investigation and that its performance could be improved significantly; second, we believe the observation that the Jensen-Shannon metric can be evaluated over inverted indices in a sparse space is more generally applicable.
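
The sparse-space observation can be illustrated with a minimal sketch. This is not the model presented in the talk. It only shows the general identity that makes the Jensen-Shannon divergence (whose square root is a metric) computable from the terms two distributions share. With base-2 logarithms, a term present in only one distribution contributes exactly half its probability mass, so the sum over the full vocabulary collapses to JSD(P, Q) = 1 - 1/2 * sum over shared terms of [(p+q) log2(p+q) - p log2 p - q log2 q]. The function names and the toy query and document below are assumptions made for illustration, and both inputs are assumed to be dictionaries of strictly positive probabilities that sum to 1.

```python
import math

def jsd_dense(p, q):
    """Jensen-Shannon divergence (base 2), summing over the union of terms."""
    total = 0.0
    for term in set(p) | set(q):
        pi, qi = p.get(term, 0.0), q.get(term, 0.0)
        mi = (pi + qi) / 2  # mixture distribution at this term
        if pi > 0:
            total += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            total += 0.5 * qi * math.log2(qi / mi)
    return total

def jsd_sparse(p, q):
    """Same value, visiting only terms with non-zero mass in both inputs."""
    shared = 0.0
    for term in p.keys() & q.keys():
        pi, qi = p[term], q[term]
        shared += ((pi + qi) * math.log2(pi + qi)
                   - pi * math.log2(pi)
                   - qi * math.log2(qi))
    return 1.0 - 0.5 * shared

# Hypothetical term distributions for a query and a document.
query = {"jensen": 0.5, "shannon": 0.5}
doc = {"jensen": 0.25, "shannon": 0.25, "entropy": 0.5}

# Both routes agree up to floating-point rounding.
print(jsd_dense(query, doc), jsd_sparse(query, doc))
```

Because jsd_sparse touches only terms that occur in both the query and the document, it needs the same data an inverted index exposes when walking the posting lists of the query terms. Two distributions with no shared terms score exactly 1, the maximum.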