Aggregating local image descriptors into compact codes

IEEE Trans Pattern Anal Mach Intell. 2012 Sep;34(9):1704-16. doi: 10.1109/TPAMI.2011.235.

Abstract

This paper addresses the problem of large-scale image search. Three constraints have to be taken into account: search accuracy, efficiency, and memory usage. We first present and evaluate different ways of aggregating local image descriptors into a vector and show that the Fisher kernel achieves better performance than the reference bag-of-visual-words approach for any given vector dimension. We then jointly optimize dimensionality reduction and indexing in order to obtain a precise vector comparison as well as a compact representation. The evaluation shows that the image representation can be reduced to a few dozen bytes while preserving high accuracy. Searching a 100-million-image data set takes about 250 ms on one processor core.
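
To make the idea of aggregating local descriptors into a single vector concrete, the following is a minimal sketch of one simple aggregation scheme: each descriptor is assigned to its nearest codebook centroid, the residuals are summed per centroid, and the concatenated result is L2-normalized. This is a simplified residual aggregation chosen for illustration, not the Fisher kernel formulation evaluated in the paper. The function names (`nearest_centroid`, `aggregate`), the two-centroid codebook, and the 2-D toy descriptors are all assumptions; real systems use learned codebooks and much higher-dimensional descriptors.

```python
import math


def nearest_centroid(descriptor, centroids):
    """Return the index of the centroid closest to `descriptor` (squared L2)."""
    best_index, best_dist = 0, float("inf")
    for i, centroid in enumerate(centroids):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, centroid))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def aggregate(descriptors, centroids):
    """Aggregate local descriptors into one L2-normalized vector of length k * d.

    Hypothetical helper: sums descriptor-minus-centroid residuals per centroid.
    """
    k, d = len(centroids), len(centroids[0])
    vector = [0.0] * (k * d)
    for desc in descriptors:
        i = nearest_centroid(desc, centroids)
        for j in range(d):
            vector[i * d + j] += desc[j] - centroids[i][j]
    norm = math.sqrt(sum(v * v for v in vector))
    # Leave an all-zero vector untouched to avoid dividing by zero.
    return [v / norm for v in vector] if norm > 0 else vector


# Toy data (assumed for illustration): 2 centroids, 3 local descriptors in 2-D.
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]

image_vector = aggregate(descriptors, centroids)
print(len(image_vector))  # one fixed-length vector per image, k * d entries
```

The output length depends only on the codebook size and descriptor dimension, not on how many local descriptors the image contains, which is what makes such a vector a candidate for subsequent dimensionality reduction and compact indexing.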

Publication types

  • Research Support, Non-U.S. Gov't