Rethinking Spatial Relations Between Local Features for Generic Object Recognition

Access & Terms of Use
open access
Copyright: Morioka, Nobuyuki
Abstract
Generic object recognition, the problem of recognising categories of objects, has been a long-standing challenge in computer vision. A histogram of quantised local features, widely known as the bag-of-visual-words model, has been shown to perform well owing to its simplicity and robustness. However, it fails to capture the spatial relationships between local features, which are also important in modelling object categories. Pairs of visual words and higher-order spatial features encode such spatial information into the bag-of-visual-words model by enumerating spatial combinations of visual words. Unfortunately, the number of these combinations grows exponentially. Feature selection is used to reduce the number of combinations, but it requires additional information such as class labels and bounding boxes. In this thesis, we provide a new perspective on capturing spatial relationships between local features and propose the local pairwise codebook (LPC). The LPC approaches the combinatorial problem differently from existing pairing methods, making feature selection unnecessary. We represent each pair of spatially nearby local descriptors as a joint descriptor and apply k-means clustering to these joint descriptors to build a codebook in an unsupervised manner. The construction of the LPC is independent of the number of visual words, which effectively avoids the combinatorial explosion. The LPC takes the underlying distribution of pairs of spatially nearby local features into account by minimising the quantisation error of that distribution. We show that the LPC outperforms existing pairing methods and performs competitively against state-of-the-art methods for generic object recognition. Several extensions to the LPC are also proposed in this thesis to demonstrate its wide applicability. First, to increase the discriminative power of the codebook, sparse coding is used. Second, to improve robustness against object deformation, we describe a scale-invariant object representation based on the codebook that exploits the nearest-neighbour-based distance between local features instead of the pixel-based distance. Third, to accelerate the iterative optimisation of sparse coding, we propose a fast approximation method based on the generalised LASSO. The final outcome of this thesis is a generic object recognition system that outperforms existing methods across challenging datasets by compactly and efficiently modelling discriminative spatial relations between local features.
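The core LPC construction described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The joint descriptor is assumed to be a plain concatenation of the two descriptors, ordered by x-coordinate. The codebook is built with basic Lloyd's k-means, and each image is encoded as a normalised histogram of hard codeword assignments rather than with the sparse-coding extension. All function names (`pair_indices`, `joint_descriptors`, `kmeans`, `assign`, `lpc_histogram`) and parameters (`radius`, `k`) are hypothetical. The `k`-nearest-neighbour pairing is only an assumed reading of the scale-invariant variant. No visual-word vocabulary is enumerated, and the codebook size is simply `n_clusters`. This is the property the abstract credits with avoiding the combinatorial explosion.

```python
import numpy as np


def pair_indices(positions, radius=None, k=None):
    """Hypothetical helper: index pairs (i, j), i < j, of spatially nearby features.

    radius: pixel-based neighbourhood; pair features at most `radius` pixels apart.
    k: nearest-neighbour-based neighbourhood; pair each feature with its k nearest
       features in image space (an assumed reading of the scale-invariant variant).
    """
    if (radius is None) == (k is None):
        raise ValueError("specify exactly one of radius or k")
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    pairs = set()
    for i in range(len(positions)):
        if k is not None:
            neighbours = [int(j) for j in np.argsort(d[i]) if j != i][:k]
        else:
            neighbours = [j for j in range(len(positions)) if j != i and d[i, j] <= radius]
        for j in neighbours:
            pairs.add((min(i, j), max(i, j)))  # store each unordered pair once
    return sorted(pairs)


def joint_descriptors(descriptors, positions, pairs):
    """Concatenate each pair into one joint descriptor (assumed joint form)."""
    rows = []
    for i, j in pairs:
        # Put the left-most feature first so each pair yields one ordered joint vector.
        if positions[j, 0] < positions[i, 0]:
            i, j = j, i
        rows.append(np.concatenate([descriptors[i], descriptors[j]]))
    return np.asarray(rows, dtype=float).reshape(-1, 2 * descriptors.shape[1])


def assign(X, centres):
    """Index of the nearest centre (squared Euclidean distance) for each row of X."""
    return np.argmin(((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1), axis=1)


def kmeans(X, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means; the returned centres form the codebook."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), n_clusters, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centres)
        for c in range(n_clusters):
            members = X[labels == c]
            if len(members):  # keep the old centre if a cluster becomes empty
                centres[c] = members.mean(axis=0)
    return centres


def lpc_histogram(joint, codebook):
    """Normalised histogram of hard codeword assignments for one image."""
    hist = np.bincount(assign(joint, codebook), minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy stand-in data: 3 images, 40 features each, 8-D descriptors, (x, y) in a 100x100 frame.
    images = [(rng.random((40, 8)), rng.random((40, 2)) * 100) for _ in range(3)]
    train = np.vstack([joint_descriptors(d, p, pair_indices(p, radius=20.0)) for d, p in images])
    codebook = kmeans(train, n_clusters=16)  # unsupervised: no class labels or boxes used
    for d, p in images:
        print(lpc_histogram(joint_descriptors(d, p, pair_indices(p, radius=20.0)), codebook).shape)
```

Replacing `radius=20.0` with, for example, `k=3` switches from the pixel-based neighbourhood to the nearest-neighbour-based one. Because the number of neighbours per feature is then fixed, the pairing no longer depends on image scale.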
Author(s)
Morioka, Nobuyuki
Supervisor(s)
Hengst, Bernhard
Mahidadia, Ashesh
Uther, William
Publication Year
2011
Resource Type
Thesis
Degree Type
PhD Doctorate
UNSW Faculty
Files
whole.pdf (3.47 MB, Adobe Portable Document Format)