Deep learning based stereo matching on a small dataset

Download files
Access & Terms of Use
open access
Copyright: Wu, Rongcheng
Deep learning (DL) has been used in many computer vision tasks including stereo matching. However, DL is data hungry, and a large number of highly accurate real-world training images for stereo matching is too expensive to acquire in practice. The majority of studies rely on large simulated datasets during training, which inevitably results in domain shift problems that are commonly compensated by fine-tuning. This work proposes a recursive 3D convolutional neural network (CNN) to improve the accuracy of DL based stereo matching that is suitable for real-world scenarios with a small set of available images, without having to use a large simulated dataset and without fine-tuning. In addition, we propose a novel scale-invariant feature transform (SIFT) based adaptive window for matching cost computation that is a crucial step in the stereo matching pipeline to enhance accuracy. Extensive end-to-end comparative experiments demonstrate the superiority of the proposed recursive 3D CNN and SIFT based adaptive windows. Our work achieves effective generalization corroborated by training solely on the indoor Middlebury Stereo 2014 dataset and validating on outdoor KITTI 2012 and KITTI 2015 datasets. As a comparison, our bad-4.0-error is 24.2 that is on par with the AANet (CVPR2020) method according to the publicly evaluated report from the Middlebury Stereo Evaluation Benchmark.
Persistent link to this record
Link to Publisher Version
Link to Open Access Version
Additional Link
Conference Proceedings Editor(s)
Other Contributor(s)
Corporate/Industry Contributor(s)
Publication Year
Resource Type
Degree Type
Masters Thesis
UNSW Faculty
download public version.pdf 13.86 MB Adobe Portable Document Format
Related dataset(s)