External Nonparametric Memory in Deep Learning

Access & Terms of Use
open access
Copyright: Long, Alexander
Abstract
Deep Neural Networks are limited in their ability to access and manipulate external knowledge after training. This capability is desirable: information access can be localized for interpretability, the external information itself can be modified, improving editability, and external systems can handle retrieval and storage, freeing up internal parameters that would otherwise be required to memorize knowledge. This dissertation presents three approaches that augment deep neural networks with various forms of external memory, achieving state-of-the-art results across multiple benchmarks and sub-fields.

First, we examine the limits of retrieval alone in the sample-efficient Reinforcement Learning (RL) setting. We propose NAIT, a purely memory-based method that nevertheless achieves performance comparable to the best neural models on the ATARI100k benchmark. Because NAIT does not use parametric function approximation and instead approximates only locally, it is extremely computationally efficient, reducing the run-time of a full sweep over ATARI100k from days to minutes. NAIT provides a strong counterpoint to the prevailing notion that retrieval-based lazy learning approaches are too slow to be practically useful in RL.

Next, we combine NAIT's non-parametric retrieval approach with large image and text encoders for the task of Long-Tail Visual Recognition. The resulting method, Retrieval Augmented Classification (RAC), achieves state-of-the-art performance on the highly competitive long-tail datasets iNaturalist2018 and Places365-LT, and is among the first systems to effectively combine parametric and non-parametric approaches in Computer Vision. Most promisingly, we observe that RAC's retrieval component achieves its highest per-class accuracies on sparse, infrequent classes, indicating that non-parametric memory is an effective mechanism for modeling the 'long-tail' of world knowledge.

Finally, we move beyond standard single-step retrieval and investigate multi-step retrieval over graphs of sentences for the task of Reading Comprehension. We first propose a mechanism to effectively construct such graphs from collections of documents, and then learn a general traversal policy over such graphs, conditioned on the query. We demonstrate that combining this retriever with existing models both consistently boosts accuracy and reduces training time by 2-3x.
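To make the "retrieval alone" idea concrete, below is a minimal sketch, in plain NumPy, of lazy non-parametric action-value estimation in the spirit the abstract describes: observed returns are stored alongside state embeddings, and Q-values are approximated locally at query time by k-nearest-neighbour averaging, with no parametric function approximation and no gradient updates. The class name EpisodicQMemory, the inverse-distance weighting, and the zero default for unseen actions are illustrative assumptions for this sketch, not details of NAIT itself.

```python
import numpy as np

class EpisodicQMemory:
    """Lazy, non-parametric action-value estimate: store
    (state embedding, return) pairs per action and answer queries
    by local k-NN averaging. No parameters are learned."""

    def __init__(self, num_actions, k=10):
        self.k = k
        # One (keys, returns) buffer per discrete action.
        self.buffers = [([], []) for _ in range(num_actions)]

    def write(self, state_emb, action, discounted_return):
        keys, rets = self.buffers[action]
        keys.append(np.asarray(state_emb, dtype=np.float32))
        rets.append(float(discounted_return))

    def q_value(self, state_emb, action):
        keys, rets = self.buffers[action]
        if not keys:
            return 0.0  # neutral default for actions never taken
        keys_arr = np.stack(keys)
        dists = np.linalg.norm(
            keys_arr - np.asarray(state_emb, dtype=np.float32), axis=1)
        nearest = np.argsort(dists)[: self.k]
        weights = 1.0 / (dists[nearest] + 1e-3)  # inverse-distance weights
        returns = np.asarray(rets)[nearest]
        return float(weights @ returns / weights.sum())

    def act(self, state_emb):
        """Greedy action under the locally approximated Q-values."""
        return int(np.argmax([self.q_value(state_emb, a)
                              for a in range(len(self.buffers))]))

# Toy usage with random 4-dim state embeddings and 3 actions.
rng = np.random.default_rng(0)
mem = EpisodicQMemory(num_actions=3, k=5)
for _ in range(200):
    mem.write(rng.normal(size=4), rng.integers(0, 3), rng.normal())
print(mem.act(rng.normal(size=4)))
```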
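The retrieval-augmented classification pattern can be sketched the same way: an external memory of (embedding, label) pairs whose similarity-weighted k-NN vote is blended with a parametric classifier's output, so the memory can be edited or extended without retraining. Again this is a toy illustration under assumed details (cosine-similarity weighting, a fixed 50/50 blend, the name RetrievalMemory), not the RAC implementation.

```python
import numpy as np

class RetrievalMemory:
    """External non-parametric memory: stores (embedding, label) pairs
    and answers queries by k-nearest-neighbour lookup."""

    def __init__(self, dim, k=5):
        self.k = k
        self.keys = np.empty((0, dim), dtype=np.float32)  # stored embeddings
        self.values = np.empty((0,), dtype=np.int64)      # stored labels

    def add(self, embeddings, labels):
        """Append new items; editing the memory needs no retraining."""
        self.keys = np.vstack([self.keys, embeddings.astype(np.float32)])
        self.values = np.concatenate([self.values, labels.astype(np.int64)])

    def query(self, embedding, num_classes):
        """Class distribution from the k nearest stored items,
        weighted by cosine similarity (a local, lazy approximation)."""
        sims = self.keys @ embedding / (
            np.linalg.norm(self.keys, axis=1) * np.linalg.norm(embedding) + 1e-8)
        nearest = np.argsort(-sims)[: self.k]
        probs = np.zeros(num_classes)
        for idx in nearest:
            probs[self.values[idx]] += max(sims[idx], 0.0)
        total = probs.sum()
        return probs / total if total > 0 else probs

# Toy usage: blend the memory's vote with a parametric head's softmax.
rng = np.random.default_rng(0)
memory = RetrievalMemory(dim=8, k=3)
memory.add(rng.normal(size=(100, 8)), rng.integers(0, 4, size=100))

query = rng.normal(size=8)
parametric_probs = np.full(4, 0.25)  # stand-in for an encoder's classifier head
retrieval_probs = memory.query(query, num_classes=4)
blended = 0.5 * parametric_probs + 0.5 * retrieval_probs
print("retrieval:", retrieval_probs, "blended:", blended)
```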
Publication Year: 2022
Resource Type: Thesis
Degree Type: PhD Doctorate
Files: public version.pdf (5.9 MB, Adobe Portable Document Format)