External Nonparametric Memory in Deep Learning

Access & Terms of Use
open access
Copyright: Long, Alexander
Abstract
Deep Neural Networks are limited in their ability to access and manipulate external knowledge after training. This capability is desirable: information access can be localized for interpretability, the external information itself can be modified, improving editability, and external systems can handle retrieval and storage, freeing up internal parameters that would otherwise be required to memorize knowledge. This dissertation presents three approaches that augment deep neural networks with various forms of external memory, achieving state-of-the-art results across multiple benchmarks and sub-fields.

First, we examine the limits of retrieval alone in the sample-efficient Reinforcement Learning (RL) setting. We propose NAIT, a purely memory-based method that nevertheless achieves performance comparable to the best neural models on the ATARI100k benchmark. Because NAIT does not use parametric function approximation and instead approximates only locally, it is extremely computationally efficient, reducing the run-time of a full sweep over ATARI100k from days to minutes. NAIT provides a strong counterpoint to the prevailing notion that retrieval-based lazy learning approaches are too slow to be practically useful in RL.

Next, we combine NAIT's non-parametric retrieval approach with large image and text encoders for the task of Long-Tail Visual Recognition. The resulting method, Retrieval Augmented Classification (RAC), achieves state-of-the-art performance on the highly competitive long-tail datasets iNaturalist2018 and Places365-LT, and is among the first systems to effectively combine parametric and non-parametric approaches in Computer Vision. Most promisingly, we observe that RAC's retrieval component achieves its highest per-class accuracies on sparse, infrequent classes, indicating that non-parametric memory is an effective mechanism for modeling the 'long-tail' of world knowledge.

Finally, we move beyond standard single-step retrieval and investigate multi-step retrieval over graphs of sentences for the task of Reading Comprehension. We first propose a mechanism to effectively construct such graphs from collections of documents, and then learn a general traversal policy over such graphs, conditioned on the query. We demonstrate that combining this retriever with existing models both consistently boosts accuracy and reduces training time by 2-3x.
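To make the "retrieval alone" idea concrete, below is a minimal sketch, in plain NumPy, of lazy non-parametric action-value estimation in the spirit the abstract describes: observed returns are stored alongside state embeddings, and Q-values are approximated locally at query time by k-nearest-neighbour averaging, with no parametric function approximation and no gradient updates. The class name EpisodicQMemory, the inverse-distance weighting, and the zero default for unseen actions are illustrative assumptions for this sketch, not details of NAIT itself.

```python
import numpy as np

class EpisodicQMemory:
    """Lazy, non-parametric action-value estimate: store
    (state embedding, return) pairs per action and answer queries
    by local k-NN averaging. No parameters are learned."""

    def __init__(self, num_actions, k=10):
        self.k = k
        # One (keys, returns) buffer per discrete action.
        self.buffers = [([], []) for _ in range(num_actions)]

    def write(self, state_emb, action, discounted_return):
        keys, rets = self.buffers[action]
        keys.append(np.asarray(state_emb, dtype=np.float32))
        rets.append(float(discounted_return))

    def q_value(self, state_emb, action):
        keys, rets = self.buffers[action]
        if not keys:
            return 0.0  # neutral default for actions never taken
        keys_arr = np.stack(keys)
        dists = np.linalg.norm(
            keys_arr - np.asarray(state_emb, dtype=np.float32), axis=1)
        nearest = np.argsort(dists)[: self.k]
        weights = 1.0 / (dists[nearest] + 1e-3)  # inverse-distance weights
        returns = np.asarray(rets)[nearest]
        return float(weights @ returns / weights.sum())

    def act(self, state_emb):
        """Greedy action under the locally approximated Q-values."""
        return int(np.argmax([self.q_value(state_emb, a)
                              for a in range(len(self.buffers))]))

# Toy usage with random 4-dim state embeddings and 3 actions.
rng = np.random.default_rng(0)
mem = EpisodicQMemory(num_actions=3, k=5)
for _ in range(200):
    mem.write(rng.normal(size=4), rng.integers(0, 3), rng.normal())
print(mem.act(rng.normal(size=4)))
```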
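The retrieval-augmented classification pattern can be sketched the same way: an external memory of (embedding, label) pairs whose similarity-weighted k-NN vote is blended with a parametric classifier's output, so the memory can be edited or extended without retraining. Again this is a toy illustration under assumed details (cosine-similarity weighting, a fixed 50/50 blend, the name RetrievalMemory), not the RAC implementation.

```python
import numpy as np

class RetrievalMemory:
    """External non-parametric memory: stores (embedding, label) pairs
    and answers queries by k-nearest-neighbour lookup."""

    def __init__(self, dim, k=5):
        self.k = k
        self.keys = np.empty((0, dim), dtype=np.float32)  # stored embeddings
        self.values = np.empty((0,), dtype=np.int64)      # stored labels

    def add(self, embeddings, labels):
        """Append new items; editing the memory needs no retraining."""
        self.keys = np.vstack([self.keys, embeddings.astype(np.float32)])
        self.values = np.concatenate([self.values, labels.astype(np.int64)])

    def query(self, embedding, num_classes):
        """Class distribution from the k nearest stored items,
        weighted by cosine similarity (a local, lazy approximation)."""
        sims = self.keys @ embedding / (
            np.linalg.norm(self.keys, axis=1) * np.linalg.norm(embedding) + 1e-8)
        nearest = np.argsort(-sims)[: self.k]
        probs = np.zeros(num_classes)
        for idx in nearest:
            probs[self.values[idx]] += max(sims[idx], 0.0)
        total = probs.sum()
        return probs / total if total > 0 else probs

# Toy usage: blend the memory's vote with a parametric head's softmax.
rng = np.random.default_rng(0)
memory = RetrievalMemory(dim=8, k=3)
memory.add(rng.normal(size=(100, 8)), rng.integers(0, 4, size=100))

query = rng.normal(size=8)
parametric_probs = np.full(4, 0.25)  # stand-in for an encoder's classifier head
retrieval_probs = memory.query(query, num_classes=4)
blended = 0.5 * parametric_probs + 0.5 * retrieval_probs
print("retrieval:", retrieval_probs, "blended:", blended)
```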
Publication Year: 2022
Resource Type: Thesis
Degree Type: PhD Doctorate
Files: public version.pdf (5.9 MB, Adobe Portable Document Format)