Apprenticeship Bootstrapping: Multi-Skill Reinforcement Learning for Autonomous Unmanned Aerial Vehicles

Access & Terms of Use
open access
Copyright: Nguyen, Hung
Abstract
Apprenticeship Learning (AL) uses data collected from humans performing tasks to design machine-learning algorithms that imitate human skills. This powerful approach to developing autonomous machines comes with challenges stemming from its reliance on expert humans who can perform the task and are willing to be available. In this thesis, Apprenticeship Bootstrapping (ABS) is proposed as a new learning algorithm that relies on humans who are experts on less-complex tasks to aggregate the skills required for more complex ones. ABS has been validated using a ground and aerial coordination task (GACT), where an unmanned aerial vehicle (UAV) must autonomously follow a group of unmanned ground vehicles (UGVs) while keeping the group within the field of view of the UAV's camera. ABS decomposes the complex control task faced by the UAV operator into simpler sub-tasks that an operator is available to perform and from which demonstrations can be collected. Two learning approaches are proposed to address this challenge: apprenticeship bootstrapping via deep learning (ABS-DL) and via inverse reinforcement learning (ABS-IRL). In ABS-DL, deep learning, which has shown dramatic advances in working effectively on raw data, is attractive for learning directly from demonstrations of sub-tasks. However, it may fail to acquire a desirable policy, because it relies heavily on the quality of the demonstrations, which may suffer from errors introduced by the limitations of the data-collection techniques or from poor demonstrations given by the expert. ABS-IRL addresses these issues by recovering the expert's reward function from demonstrations on the sub-tasks. In ABS-IRL, during every learning episode, the learner executes its learned policy and re-optimizes its learned model while performing the task. This approach allows the learner to avoid learning from incorrect or poor demonstrations.
Both ABS-DL and ABS-IRL were tested on GACT, and the results confirmed that ABS-IRL could bootstrap the more complex skill from the less-complex ones.
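The core ABS idea of pooling demonstrations from simpler sub-tasks into one dataset for the composite task can be sketched in miniature. This is a hypothetical illustration only, not the thesis's actual implementation: the sub-task names, the toy two-feature states, the discrete actions, and the 1-nearest-neighbour imitation policy are all assumptions made for the example (the thesis uses deep learning and inverse reinforcement learning rather than nearest-neighbour lookup).

```python
import math

# Hypothetical sub-task demonstrations: (state, action) pairs, where a
# state is a 2-D feature vector and an action is a discrete command.
# These sub-tasks and values are illustrative, not from the thesis.
altitude_hold_demos = [((0.0, 1.0), "throttle_up"),
                       ((0.0, -1.0), "throttle_down")]
target_tracking_demos = [((1.0, 0.0), "yaw_right"),
                         ((-1.0, 0.0), "yaw_left")]

# ABS-style aggregation: pool the sub-task demonstrations into a single
# dataset for the more complex composite task.
dataset = altitude_hold_demos + target_tracking_demos

def policy(state):
    """Imitation policy: act like the nearest demonstrated state."""
    _, action = min(dataset, key=lambda pair: math.dist(pair[0], state))
    return action

# A state dominated by horizontal offset falls back on the tracking skill.
print(policy((0.9, 0.1)))  # nearest demo is (1.0, 0.0) -> "yaw_right"
```

In this toy version the aggregation step is a simple list concatenation; the thesis's contribution lies in how the learned models (ABS-DL) or recovered reward functions (ABS-IRL) combine the sub-task skills into the composite behaviour.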
Author(s)
Nguyen, Hung
Supervisor(s)
Abbass, Hussein
Garratt, Mathew
Bui, Lam
Publication Year
2018
Resource Type
Thesis
Degree Type
Masters Thesis
Files
public version.pdf (2.95 MB, Adobe Portable Document Format)