Hierarchical reinforcement learning in adversarial environments

Download files
Access & Terms of Use
open access
Copyright: Kwok, Hing-Wah
Altmetric
Abstract
It is known that one of the downfalls of reinforcement learning is the amount of time required to learn an optimal policy. This especially holds true for environments with large state spaces or environments with multiple agents. It is also known that standard Q-Learning develops a deterministic policy, and so in games where a stochastic policy is required (such as rock, paper, scissors) a Q-Learner opponent can be defeated without too much difficulty once the learning has ceased. Initially we investigated the impact that the MAXQ hierarchical reinforcement learning algorithm had in an adversarial environment. We found that it was difficult to conduct state space abstraction, especially when an unpredictable or co-evolving opponent was involved. We noticed that to keep the domains zero-sum, discounted learning was required. We had also found that a speed increase could be obtained through the use of hierarchy in the adversarial environment. We then investigated the ability to obtain similar learning speed increases to adversarial reinforcement learning through the use of this hierarchical methodology. Applying the hierarchical decomposition to Bowling's Win or Learn Fast (WoLF) algorithm we were able to maintain the accelerated learning rate whilst simultaneously retaining the stochastic elements of the WoLF algorithm. We made an assessment on the impact of the adversarial component of the hierarchy at both the higher and lower tiers of the hierarchical tree. Finally, we introduce the idea of pivot points. A pivot point is the last possible time you can wait before having to make a decision and thus revealing your strategy to the opponent. This results in maximising confusion for the opponent. Through the use of these pivot points, which could only have been discovered through the use of hierarchy, we were able to perform improved state-space abstraction since no decision needed to be made, in regards to the opponent, until this point was reached.
Persistent link to this record
Link to Publisher Version
Link to Open Access Version
Additional Link
Author(s)
Kwok, Hing-Wah
Supervisor(s)
Creator(s)
Editor(s)
Translator(s)
Curator(s)
Designer(s)
Arranger(s)
Composer(s)
Recordist(s)
Conference Proceedings Editor(s)
Other Contributor(s)
Corporate/Industry Contributor(s)
Publication Year
2009
Resource Type
Thesis
Degree Type
Masters Thesis
UNSW Faculty
Files
download Kwok-014184435.pdf 4.63 MB Adobe Portable Document Format
Related dataset(s)