Hierarchical reinforcement learning in adversarial environments

Kwok, Hing-Wah

doi:10.26190/unsworks/14209

Access & Terms of Use

open access
Copyright: Kwok, Hing-Wah

CC BY-NC-ND 3.0

Abstract

A well-known drawback of reinforcement learning is the time required to learn an optimal policy, particularly in environments with large state spaces or multiple agents. Standard Q-Learning also develops a deterministic policy, so in games that require a stochastic policy (such as rock, paper, scissors) a Q-Learner opponent can be defeated without much difficulty once learning has ceased. We first investigated the impact of the MAXQ hierarchical reinforcement learning algorithm in an adversarial environment. We found that state space abstraction was difficult to carry out, especially when an unpredictable or co-evolving opponent was involved, and that discounted learning was required to keep the domains zero-sum. We also found that the use of hierarchy yielded a learning speed increase in the adversarial environment. We then investigated whether similar speed increases could be obtained for adversarial reinforcement learning through the same hierarchical methodology. By applying the hierarchical decomposition to Bowling's Win or Learn Fast (WoLF) algorithm, we maintained the accelerated learning rate while retaining the stochastic elements of the WoLF algorithm. We assessed the impact of the adversarial component of the hierarchy at both the higher and lower tiers of the hierarchical tree. Finally, we introduce the idea of pivot points. A pivot point is the last possible moment to which a decision can be deferred before it must be made, thereby revealing one's strategy to the opponent. Deferring the decision until this point maximises confusion for the opponent. Through these pivot points, which could only have been discovered through the use of hierarchy, we were able to perform improved state space abstraction, since no decision with regard to the opponent needed to be made until the pivot point was reached.
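The abstract's remark about deterministic policies in rock, paper, scissors can be illustrated with a small calculation. The sketch below is not taken from the thesis; the payoff matrix layout and the names `PAYOFF` and `worst_case_value` are assumptions chosen for illustration. It compares the worst-case expected payoff of a deterministic policy with that of the uniform stochastic policy against an opponent who knows the policy.

```python
# Illustrative sketch only (not from the thesis).
# Payoff to the row player; rows and columns are rock, paper, scissors.
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper    vs rock, paper, scissors
    [-1, 1, 0],   # scissors vs rock, paper, scissors
]


def worst_case_value(policy):
    """Expected payoff of `policy` against an opponent who best-responds to it."""
    return min(
        sum(policy[a] * PAYOFF[a][b] for a in range(3))
        for b in range(3)
    )


deterministic = [1.0, 0.0, 0.0]        # always plays rock
uniform = [1 / 3, 1 / 3, 1 / 3]        # plays each action equally often

print(worst_case_value(deterministic))  # opponent always plays paper
print(worst_case_value(uniform))        # no opponent action gains an edge
```

Under these assumptions, a fixed deterministic policy loses every round once the opponent identifies it, whereas the uniform policy cannot be exploited, which is why a learner that retains stochastic behaviour is of interest in this setting.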

Persistent link to this record

http://hdl.handle.net/1959.4/43424

DOI

https://doi.org/10.26190/unsworks/14209

Author(s)

Kwok, Hing-Wah

Publication Year

2009

Resource Type

Thesis

Degree Type

Masters Thesis

Files

Kwok-014184435.pdf

4.63 MB

Adobe Portable Document Format
