Abstract
Interactive recommendation aims to accommodate and learn from dynamic interactions between users and items
to achieve responsiveness and accuracy in recommendation systems. Reinforcement learning is inherently
advantageous for coping with dynamic, interactive environments and has thus attracted increasing attention in
interactive recommendation research. However, most existing works tend to model user interests as stationary,
neglecting that they are dynamic in nature.
The dissertation starts with an introduction to recommendation systems and their applications. This is followed
by a detailed literature review covering three main related areas: sequence-aware recommendation, interactive
recommendation, and knowledge-aware recommendation systems. The dissertation also reviews
reinforcement-learning-based applications in recommendation systems and discusses their advantages and shortcomings.
After that, this dissertation presents a general problem statement for interactive recommendation systems and
the challenges to be tackled: modeling dynamic user interests, the computational cost of
reinforcement learning optimization, and performance degradation in reinforcement-learning-based recommendation
systems.
In particular, we propose a set of techniques and models for improved interactive recommendation via
reinforcement learning. We propose a new model for learning a distributed interaction embedding, which captures
users' dynamic interests in a compact and expressive manner. Inspired by recent advances in Graph Convolutional
Networks and knowledge-aware recommendation, we design a Knowledge-Guided deep Reinforcement Learning
(KGRL) model to harness the advantages of both reinforcement learning and knowledge graphs for interactive
recommendation. This model is implemented within the actor-critic framework. It maintains a local knowledge
network to guide the decision-making process during training and employs an attention mechanism to discover
long-term semantic relations between items. To reduce the computational cost of reinforcement learning, we take a further
step and design an enhanced optimization strategy that narrows the space of updating steps and tunes the
reward function.
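The knowledge-guided actor-critic design described above can be illustrated with a minimal sketch. The snippet below is not the dissertation's implementation; all names, dimensions, and the similarity-based attention are illustrative assumptions. It shows the general pattern: attention over knowledge-graph neighbor embeddings enriches the user state, which the actor uses to score candidate items and the critic uses to estimate the state value.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB, N_ITEMS = 8, 5  # illustrative embedding size and candidate-item count


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def attend(query, neighbors):
    """Attention over knowledge-graph neighbor embeddings:
    weight each neighbor by its similarity to the current user state."""
    weights = softmax(neighbors @ query)  # (n,)
    return weights @ neighbors            # (EMB,) attended context


class ActorCritic:
    """Toy actor-critic: actor and critic share a knowledge-guided state."""

    def __init__(self):
        self.W_actor = rng.normal(size=(EMB, N_ITEMS)) * 0.1
        self.w_critic = rng.normal(size=EMB) * 0.1

    def forward(self, user_state, kg_neighbors):
        context = attend(user_state, kg_neighbors)
        state = user_state + context            # fuse interest with KG context
        policy = softmax(state @ self.W_actor)  # probability over items (actor)
        value = float(state @ self.w_critic)    # state-value estimate (critic)
        return policy, value


user_state = rng.normal(size=EMB)               # dynamic user-interest embedding
kg_neighbors = rng.normal(size=(4, EMB))        # embeddings of KG neighbors
policy, value = ActorCritic().forward(user_state, kg_neighbors)
print(policy.shape, round(float(policy.sum()), 6))
```

In a full system the actor's policy would be trained with policy gradients against the critic's value estimate, and the embeddings would come from the interaction history and knowledge graph rather than random initialization.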
We have conducted comprehensive experiments in a simulated online environment for the three proposed methods,
which show consistently improved performance of our models over baselines and state-of-the-art methods in the
literature. Finally, this dissertation discusses future work and potential further improvements for interactive
recommendation systems.