Speech based Continuous Emotion Prediction: An investigation of Speaker Variability and Emotion Uncertainty

Access & Terms of Use
open access
Copyright: Dang, Ting
Abstract
Understanding and describing human emotional state is important for many applications, such as interactive human-computer interface design and clinical diagnosis tools. Speech-based emotion prediction is generally viewed as a regression problem, where speech waveforms are labelled in terms of affective attributes such as arousal and valence, with numerical values indicating the short-term emotion intensity. Current research on continuous emotion prediction has primarily focused on improving the backend, developing novel features, or improving feature selection techniques. However, emotion expression and perception are in general heterogeneous across individuals and depend on a wide range of factors, such as cultural background and the speaker’s gender. The impact of these sources of variation on continuous emotion prediction systems has not yet been fully explored and is the focus of this thesis. Speaker variability, i.e., differences in emotion expression among speakers, has been shown to be one of the most confounding factors in categorical emotion recognition systems, but there is limited literature analysing its effect on continuous emotion prediction systems. In this thesis, a probabilistic framework is proposed to quantify speaker variability in continuous emotion prediction systems in both the feature and the model domains. Furthermore, three compensation techniques for speaker variability are developed, and in-depth analyses in both the feature and model spaces are carried out.
Another confounding factor is inter-rater variability, i.e., differences in emotion perception among raters, which current approaches ignore by taking the average rating across multiple raters as the ‘true’ representation of the emotional state. However, differences in perception among raters suggest that prediction certainty varies with time. A novel approach for the prediction of emotion uncertainty is proposed and implemented by including the inter-rater variability as a representation of the uncertainty information in a probabilistic model. In addition, Kalman filters are incorporated into this framework to take into account the temporal dependencies of the emotion uncertainty, while providing the flexibility to relax the Gaussianity assumption on the emotion distribution that reflects the uncertainty. The proposed frameworks and methods have been extensively evaluated on multiple state-of-the-art databases, and the results demonstrate the potential of the proposed solutions.
Author(s)
Dang, Ting
Supervisor(s)
Sethu, Vidhyasaharan
Ambikairajah, Eliathamby
Publication Year
2018
Resource Type
Thesis
Degree Type
PhD Doctorate
UNSW Faculty