Speech-Based Emotion Recognition: Linguistic and Saliency-Based Systems

Access & Terms of Use
open access
Copyright: Wataraka Gamage, Kalani
Abstract
Speech-based emotion recognition is a research field of growing interest, which aims to identify human emotions from speech. The main contributions of this thesis revolve around the use of verbal and non-verbal vocalisation cues for speech-based emotion recognition, cues that complement the widely used acoustic features in both emotion classification and continuous emotion prediction tasks. The thesis initially explores supra-segmental feature representations generated by vectorising frame-level distribution models of Mel-frequency cepstral coefficients, as an alternative to the default acoustic supra-segmental features for emotion classification. Next, it develops approaches for incorporating the emotional saliency and pronunciation of verbal cues (lexical features) into emotion classification. Beyond lexical features, non-verbal vocal events such as laughter and sighs, expressions such as “grrr!” and “oh!”, and disfluency patterns including filled pauses such as “hmm” are also identified within the linguistic feature domain. These elements of speech are instrumental in portraying both voluntary and involuntary emotions in human communication. Despite this, they have not been exploited for emotion recognition in a fully automatic manner, and their effect on emotion recognition has not yet been adequately analysed. This thesis proposes and develops several models that implicitly utilise emotionally salient linguistic cues, including non-verbal gestures and disfluencies, for emotion classification and continuous emotion prediction, without the need for tagged, time-aligned non-verbal vocalisation labels. The proposed approaches allow emotion recognition systems to utilise linguistic information independently of manual transcripts or automatic speech recognition. Inspired by the analysis of the influence of non-verbal vocalisations on continuous emotion prediction, as well as by emotion psychology concepts related to the symbolic reference function of such expressions, the thesis further proposes a novel view of continuous emotion prediction, leading to a transparent prediction framework modelled as a time-invariant filter array, distinct from the pointwise regression mapping adopted by traditional approaches. All proposed approaches are extensively evaluated on state-of-the-art emotion databases.
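The abstract gives no implementation details, but the filter-array view it describes can be illustrated with a minimal sketch: each linguistic cue is treated as a sparse activation signal over time, passed through its own time-invariant filter, and the filter outputs are summed into a continuous emotion trajectory, in contrast to pointwise regression from frame-level features. Everything below (names, signal shapes, the hand-picked decaying-exponential responses) is an illustrative assumption, not taken from the thesis; in the thesis framework the filters would be learned from data.

```python
import numpy as np

# Illustrative sketch only: continuous emotion prediction as a
# time-invariant filter array over linguistic cue activations.
def predict_emotion(cue_activations, filters):
    """Sum of each cue signal convolved with its impulse response.

    cue_activations: dict mapping cue name -> 1-D activation signal
                     over time (e.g. 1.0 at frames where laughter occurs).
    filters:         dict mapping cue name -> 1-D finite impulse response
                     (the 'time-invariant filter' for that cue).
    Returns a predicted continuous emotion trajectory (e.g. arousal).
    """
    length = max(len(sig) for sig in cue_activations.values())
    prediction = np.zeros(length)
    for cue, signal in cue_activations.items():
        # Convolve the cue activation with its filter, trim to length.
        prediction += np.convolve(signal, filters[cue])[:length]
    return prediction

# Toy usage: laughter triggers a slowly decaying rise in arousal,
# a filled pause ("hmm") a small dip.
T = 100
laughter = np.zeros(T); laughter[20] = 1.0
filled_pause = np.zeros(T); filled_pause[60] = 1.0
filters = {
    "laughter": 0.8 * np.exp(-np.arange(30) / 10.0),      # positive, decaying
    "filled_pause": -0.3 * np.exp(-np.arange(20) / 5.0),  # small negative
}
arousal = predict_emotion(
    {"laughter": laughter, "filled_pause": filled_pause}, filters
)
print(arousal[18:25])  # arousal rises just after the laughter event
```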
Author(s)
Wataraka Gamage, Kalani
Supervisor(s)
Ambikairajah, Eliathamby
Sethu, Vidhyasaharan
Publication Year
2018
Resource Type
Thesis
Degree Type
PhD Doctorate
Files
public version.pdf (5.21 MB, Adobe Portable Document Format)