Improving automatic speaker verification using front-end and back-end diversity

Access & Terms of Use
open access
Copyright: Kua, Jia Min Karen
Abstract
Technologies that exploit biometrics can potentially be applied to the identification and verification of individuals for controlling access to secured areas or materials. Among these technologies, automatic speaker verification systems are of growing interest, as they are the least invasive and they allow recognition via any type of communication network over long distances. The overall goal of this thesis is to improve the performance of automatic speaker verification systems by investigating novel features and classification methods that complement current state-of-the-art systems. At the feature level, novel log-compressed least squares group delay and spectral centroid features are proposed. The log-compression and least squares regularisation are shown to reduce the dynamic range of modified group delay features and to outperform other existing group delay extraction methods. The proposed spectral centroid features provide a better characterisation of the spectral energy distribution, and experimental results show that this detailed spectral characterisation significantly improves performance. A diverse front-end involving multiple features would improve both phonetic (acoustic) and speaker modelling. In this regard, the relative contributions of the acoustic and speaker modelling ‘stages’ to speaker recognition performance across different features are investigated. The investigation, conducted using clustering comparison measures, suggests that front-end diversity, and hence improved performance from fused systems, can be achieved purely through different ‘partitioning’ of the acoustic space. Building on this finding, a novel universal background model (UBM) data/utterance selection algorithm that increases the stability of the acoustic modelling is proposed.
Finally, at the classification level, the use of sparse representation classification (SRC) with Gaussian mixture model supervectors (GMM-SRC) is proposed and is found to perform comparably to Gaussian mixture model-support vector machines (GMM-SVM). However, GMM-SRC results in a slower verification process. To improve computational efficiency, the high-dimensional supervectors are replaced with speaker factors, resulting in joint factor analysis-sparse representation classification (JFA-SRC). In addition, a novel dictionary composition technique is developed to further improve computational efficiency. Results demonstrate that the refined dictionary provides performance comparable to that obtained with the complete dataset and generalises well to evaluation on other databases. Notably, a detailed comparison of the proposed JFA-SRC against various state-of-the-art classifiers on the NIST 2010 databases showed that JFA-SRC achieved the best minimum detection cost function (minDCF), highlighting the usefulness of SRC-based systems.
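As a rough illustration of the sparse representation classification idea described above, the following minimal Python sketch codes a test vector as a sparse combination of dictionary columns and scores it by how much of the coefficient mass falls on the claimed speaker's columns. It is not taken from the thesis: the iterative soft-thresholding (ISTA) solver, the regularisation weight `lam`, the l1-share scoring rule, the synthetic random vectors standing in for speaker factors, and all function and variable names are assumptions made for illustration only.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding (proximal operator of the l1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(D, y, lam=0.05, n_iter=500):
    """Approximately solve min_x 0.5*||y - D x||^2 + lam*||x||_1 with ISTA.

    ISTA is an assumed stand-in solver; the thesis abstract does not say
    which l1 solver was used.
    """
    # Step size 1/L, where L is the squared largest singular value of D.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)
        x = soft_threshold(x - step * grad, step * lam)
    return x

def src_score(D, target_idx, y, lam=0.05):
    """Share of the l1 coefficient mass on the claimed speaker's columns.

    This scoring rule is an illustrative assumption, not the thesis's.
    """
    x = sparse_code(D, y, lam)
    total = np.abs(x).sum()
    if total == 0.0:
        return 0.0
    return np.abs(x[target_idx]).sum() / total

# Synthetic stand-in data: unit-norm random columns play the role of
# per-utterance speaker vectors (hypothetical sizes).
rng = np.random.default_rng(0)
dim, n_atoms = 60, 30
D = rng.standard_normal((dim, n_atoms))
D /= np.linalg.norm(D, axis=0)

target_idx = np.array([0, 1, 2])  # columns of the claimed speaker
y_target = D[:, 1] + 0.01 * rng.standard_normal(dim)     # genuine trial
y_impostor = D[:, 10] + 0.01 * rng.standard_normal(dim)  # impostor trial

score_target = src_score(D, target_idx, y_target)
score_impostor = src_score(D, target_idx, y_impostor)
print("target score:", score_target)
print("impostor score:", score_impostor)
```

In this sketch a verification decision would follow from comparing the score with a threshold. Replacing long supervectors with lower-dimensional vectors shrinks `dim`, and pruning the dictionary shrinks `n_atoms`; both reduce the cost of each solver iteration, which is the kind of efficiency gain the abstract describes.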
Author(s)
Kua, Jia Min Karen
Supervisor(s)
Ambikairajah, Eliathamby
Epps, Julien
Choi, Eric
Publication Year
2012
Resource Type
Thesis
Degree Type
PhD Doctorate
UNSW Faculty