Speech and music discrimination using short-time features

Access & Terms of Use
open access
Copyright: Mubarak, Omer Mohsin
Abstract
This thesis addresses the problem of classifying an audio stream as either speech or music, a problem that is receiving increasing attention because of its wide range of applications. Various techniques have been presented in the last decade to discriminate between speech and music. However, their accuracy is still not sufficient, since music can refer to a very broad class of signals owing to the large number of musical instruments found in audio data. Performance can be further compromised in noisy conditions, which are unavoidable in some practical situations. This thesis presents an analysis of the feature extraction techniques and classifiers currently in use, followed by the proposal and evaluation of new features for improved classification. These include two novel cepstral features, delta cepstral energy and power spectrum deviation, along with amplitude and frequency modulation features. The modified group delay feature, initially proposed for speech recognition, is also investigated for speech and music discrimination. Experiments were performed with different feature sets, which were compared with one another and with conventional MFCCs using error rate criteria and Detection Error Trade-off (DET) curves. It is shown that the proposed cepstral and modulation features increase the accuracy of the conventional MFCC-based system. However, the modified group delay feature, which has been shown to improve accuracy for speech classification problems, contributes little to speech and music discrimination. Among the configurations presented here, the optimum one, MFCCs combined with both modulation features, resulted in an overall error rate of 6.57%, compared with 7.43% for MFCCs alone.
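To put the two error rates quoted in the abstract side by side, the short sketch below works out the absolute and relative reduction. Only the figures 7.43% and 6.57% come from the abstract; the function name `error_reduction` and the variable names are illustrative assumptions, not part of the thesis.

```python
# Error rates quoted in the abstract, in percent.
mfcc_error = 7.43       # conventional MFCCs alone
combined_error = 6.57   # MFCCs plus both modulation features


def error_reduction(baseline, improved):
    """Hypothetical helper: return (absolute, relative) error reduction.

    The absolute value is in percentage points; the relative value is a
    fraction of the baseline error rate.
    """
    absolute = baseline - improved
    relative = absolute / baseline
    return absolute, relative


absolute, relative = error_reduction(mfcc_error, combined_error)
print(f"absolute reduction: {absolute:.2f} percentage points")  # 0.86
print(f"relative reduction: {relative:.1%}")                    # 11.6%
```

In other words, the reported improvement is a little under one percentage point, or roughly an eighth of the baseline error.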
Author(s)
Mubarak, Omer Mohsin
Publication Year
2006
Resource Type
Thesis
Degree Type
Masters Thesis
UNSW Faculty
Files
whole.pdf (887.37 KB, Adobe Portable Document Format)