Innovative methods for the analysis of complex and non-standard data

dc.contributor.advisor Sisson, Scott en_US
dc.contributor.advisor Beranger, Boris en_US Whitaker, Thomas en_US 2022-03-23T12:17:09Z 2022-03-23T12:17:09Z 2019 en_US
dc.description.abstract Symbolic Data Analysis (SDA) is an emerging branch of statistics that addresses some of the issues associated with the analysis of non-standard (symbolic) datasets, such as intervals, histograms and lists. Datasets of this nature are useful for preserving the privacy of individual observations, and also for reducing the size and dimension of big datasets. This leads to significant computational benefits if an appropriate symbolic analysis can be derived. Rapidly increasing and readily available computational power has also made non-standard datasets increasingly common. Data arriving in a non-standard form often possesses internal variation not seen in pointwise classical observations. This means that existing classical methods of analysis are unsuitable if results are desired that possess an underlying classical interpretation. Currently, most developed SDA methods focus on an exploratory analysis of the data. Their results are useful only at the symbolic level, and are not directly comparable to the complete analysis of the true latent underlying dataset unless specific assumptions concerning the uniformity of the data within each symbol are met. A common existing symbolic methodology is to perform a classical analysis of features of the non-standard data, such as interval end-points. In this thesis, methods of analysis for non-standard data are developed that are interpretable at the underlying classical level. Further, if enough information is retained during the aggregation process, the methods derived for the analysis of non-standard datasets obtain results comparable to the complete classical analysis of the underlying latent dataset. As a result, big datasets that pose computational problems can be analysed using the proposed symbolic methodologies instead of the classical analyses, at a lower computational cost.
These methods are highly flexible: they do not rely on a uniformity assumption within each symbol, and can be applied to a range of symbolic data. The utility of each symbolic method is demonstrated via simulation studies illustrating the convergence of the results towards the complete analysis as more information is retained during the aggregation process. Further, each derived method is applied to a real dataset to demonstrate its real-life application. en_US
dc.language English
dc.language.iso EN en_US
dc.publisher UNSW, Sydney en_US
dc.rights CC BY-NC-ND 3.0 en_US
dc.rights.uri en_US
dc.subject.other Symbolic Data Analysis en_US
dc.subject.other Big Data en_US
dc.title Innovative methods for the analysis of complex and non-standard data en_US
dc.type Thesis en_US
dcterms.accessRights open access
dcterms.rightsHolder Whitaker, Thomas
dspace.entity.type Publication en_US
unsw.relation.faculty Science
unsw.relation.originalPublicationAffiliation Whitaker, Thomas, Mathematics & Statistics, Faculty of Science, UNSW en_US
unsw.relation.originalPublicationAffiliation Sisson, Scott, Mathematics & Statistics, Faculty of Science, UNSW en_US
unsw.relation.originalPublicationAffiliation Beranger, Boris, Mathematics & Statistics, Faculty of Science, UNSW en_US School of Mathematics & Statistics *
unsw.thesis.degreetype PhD Doctorate en_US
Original bundle
public version.pdf
2.7 MB