Innovative methods for the analysis of complex and non-standard data

Whitaker, Thomas

doi:10.26190/unsworks/21750

Access & Terms of Use

open access
Copyright: Whitaker, Thomas

CC BY-NC-ND 3.0

Abstract

Symbolic Data Analysis (SDA) is an emerging branch of statistics that addresses some of the issues associated with the analysis of non-standard (symbolic) data, such as intervals, histograms and lists. Data of this kind are useful for preserving the privacy of individual observations and for reducing the size and dimension of big datasets. This reduction yields significant computational benefits, provided an appropriate symbolic analysis can be derived. Rapidly increasing and ever more accessible computational power has also made non-standard datasets increasingly common. Data arriving in a non-standard form often possess internal variation that pointwise classical observations lack. Existing classical methods of analysis are therefore unsuitable when results with an underlying classical interpretation are required. Currently, most SDA methods focus on an exploratory analysis of the data. Their results are useful only at the symbolic level and are not directly comparable to the complete analysis of the true latent underlying dataset unless specific assumptions concerning the uniformity of the data within each symbol are met. A common existing symbolic methodology is to perform a classical analysis of features of the non-standard data, such as interval end-points. In this thesis, methods of analysis for non-standard data are developed that are interpretable at the underlying classical level. Further, if enough information is retained during the aggregation process, the methods derived for non-standard datasets obtain results comparable to the complete classical analysis of the underlying latent dataset. As a result, big datasets that pose computational problems can be analysed using the proposed symbolic methodologies instead of the classical analyses, at a lower computational cost.
These methods are highly flexible: they do not rely on a uniformity assumption within each symbol, and they can be applied to a range of symbolic data. The utility of each symbolic method is demonstrated via simulation studies that illustrate the convergence of the results towards those of the complete analysis as more information is retained during the aggregation process. Further, each derived method is applied to a real dataset to demonstrate its practical application.
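
The convergence towards the complete analysis described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis and should not be read as its method. It assumes, purely for illustration, that the latent data are normally distributed and are aggregated into a single histogram with fixed bin edges. The mean and standard deviation are then estimated from the bin counts alone by maximising a multinomial (binned) likelihood, which does not require uniformity within bins. The function names, the simulated data, the bin range and the simple compass-search optimiser are all hypothetical choices.

```python
import math
import random
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


def aggregate(data, inner_edges):
    """Reduce raw observations to histogram counts (the symbol).

    The outermost bins are open-ended, so no observation is dropped.
    """
    counts = [0] * (len(inner_edges) + 1)
    for x in data:
        counts[bisect_right(inner_edges, x)] += 1
    return counts


def binned_loglik(mu, sigma, inner_edges, counts):
    """Multinomial log-likelihood of the counts under N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    cdf = [0.0] + [dist.cdf(e) for e in inner_edges] + [1.0]
    total = 0.0
    for k, n_k in enumerate(counts):
        if n_k:
            # Clamp to avoid log(0) when a candidate puts no mass on a bin.
            total += n_k * math.log(max(cdf[k + 1] - cdf[k], 1e-300))
    return total


def fit_from_histogram(inner_edges, counts, mu0, sigma0, tol=1e-6):
    """Maximise the binned likelihood over (mu, log sigma) by compass search."""
    mu, log_s = mu0, math.log(sigma0)
    best = binned_loglik(mu, math.exp(log_s), inner_edges, counts)
    step = 0.5
    while step > tol:
        improved = False
        for d_mu, d_ls in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            value = binned_loglik(mu + d_mu, math.exp(log_s + d_ls), inner_edges, counts)
            if value > best:
                mu, log_s, best = mu + d_mu, log_s + d_ls, value
                improved = True
        if not improved:
            step /= 2.0  # no axis move helps, so refine the search
    return mu, math.exp(log_s)


# Simulated latent dataset (an assumption made for this illustration only).
random.seed(1)
data = [random.gauss(3.0, 2.0) for _ in range(5000)]
classical = (fmean(data), pstdev(data))  # full-data normal MLE

lo, hi = -4.0, 12.0  # hypothetical range used to place the bin edges
estimates = {}
for n_bins in (4, 8, 32):
    inner_edges = [lo + (hi - lo) * k / n_bins for k in range(1, n_bins)]
    counts = aggregate(data, inner_edges)
    estimates[n_bins] = fit_from_histogram(inner_edges, counts, mu0=4.0, sigma0=4.0)
    print(n_bins, estimates[n_bins], "classical:", classical)
```

Under these assumptions, the estimates obtained from the histogram should approach the full-data values as the number of bins grows, while the fit itself only ever touches the bin counts rather than the 5000 raw observations.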

Persistent link to this record

http://hdl.handle.net/1959.4/65573

DOI

https://doi.org/10.26190/unsworks/21750

Author(s)

Whitaker, Thomas

Supervisor(s)

Sisson, Scott

Beranger, Boris

Publication Year

2019

Resource Type

Thesis

Degree Type

PhD Doctorate

Files

public version.pdf

2.7 MB

Adobe Portable Document Format
