Novel likelihood-based inference for symbolic data analysis

Copyright: Lin, Huan
Symbolic data analysis (SDA) is a relatively new branch of statistics. It emerged from the need to handle data containing information that cannot be satisfactorily represented and modelled within classical data models. SDA is a new paradigm that extends classical data models to take into account more complete and complex information, and it offers an alternative approach to "big data" problems by reducing and summarising massive datasets into "classes" of interest. SDA organises multiple unstructured data tables into a single coherent data table containing symbolic-valued variables, often recorded as intervals or histograms.

There has been a considerable amount of research in this area, and many existing methods rely on the assumption that data are uniformly distributed within a symbol. This uniformity assumption has been shown to be unrealistic for real-world problems. Likelihood functions are fundamental to statistical inference, and to date two likelihood-based methods for SDA have been introduced. However, while these methods have been shown to be beneficial, a number of methodological weaknesses currently limit their potential to become an invaluable tool in a modern statistician's toolkit.

To this end, we propose new models for performing likelihood-based inference for SDA. Our approach removes the need for the assumption, conventional in the SDA literature, that data are uniformly distributed within a symbol (an interval or a histogram bin). Instead, it provides a natural way of specifying the underlying distribution of the data from which the symbolic variables are obtained. As a result, statistical inference can be made at the level of the underlying data, which may be more desirable from the statistical analyst's point of view. Our approach also allows statistical analysts to use higher-dimensional symbols to address complex real-world problems.
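To illustrate what inference at the underlying data level can look like for interval-valued symbols, the sketch below reduces each class of simulated raw data to its (min, max) interval and then fits the parameters of an assumed underlying normal distribution. The likelihood uses the joint density of the sample minimum and maximum of n iid draws, n(n - 1) f(a) f(b) [F(b) - F(a)]^(n - 2), so no uniformity within the interval is assumed. This order-statistics construction is one standard way to obtain such a likelihood and is shown only as an illustration; the normal model, the function names and the crude grid search are assumptions, not necessarily the models proposed in the thesis.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) at x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def to_interval(sample):
    """Summarise one class of raw observations into an interval symbol (min, max, n)."""
    return min(sample), max(sample), len(sample)


def interval_loglik(intervals, mu, sigma):
    """Symbolic log-likelihood of interval symbols (a, b, n).

    Assumes each class holds n >= 2 iid N(mu, sigma^2) draws, so (min, max) = (a, b)
    has joint density n (n - 1) f(a) f(b) [F(b) - F(a)]^(n - 2).
    """
    total = 0.0
    for a, b, n in intervals:
        mass = normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)
        if mass <= 0.0:
            return -math.inf  # numerically impossible under these parameters
        total += (math.log(n * (n - 1))
                  + normal_logpdf(a, mu, sigma)
                  + normal_logpdf(b, mu, sigma)
                  + (n - 2) * math.log(mass))
    return total


# Simulate 50 classes of 30 raw observations each from N(2, 1), then keep only the intervals.
random.seed(1)
intervals = [to_interval([random.gauss(2.0, 1.0) for _ in range(30)]) for _ in range(50)]

# Crude grid search for the maximum-likelihood estimates of (mu, sigma).
grid = [(1.0 + 0.05 * i, 0.5 + 0.05 * j) for i in range(41) for j in range(31)]
mu_hat, sigma_hat = max(grid, key=lambda p: interval_loglik(intervals, p[0], p[1]))
print(f"mu_hat = {mu_hat:.2f}, sigma_hat = {sigma_hat:.2f}")
```

Only the 50 intervals enter the fit, yet the estimates target the parameters of the raw-data distribution; swapping normal_logpdf and normal_cdf for another family changes the assumed underlying distribution without touching the rest of the code.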
The new models are demonstrated through simulated case studies. In addition, the proposed symbolic likelihood function for histograms has been applied to improve the analytical results of an existing model for measuring aerosol particle number concentration.
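A likelihood for histogram-valued symbols can avoid the uniform-within-bin assumption in a similar spirit: if a class's n raw values are assumed to be iid draws from a distribution with CDF F, the bin counts follow a multinomial distribution with bin probabilities p_k = F(e_{k+1}) - F(e_k), where e_k are the bin edges. The sketch below uses this generic multinomial construction with a hypothetical exponential underlying distribution and hypothetical bin edges (EDGES); it is an illustrative assumption, not necessarily the exact histogram likelihood proposed in the thesis or used in its aerosol application.

```python
import bisect
import math
import random

# Hypothetical fixed bin edges shared by every class; the last bin is open-ended.
EDGES = [0.0, 0.5, 1.0, 2.0, math.inf]


def to_histogram(sample, edges=EDGES):
    """Summarise one class of raw observations (all >= edges[0], finite) into bin counts."""
    counts = [0] * (len(edges) - 1)
    for x in sample:
        k = bisect.bisect_right(edges, x) - 1  # bins are [edges[k], edges[k + 1])
        counts[k] += 1
    return counts


def exp_cdf(x, rate):
    """CDF of an Exponential(rate) distribution; exp_cdf(math.inf, rate) == 1.0."""
    return -math.expm1(-rate * x)


def histogram_loglik(histograms, rate, edges=EDGES):
    """Multinomial log-likelihood of bin counts under Exponential(rate) underlying data.

    Bin probabilities p_k = F(edges[k+1]) - F(edges[k]) come from the assumed
    underlying distribution, so no uniformity within a bin is assumed.
    """
    probs = [exp_cdf(hi, rate) - exp_cdf(lo, rate) for lo, hi in zip(edges, edges[1:])]
    total = 0.0
    for counts in histograms:
        n = sum(counts)
        # Multinomial coefficient n! / prod(c_k!) on the log scale.
        total += math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
        total += sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)
    return total


# Simulate 40 classes of 50 raw observations from Exponential(1.5), then keep only the histograms.
random.seed(2)
histograms = [to_histogram([random.expovariate(1.5) for _ in range(50)]) for _ in range(40)]

# Grid search for the maximum-likelihood estimate of the rate over [0.1, 5.0].
rates = [0.1 + 0.01 * i for i in range(491)]
rate_hat = max(rates, key=lambda r: histogram_loglik(histograms, r))
print(f"rate_hat = {rate_hat:.2f}")
```

The shape of the data inside each bin is dictated by the assumed F rather than by a flat density, which is the practical difference from a uniform-within-bin treatment of the same counts.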
Lin, Huan
Degree Type: PhD Doctorate
UNSW Faculty