A Probabilistic Graphical Model for Structured Prediction over Heterogeneous Data

Ye, Pengjie

doi:10.26190/unsworks/19920

dc.contributor.advisor	Bain, Michael	en_US
dc.contributor.author	Ye, Pengjie	en_US
dc.date.accessioned	2022-03-22T15:48:13Z
dc.date.available	2022-03-22T15:48:13Z
dc.date.issued	2017	en_US
dc.description.abstract	Advances in sensor and instrumentation technology, together with cost reductions and capacity increases in computing and communication technologies, have led to the rapid accumulation of large amounts of data, in addition to that collected by traditional methods. Data from these sources is called heterogeneous because it does not conform to a single type of data structure. A notable example is Electronic Health Record (EHR) data. Given the size and complexity of heterogeneous data, there is a growing need to apply machine learning to predict, for example, patient outcomes from EHR data. Such data is inherently uncertain, so learning algorithms based on the framework of probabilistic graphical models for classification are appropriate. Despite the popularity of structured prediction, its ability to utilise domain knowledge and to model the source of structure is limited. This thesis identifies the connection between the mechanism of abstract domain knowledge and the structural setting of a graphical model. A clique-based mapping method is proposed to develop a structural-binding and knowledge-embedding set of feature functions. A general discriminatively-trained probabilistic graphical model, the transitional random field (TRF), is proposed for modelling heterogeneous input data without the locality-preserving property, which is widely seen in conditional random field (CRF) problem settings. We also introduce a novel ontology-based probabilistic similarity measurement for heterogeneous data which simplifies probabilistic computation in TRFs and enables efficient inference. The TRF framework identifies and maps information from the input structure to the non-isomorphic format determined by the output structure, while at the same time utilising structurally embedded existing knowledge implicit in the structure of the input and output.
This ability to represent dependencies as features denoting transitional relations between input and output gives TRF the potential to learn models from a wide range of heterogeneous data and make predictions about structured domain knowledge. Our experiments on a large real-world data set demonstrate that TRF can be successfully applied to a demanding structured prediction problem over heterogeneous EHR data, with the proposed TRF training and inference algorithms obtaining good accuracy and efficiency.	en_US
dc.identifier.uri	http://hdl.handle.net/1959.4/58645
dc.language	English
dc.language.iso	EN	en_US
dc.publisher	UNSW, Sydney	en_US
dc.rights	CC BY-NC-ND 3.0	en_US
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/3.0/au/	en_US
dc.subject.other	heterogeneous data	en_US
dc.subject.other	structured prediction	en_US
dc.subject.other	TRF	en_US
dc.subject.other	ontology	en_US
dc.title	A Probabilistic Graphical Model for Structured Prediction over Heterogeneous Data	en_US
dc.type	Thesis	en_US
dcterms.accessRights	open access
dcterms.rightsHolder	Ye, Pengjie
dspace.entity.type	Publication	en_US
unsw.accessRights.uri	https://purl.org/coar/access_right/c_abf2
unsw.identifier.doi	https://doi.org/10.26190/unsworks/19920
unsw.relation.faculty	Engineering
unsw.relation.originalPublicationAffiliation	Ye, Pengjie, Computer Science & Engineering, Faculty of Engineering, UNSW	en_US
unsw.relation.originalPublicationAffiliation	Bain, Michael, Computer Science & Engineering, Faculty of Engineering, UNSW	en_US
unsw.relation.school	School of Computer Science and Engineering	*
unsw.thesis.degreetype	PhD Doctorate	en_US

Files

Original bundle

Name: public version.pdf
Size: 8.87 MB
Format: application/pdf
