Publication:
A Probabilistic Graphical Model for Structured Prediction over Heterogeneous Data

dc.contributor.advisor Bain, Michael en_US
dc.contributor.author Ye, Pengjie en_US
dc.date.accessioned 2022-03-22T15:48:13Z
dc.date.available 2022-03-22T15:48:13Z
dc.date.issued 2017 en_US
dc.description.abstract Advances in sensor and instrumentation technology, together with cost reductions and capacity increases in computing and communication technologies, have led to the rapid accumulation of large amounts of data, additional to that collected by traditional methods. Data from these sources is called heterogeneous because it does not conform to a single type of data structure. A notable example is Electronic Health Record (EHR) data. Given the size and complexity of heterogeneous data, there is a growing need to apply machine learning to predict, for example, patient outcomes from EHR data. Such data is inherently uncertain, so learning algorithms based on the framework of probabilistic graphical models for classification are appropriate. Despite the popularity of structured prediction, its capability for utilising domain knowledge and modelling the source of structure is limited. This thesis identifies the connection between the mechanism of abstract domain knowledge and the structural setting of a graphical model. A clique-based mapping method is proposed to develop a structural-binding and knowledge-embedding set of feature functions. A general discriminatively trained probabilistic graphical model, the transitional random field (TRF), is proposed for modelling heterogeneous input data without the locality-preserving property, which is widely seen in conditional random field (CRF) problem settings. We also introduce a novel ontology-based probabilistic similarity measurement for heterogeneous data which simplifies probabilistic computation in TRFs and enables efficient inference. The TRF framework identifies and maps information from the input structure to the non-isomorphic format determined by the output structure, while at the same time utilising existing knowledge implicit in the structure of the input and output.
This ability to represent dependencies as features denoting transitional relations between input and output gives TRF the potential to learn models from a wide range of heterogeneous data and make predictions about structured domain knowledge. Our experiments on a large real-world data set demonstrate that TRF can be successfully applied to a demanding structured prediction problem over heterogeneous EHR data, with the proposed TRF training and inference algorithms obtaining good accuracy and efficiency. en_US
dc.identifier.uri http://hdl.handle.net/1959.4/58645
dc.language English
dc.language.iso EN en_US
dc.publisher UNSW, Sydney en_US
dc.rights CC BY-NC-ND 3.0 en_US
dc.rights.uri https://creativecommons.org/licenses/by-nc-nd/3.0/au/ en_US
dc.subject.other heterogeneous data en_US
dc.subject.other structured prediction en_US
dc.subject.other TRF en_US
dc.subject.other ontology en_US
dc.title A Probabilistic Graphical Model for Structured Prediction over Heterogeneous Data en_US
dc.type Thesis en_US
dcterms.accessRights open access
dcterms.rightsHolder Ye, Pengjie
dspace.entity.type Publication en_US
unsw.accessRights.uri https://purl.org/coar/access_right/c_abf2
unsw.identifier.doi https://doi.org/10.26190/unsworks/19920
unsw.relation.faculty Engineering
unsw.relation.originalPublicationAffiliation Ye, Pengjie, Computer Science & Engineering, Faculty of Engineering, UNSW en_US
unsw.relation.originalPublicationAffiliation Bain, Michael, Computer Science & Engineering, Faculty of Engineering, UNSW en_US
unsw.relation.school School of Computer Science and Engineering
unsw.thesis.degreetype PhD Doctorate en_US
Files
Original bundle
Name: public version.pdf
Size: 8.87 MB
Format: application/pdf