Sentence level relation extraction via relation embedding

Huang, Haojie

doi:10.26190/unsworks/22705

Access & Terms of Use

open access
Copyright: Huang, Haojie

CC BY-NC-ND 3.0

Abstract

Relation extraction is an information extraction task that extracts semantic relations from text, usually between two named entities. It is a crucial step in converting unstructured text into structured data that forms a knowledge base, which can then be used to build special-purpose systems such as those for business decision making and legal case-based reasoning. Sentence-level relation extraction is the most common type, because relationships can usually be discovered within single sentences. One obvious example is the relationship between the subject and the object. Relation extraction has been studied for years, and various methods exist, such as feature-based methods, distant supervision and recurrent neural networks. However, these approaches have the following problems. (i) They require large amounts of human-labelled data to train a model with high accuracy. (ii) They are hard to apply in real applications, especially in specialised domains where experts are required for both labelling and validating the data. In this thesis, we address these problems from two aspects: academic research and application development. In terms of academic research, we propose models that can be trained with less labelled training data. The first approach trains relation feature embeddings and then uses them to obtain relation embeddings. To minimise the effect of handcrafted feature design, the second approach adopts recurrent neural networks (RNNs) to automatically learn features from the text. In these methods, relation embeddings are reduced to a smaller vector space, and relations with similar meanings form clusters. Therefore, the model can be trained with a smaller amount of labelled data. The last approach adopts seq2seq regularisation, which can improve the accuracy of the relation extraction models.
In terms of application development, we construct a prototype web service for searching semantic triples using relations extracted by third-party extraction tools. In the last chapter, we run all our proposed models on real-world legal documents. We also build a web application for extracting relations from legal text based on the trained models, which can help lawyers find the key information in legal cases more quickly. We believe that the idea of relation embeddings can be applied in domains that require relation extraction but have limited labelled data.

Persistent link to this record

http://hdl.handle.net/1959.4/71079

DOI

https://doi.org/10.26190/unsworks/22705

Author(s)

Huang, Haojie

Publication Year

2021

Resource Type

Thesis

Degree Type

PhD Doctorate

UNSW Faculty

Files

public version.pdf

5.81 MB

Adobe Portable Document Format
