Knowledge discovery in text-rich information networks

Xu, Han

doi:10.26190/unsworks/19072


Access & Terms of Use

open access
Copyright: Xu, Han

CC BY-NC-ND 3.0

Abstract

The interconnected nature of knowledge, together with complex interactions among information agents, has produced a massive, complex networked information ontology in the form of interconnected electronic texts [SH12], fueling the creation and development of text-rich information networks. Text-rich information networks are a special type of information network that integrates rich text and unstructured data. Their ubiquity has fundamentally changed the way people acquire knowledge: online digital libraries, crowd-sourcing websites and professional forums are becoming common sources for information foraging. Text-rich information networks feature complex heterogeneous structures in which rich network-based and textual data are commingled. This characteristic brings together social and textual traits, making information seeking extremely challenging with traditional Network Analysis and Text Mining methods. Unlike earlier studies, which process network-based and textual data separately and draw a clear distinction between them, this thesis presents a synergetic approach that treats network-based data, textual data, and the insights obtained from them as information structures. We chose a scientific citation network, an especially complex type of text-rich information network, as our object of study. Our experimental results confirmed that our methodology facilitates the fruitful exploitation of the idiosyncratic structure of text-rich information networks, leading to more effective foraging of insights at various levels of cognitive complexity. This thesis advances the state of the art in information seeking and knowledge discovery from scientific citation networks on multiple fronts.
Our practical contributions include (1) a citation classifier that categorises citations as either functional or perfunctory as they occur in publications, (2) a scientific document ranker that ranks papers according to their potential to facilitate later research, (3) a framework that gives a literature surveyor a fine-grained lens through which they can identify and characterise the latent knowledge structure of their domain of interest, (4) a utility that reveals where subfields in a scientific domain are heading by categorising their evolutionary momentum as persistent, booming or withering, and (5) a framework that generates contribution-based summaries of the most fruitful parts of scientific papers and research areas, reducing the reading effort required to understand scientific documents.

Persistent link to this record

http://hdl.handle.net/1959.4/56399

DOI

https://doi.org/10.26190/unsworks/19072

Author(s)

Xu, Han

Supervisor(s)

Martin, Eric

Mahidadia, Ashesh

Publication Year

2016

Resource Type

Thesis

Degree Type

PhD Doctorate

UNSW Faculty

Files

public version.pdf

2.57 MB

Adobe Portable Document Format
