Knowledge discovery in text-rich information networks

Access & Terms of Use
open access
Copyright: Xu, Han
Abstract
The interconnected nature of knowledge, together with complex interactions among information agents, has produced a massive, complex networked information ontology in the form of interconnected electronic texts [SH12], fueling the creation and development of text-rich information networks. Text-rich information networks are a special type of information network with integrated rich text and unstructured data. The ubiquity of text-rich information networks has fundamentally changed the way people acquire knowledge. Online digital libraries, crowd-sourcing websites and professional forums are becoming common sources for information foraging. Text-rich information networks feature complex heterogeneous structures in which rich network-based and textual data are commingled. This unique characteristic brings together social and textual traits, making information seeking extremely challenging with traditional Network Analysis and Text Mining methods. Unlike former studies that process network-based and textual data separately with a clear distinction between them, this thesis presents a synergetic approach that treats network-based data, textual data and the insights obtained from them alike as information structures. We chose a scientific citation network, an especially complex type of text-rich information network, as our object of study. Our experimental results confirmed that our methodology facilitates the fruitful exploitation of the idiosyncratic structure of text-rich information networks, leading to more effective foraging of insights at various levels of cognitive complexity. This thesis advances the state of the art in information seeking and knowledge discovery from scientific citation networks on multiple fronts.
Our practical contributions include (1) a citation classifier that categorises each citation as either functional or perfunctory as it occurs in a publication, (2) a scientific document ranker that ranks papers according to their potential to facilitate later research, (3) a framework that gives a literature surveyor a fine lens through which to identify and characterise the latent knowledge structure of their domain of interest, (4) a utility that reveals where subfields in a scientific domain are heading by categorising their evolutionary momentum as persistent, booming or withering, and (5) a framework that generates contribution-based summaries of scientific papers and research areas, focusing on their most fruitful parts, to effectively reduce the reading effort required to understand scientific documents.
Author(s)
Xu, Han
Supervisor(s)
Martin, Eric
Mahidadia, Ashesh
Publication Year
2016
Resource Type
Thesis
Degree Type
PhD Doctorate
Files
download public version.pdf 2.57 MB Adobe Portable Document Format