Abstract
The interconnected nature of knowledge, together with complex interactions among information agents, has produced a massive, complex networked information ontology in the form of interconnected electronic texts [SH12], fueling the creation and development of text-rich information networks. Text-rich information networks are a special type of information network that integrates rich text and unstructured data. The ubiquity of text-rich information networks has fundamentally changed the way people acquire knowledge. Online digital libraries, crowd-sourcing websites and professional forums are becoming common sources for information foraging.
Text-rich information networks feature complex heterogeneous structures in which rich network-based and textual data are commingled. This characteristic brings together social and textual traits, making information seeking extremely challenging for traditional Network Analysis and Text Mining methods. Unlike previous studies, which process network-based and textual data separately and draw a clear distinction between them, this thesis presents a synergetic approach that treats network-based data, textual data and the insights obtained from them alike as information structures. We chose a scientific citation network, an especially complex type of text-rich information network, as our object of study. Our experimental results confirm that our methodology facilitates the fruitful exploitation of the idiosyncratic structure of text-rich information networks, leading to more effective foraging of insights at various levels of cognitive complexity.
This thesis advances the state of the art in information seeking and knowledge discovery from scientific citation networks on multiple fronts. Our practical contributions include (1) a citation classifier that categorises citations as either functional or perfunctory as they occur in publications, (2) a scientific document ranker that ranks papers according to their potential to facilitate later research, (3) a framework that gives a literature surveyor a fine lens through which to identify and characterise the latent knowledge structure of their domain of interest, (4) a utility that reveals where subfields in a scientific domain are heading by categorising their evolutionary momentum as persistent, booming or withering, and (5) a framework that generates contribution-based summaries of the most fruitful parts of scientific papers and research areas, reducing the reading effort required to understand scientific documents.