Effective information seeking from multi-source data

Ning, Xiaodong

doi:10.26190/unsworks/21297

Access & Terms of Use

open access
Copyright: Ning, Xiaodong

CC BY-NC-ND 3.0

Abstract

This thesis describes novel approaches to effective information seeking from multi-source data. Information seeking is one of the most important problems in data mining and machine learning and arises in a wide range of application scenarios. It is a task-dependent problem, and its techniques are usually designed around the specific characteristics of the task. Information seeking nevertheless faces two main challenges: information overload and limited available data. To address them, this work proposes corresponding methodologies in three real-world cases. The first work considers overloaded data that contains a large amount of noise and designs a supervised classifier based on content features to automatically detect informative tweets during a crisis. To further enhance this model, a correlative deep-learning-based model is proposed to differentiate source types simultaneously, and key information is summarized using two extraction rules. To find positive information efficiently, the second work proposes a reinforcement-learning-based methodology to discover multi-contextual information cues that cannot be recognized manually. Finally, targeting the insufficient-data problem in information seeking, a data-augmented framework is designed to artificially enlarge the sample space. The developed methods have been evaluated on benchmark datasets and perform better than other state-of-the-art methods. For example, in the informative tweet classification task the proposed model improves four evaluation metrics by more than 9%, and the information extraction rules outperform traditional methods, extracting information that is extremely infrequent but highly important. For passenger demand prediction, another task with an information-overload problem, the proposed methodology reduces RMSE and MAE by 9.2% and 10.8%, respectively. For the movie rating task, which has an information-poor problem, the proposed model outperforms state-of-the-art methods with an improvement of 9.3% in MSE and 7.6% in hit ratio. Overall, the experimental analysis presented in this thesis shows that the developed methods significantly improve performance in the information seeking process under either the information-overload problem or the insufficient-data problem.

Persistent link to this record

http://hdl.handle.net/1959.4/62927

DOI

https://doi.org/10.26190/unsworks/21297

Author(s)

Ning, Xiaodong

Supervisor(s)

Yao, Lina

Benatallah, Boualem

Publication Year

2019

Resource Type

Thesis

Degree Type

PhD Doctorate

Files

public version.pdf

3.19 MB

Adobe Portable Document Format
