Effective information seeking from multi-source data

Access & Terms of Use
open access
Copyright: Ning, Xiaodong
Abstract
This thesis describes novel approaches to the problem of effective information seeking from multi-source data. Information seeking is one of the most important problems in data mining and machine learning, and it arises in a wide range of application scenarios. It is a task-dependent problem, so its techniques are usually designed around the specific characteristics of the task. Information seeking nevertheless faces two major challenges: information overload and limited available data. To address these challenges, this work proposes corresponding methodologies in three real-world cases. The first work considers overloaded data that contains a large amount of noise and designs a supervised classifier based on content features to automatically detect informative tweets during crises. To further enhance this model, a correlative deep learning based model is proposed to differentiate source types simultaneously, and key information is summarized using two extraction rules. To find the positive information efficiently, the second work proposes a reinforcement learning based methodology to discover multi-contextual information cues that cannot be recognized manually. Finally, targeting the insufficient data problem in information seeking, a data augmentation framework is designed to artificially enlarge the sample space. The developed methods have been evaluated on benchmark datasets and show better performance than other state-of-the-art methods. For example, in the informative tweet classification task the proposed model improves four evaluation metrics by more than 9%, and the information extraction rules outperform traditional methods, since they can extract information that occurs with extremely low frequency but is highly important. For passenger demand prediction, a task with the information-overload problem, the proposed methodology reduces RMSE and MAE by 9.2% and 10.8%, respectively. For the movie rating task, which has the information-poor problem, the proposed model outperforms state-of-the-art methods with an improvement of 9.3% in MSE and 7.6% in hit ratio. Overall, the experimental analysis presented in this thesis shows that the developed methods significantly improve performance in the information seeking process under either the information-overload problem or the insufficient data problem.
Author(s)
Ning, Xiaodong
Supervisor(s)
Yao, Lina
Benatallah, Boualem
Publication Year
2019
Resource Type
Thesis
Degree Type
PhD Doctorate