Integrating community knowledge acquisition and data features analysis for recommending entity similarity functions

Access & Terms of Use
open access
Copyright: Ryu, Seung Hwan
Abstract
Similar entity search is the task of identifying the entities that most closely resemble a given entity (e.g., a person, a document, or an image). It plays a vital role in many application domains, such as product search, people search, document search, data integration in business intelligence, and medical and biological research. Although many techniques for similarity analysis have been proposed, little work has addressed the question of which of these techniques is most suitable for a given similarity search task. Choosing the right similarity function matters because the task is highly domain- and data-dependent. In this thesis, we provide an approach for recommending which similarity functions (e.g., edit distance or Jaccard similarity) should be used to measure the similarity between two entities. The approach employs an incremental knowledge acquisition technique to capture domain experts' knowledge about similarity functions and their usage contexts (e.g., entity class, attribute name, and keywords). In addition, for situations where domain experts have little or no knowledge about the datasets (for example, when they face new or different ones), we analyze data features of entity attribute values (e.g., misspellings or word order), which are important considerations when selecting similarity functions, and then recommend similarity functions according to the identified features. We provide tools for capturing domain experts' knowledge, for analyzing features of attribute data, and for assisting domain experts in finding entities similar to a given query entity using the recommended similarity functions. We also demonstrate the feasibility and effectiveness of the proposed approach on several real-world datasets from different domains.
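As a rough illustration of the two similarity functions named in the abstract, and of how a data feature such as word order might steer the choice between them, the following Python sketch may help. It is not the thesis's method: the function names, the normalization of edit distance into a similarity score, and the single rule in `recommend_function` are all assumptions made here for illustration only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1] (assumed normalization: longer length)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity over lowercase, whitespace-separated word sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def recommend_function(a: str, b: str) -> str:
    """Hypothetical one-rule recommender based on a word-order feature.

    If both values contain the same words in a different order, a
    token-based function is suggested; otherwise a character-based one,
    which tolerates misspellings better.
    """
    wa, wb = a.lower().split(), b.lower().split()
    if wa != wb and sorted(wa) == sorted(wb):
        return "jaccard"
    return "edit_distance"


if __name__ == "__main__":
    for left, right in [("John Smith", "Smith John"),
                        ("similarity", "similarty")]:
        print(left, "|", right, "->", recommend_function(left, right))
```

In this sketch, reordered words favor the token-based function because Jaccard similarity ignores order, while a single-character misspelling favors edit distance because the misspelled token no longer matches as a whole word.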
Author(s)
Ryu, Seung Hwan
Supervisor(s)
Benatallah, Boualem
Publication Year
2012
Resource Type
Thesis
Degree Type
PhD Doctorate
Files
whole.pdf (1.98 MB, Adobe Portable Document Format)