Detection of Translator Stylometry using Pair-wise Comparative Classification and Network Motif Mining

Access & Terms of Use
open access
Copyright: El-Fiqi, Heba
Abstract
Stylometry is the study of the unique linguistic styles and writing behaviours of individuals. Identifying translator stylometry has many applications in fields such as intellectual property, education, and forensic linguistics. Although research on the wider field of authorship attribution using computational linguistics techniques has proliferated, the translator stylometry problem is more challenging, and the machine learning literature on the topic remains insufficient. Some authors have even claimed that detecting who translated a piece of text is a problem with no solution, a claim we challenge in this thesis.
We first evaluated the use of existing lexical measures for the translator stylometry problem and found that vocabulary richness could not identify translator stylometry. This encouraged us to look for non-traditional representations in which new features that reveal translator stylometry could be discovered. Network motifs are small sub-graphs that capture the local structure of a real network. We designed an approach that transforms the text into a network and then identifies the distinctive patterns of a translator by employing network motif mining.
During our investigations, we redefined the problem of translator stylometry identification as a new type of classification problem that we call the Comparative Classification Problem (CCP). In the pair-wise CCP (PWCCP), data are collected on two subjects, and the classification problem is to decide, given a piece of evidence, which of the two subjects is responsible for it. The key difference between PWCCP and traditional binary problems is that hidden patterns can only be unmasked by comparing the instances as pairs. A modified C4.5 decision tree classifier, which we call PWC4.5, is then proposed for PWCCP. A comparison between detecting the translator using traditional classification and using PWCCP demonstrated a remarkable ability for PWCCP to discriminate between translators.
The contributions of the thesis are: (1) providing an empirical study that evaluates the use of stylistic-based features for the problem of translator stylometry identification; (2) introducing network motif mining as an effective approach to detect translator stylometry; and (3) proposing a modified C4.5 methodology for pair-wise comparative classification.
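To make the two central ideas of the abstract concrete, the following is a minimal Python sketch, not the thesis implementation. It assumes a simple word-adjacency scheme for turning text into a directed network, counts only two 3-node patterns (feed-forward loops and directed cycles), and describes a pair of translations by the difference in their motif counts. The function names, the adjacency scheme, the choice of motifs, and the difference-based pair features are all illustrative assumptions; the abstract does not specify how the network is built, which motifs are mined, or how PWC4.5 compares instances.

```python
import re


def text_to_network(text):
    """Build a directed word-adjacency network (an assumed scheme):
    an edge (u, v) means word v directly follows word u in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    # Skip immediate repetitions so the network has no self-loops.
    return {(u, v) for u, v in zip(words, words[1:]) if u != v}


def count_motifs(edges):
    """Count two 3-node patterns in a directed edge set:
    feed-forward loops (a->b, b->c, a->c) and cycles (a->b, b->c, c->a).
    Counts are non-induced: extra edges among the three nodes are ignored."""
    successors = {}
    for u, v in edges:
        successors.setdefault(u, set()).add(v)

    feed_forward = 0
    cycle_rotations = 0
    for a, b in edges:
        for c in successors.get(b, ()):
            if c == a:
                continue  # a->b->a is a 2-node loop, not a 3-node motif
            if c in successors.get(a, ()):
                feed_forward += 1
            if a in successors.get(c, ()):
                cycle_rotations += 1

    # Each directed cycle is found once per starting edge, i.e. three times.
    return {"feed_forward": feed_forward, "cycle": cycle_rotations // 3}


def pairwise_features(text_a, text_b):
    """Describe a PAIR of translations of the same source text by the
    difference in motif counts (translator A minus translator B).
    This is a hypothetical pair representation, not PWC4.5 itself."""
    motifs_a = count_motifs(text_to_network(text_a))
    motifs_b = count_motifs(text_to_network(text_b))
    return {name: motifs_a[name] - motifs_b[name] for name in motifs_a}
```

In this sketch the classifier would see one feature vector per pair of translations rather than one per text, which mirrors the abstract's point that the distinguishing patterns appear only when instances are compared as pairs.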
Author(s)
El-Fiqi, Heba
Supervisor(s)
Abbass, Hussein
Petraki, Eleni
Publication Year
2013
Resource Type
Thesis
Degree Type
PhD Doctorate