Detecting Rare and Collective Anomalies in Network Traffic Data using Summarization

Access & Terms of Use
open access
Embargoed until 2018-11-30
Copyright: Ahmed, Mohiuddin
Abstract
Network anomaly detection is becoming increasingly challenging because global Internet traffic is being produced at an unprecedented rate. This thesis identifies summarization as a key component for improving the scalability and accuracy of anomaly detection techniques. Instead of analysing a large amount of data to find anomalies, detection can operate on a summary of the data. The goal of this thesis is to investigate three key research issues related to summarization-based anomaly detection.
The first research problem is identifying anomalies in large amounts of data. As data size increases, anomaly detection techniques perform poorly because false alarms and computational cost both increase. Detecting anomalies from a summary could address these issues, but existing summarization techniques cannot accurately represent the rare anomalies present in the dataset. This thesis proposes several summarization techniques based on sampling and partitional clustering that achieve significant improvement in anomaly detection accuracy and execution time over a wide range of benchmark datasets.
In certain cyber-attack scenarios, such as flooding Denial of Service attacks, the data distribution changes significantly. This forms a collective anomaly, where similar kinds of normal data instances appear in abnormally large numbers. Since they are not rare anomalies, existing anomaly detection techniques cannot properly identify them. The second research problem of the thesis investigates detecting this behaviour using a number of clustering and co-clustering based techniques. Experimental evaluation demonstrates that a Hurst parameter-based technique outperforms existing collective and rare anomaly detection techniques in terms of detection accuracy and false positive rate. Solutions to the two research problems are integrated into a general summarization-based anomaly detection framework.
Many online applications need to process and analyse continuously arriving streaming data, where data cannot be stored indefinitely or revisited. To address this problem, this thesis investigates new sampling-based summarization techniques, which demonstrate that summaries produced from a stream using pairwise distance and template matching techniques can retain more anomalies than existing stream summarization techniques.
Author(s)
Ahmed, Mohiuddin
Supervisor(s)
Mahmood, Abdun
Maher, Michael
Publication Year
2016
Resource Type
Thesis
Degree Type
PhD Doctorate
UNSW Faculty
Files
public version.pdf (5.32 MB, Adobe Portable Document Format)