Access & Terms of Use
open access
Embargoed until 2018-11-30
Copyright: Ahmed, Mohiuddin
Abstract
Network anomaly detection is becoming increasingly challenging because global Internet traffic is being produced at an unprecedented rate. This thesis identifies summarization as a key component for improving the scalability and accuracy of anomaly detection techniques. Instead of analysing a large amount of data to find anomalies, a summary of the data can be used for detection. The goal of this thesis is to investigate three key research issues related to summarization-based anomaly detection.
The first research problem is identifying anomalies in large amounts of data. As data size increases, anomaly detection techniques perform poorly because of an increase in false alarms and computational cost. Detecting anomalies from a summary could address these issues, but existing summarization techniques cannot accurately represent the rare anomalies present in the dataset. This thesis proposes several summarization techniques based on sampling and partitional clustering that achieve significant improvement in anomaly detection accuracy and execution time over a wide range of benchmark datasets.
In certain cyber-attack scenarios, such as flooding Denial of Service attacks, the data distribution changes significantly. This forms a collective anomaly, where similar normal data instances appear in abnormally large numbers. Since they are not rare anomalies, existing anomaly detection techniques cannot properly identify them. The second research problem of the thesis investigates detecting this behaviour using a number of clustering and co-clustering based techniques. Experimental evaluation demonstrates that a Hurst parameter-based technique outperforms existing collective and rare anomaly detection techniques in terms of detection accuracy and false positive rate. Solutions to the two research problems are integrated into a general summarization-based anomaly detection framework.
Many online applications need to process and analyse continuously arriving streaming data, which cannot be stored indefinitely or revisited. To address this problem, this thesis investigates new sampling-based summarization techniques and demonstrates that summaries produced from a stream using pair-wise distance and template matching techniques can retain more anomalies than existing stream summarization techniques.