Abstract
Malware is software with malicious intent. The threat landscape has changed
substantially in recent years. As our dependence on the Internet for social information
sharing and for work grows, the number of possible threats is large, and we are genuinely
susceptible to them. Attacks range from the individual or organisational level to
nation-states resorting to cyber warfare to infiltrate and sabotage their enemies'
operations. Hence, a secure and dependable cyber defence is needed at all levels.
Malware can only do harm if it is allowed to propagate and execute without being
detected. Signature-based detection alone is insufficient, because new malware whose
signatures are not yet known cannot be detected. Behaviour-based detection is therefore
needed to detect novel malware attacks. Detection is made harder still because most
recent malware employs protection and evasion techniques. In this study, we present a
malware detection system that addresses both propagation and execution. Propagation is
detected by monitoring session traffic, and execution by monitoring API call sequences.
Our approach is inspired by two theories of the human immune system: the Self/Non-self
Theory and the Danger Theory.
For malware detection during propagation, we investigate the effectiveness of
signature-based detection, anomaly-based detection and the combination of both. The
detection decisions rely on a collection of recent signatures of session-based traffic
data collected at the endpoint (single computer) level. We observe patterns in the port
distributions and in the frequency, or session rate, of these signatures. An abnormality
often signifies worm behaviour. A knowledge base of recent traffic data, used to predict
future traffic patterns, helps to reverse the incorrect flagging of suspected worms. It
retains only recent data, because the usage pattern of a computer changes over time.
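The anomaly-based part of this idea can be illustrated with a minimal sketch. It assumes hypothetical session records (start time, destination IP, destination port), measures the session rate as the number of distinct destinations contacted per observation window, and treats a window as worm-like when that rate exceeds the mean plus `k` standard deviations of recent windows and most sessions target a single port. The class and parameter names (`RecentTrafficDetector`, `history_windows`, `k`, `min_rate`, `port_share`) and their defaults are illustrative assumptions, not the detectors or settings used in this thesis.

```python
from collections import Counter, deque
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Session:
    """Hypothetical session signature: start time (s), destination IP and port."""
    start: float
    dst_ip: str
    dst_port: int


class RecentTrafficDetector:
    """Flags worm-like bursts of sessions against a sliding knowledge base.

    All parameter names and default values are illustrative assumptions.
    """

    def __init__(self, history_windows=20, k=3.0, min_rate=10, port_share=0.8):
        # Knowledge base: session rates of the most recent windows only,
        # so the baseline follows the host's changing usage pattern.
        self.history = deque(maxlen=history_windows)
        self.k = k                    # tolerance in standard deviations
        self.min_rate = min_rate      # windows below this rate are never flagged
        self.port_share = port_share  # dominant-port fraction deemed suspicious

    def _expected_limit(self):
        hist = list(self.history)
        if len(hist) < 2:
            return float("inf")  # not enough history to judge yet
        return mean(hist) + self.k * pstdev(hist)

    def observe_window(self, sessions):
        """Return True if this window's sessions look abnormal."""
        # Session rate: number of distinct (IP, port) destinations contacted.
        rate = len({(s.dst_ip, s.dst_port) for s in sessions})
        ports = Counter(s.dst_port for s in sessions)
        top_share = ports.most_common(1)[0][1] / len(sessions) if sessions else 0.0

        abnormal = (
            rate >= self.min_rate
            and rate > self._expected_limit()
            and top_share >= self.port_share
        )
        # Only unflagged windows update the knowledge base, so a worm burst
        # does not teach the detector that bursts are normal.
        if not abnormal:
            self.history.append(rate)
        return abnormal
```

Keeping only a bounded `deque` of recent windows mirrors the idea that the knowledge base retains only recent data. Excluding flagged windows from it is a further assumption, chosen so that a burst does not raise the baseline used for the next window.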
Our proposed system includes several detectors whose operation is governed by several
parameters. We study how these parameters affect the results, and how performance
changes when individual detectors are included or excluded. We find that the detectors
produce inconsistent results when used independently, but achieve promising detection
rates when used together. In addition, we identify which worms the system detects
consistently, and characterise those it detects poorly.
For detection based on execution, we analyse sequences of API calls grouped into
n-grams, which are compared with benign and malware profiles. A decision is made based on
a statistical measure that indicates how close the behaviour represented by the n-grams
is to each profile. Experiments show that the system can correctly detect malware early
in its execution.
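A minimal sketch of the n-gram comparison follows. The statistical measure is not specified here, so cosine similarity between n-gram frequency vectors is used as a stand-in. The function names, the default `n=3`, the `prefix` argument for early detection and the example API call names are all hypothetical.

```python
from collections import Counter
from math import sqrt


def ngrams(calls, n=3):
    """Consecutive API-call n-grams from one trace."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def build_profile(traces, n=3):
    """Aggregate n-gram frequencies over a set of training traces."""
    profile = Counter()
    for calls in traces:
        profile.update(ngrams(calls, n))
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse frequency vectors (Counters)."""
    dot = sum(count * b[g] for g, count in a.items())  # missing keys count as 0
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify_prefix(calls, benign, malware, n=3, prefix=None):
    """Label a trace, or only its first `prefix` calls, by the closer profile.

    Stand-in measure: cosine similarity; ties are labelled benign.
    """
    window = calls[:prefix] if prefix is not None else calls
    observed = Counter(ngrams(window, n))
    s_benign = cosine(observed, benign)
    s_malware = cosine(observed, malware)
    label = "malware" if s_malware > s_benign else "benign"
    return label, s_benign, s_malware
```

Scoring only the first `prefix` calls mirrors the goal of recognising malware early in its execution; in practice the profiles would be built from many labelled traces rather than from a single example each.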
The main contributions of this thesis are: the proposal and evaluation of a framework
for detecting malware that systematically considers both propagation and execution;
detection methods based on information that is simpler to process than that used in
other proposals in the literature, yet which still achieve very high detection accuracy;
and the correct recognition of malware early in its execution.
The experimental results show that our framework is a promising basis for effective
behaviour-based detection that can detect malware and help protect computer networks
from future zero-day attacks.