Enterprise Network Security via Data-driven Methods and Programmable Network Telemetry

Access & Terms of Use
embargoed access
Embargoed until 2023-01-01
Copyright: Lyu, Minzhao
Enterprise networks are both complex and dynamic, with various kinds of servers (web, email, VPN, storage), clients (fixed, wireless), and Internet-of-Things devices (cameras, printers, sensors) being deployed, moved, and removed continuously. Furthermore, these assets are spread across various network segments (e.g., VLANs), often managed by different departments, with complex interconnection rules between segments, to public/private cloud services, and to the general Internet. It is therefore not surprising that organizational IT departments struggle to track their connected assets, monitor their operational health, understand the attack surface they expose, and protect them from external as well as internal threats. Current enterprise security systems such as Next-Generation Firewalls (NGFWs) and intrusion detection systems (IDSs) are unable to cope with the growing volume and diversity of emerging cyber-threats. Hardware appliance-based solutions are not just expensive, but also inflexible, as their high-speed performance is optimized for relatively static rulesets. Software solutions, on the other hand, have great flexibility but struggle to cope with high data rates, which limits the granularity at which they can analyze traffic for embedded threats. To advance the state of the art in enterprise asset monitoring and distributed network attack detection, my thesis proposes a new approach that combines hardware performance with software flexibility by leveraging the concepts of Programmable Networks (PN) and Machine Learning (ML). Telemetry from Terabit-speed programmable switches is used to extract key attributes of traffic streams, and this is combined with ML models of enterprise asset behavior to monitor asset health and to detect attacks. I make four key contributions. My first contribution focuses on the Domain Name System (DNS). I analyze DNS traffic from two large organizations to identify the behavioral aspects of various DNS assets.
Using the behavioral attributes, I develop a clustering method to classify assets (e.g., recursive resolvers and authoritative name servers) and track their health through a set of well-articulated monitoring metrics. I demonstrate that my method successfully identifies over 100 key DNS assets in the two organizations and is further able to make recommendations on how these assets can be better secured against misuse. The second contribution extends my enterprise asset classification beyond DNS to include other asset types such as web servers, VPN servers, and file storage servers. For this, I develop a system that uses Programmable Network techniques to extract telemetry efficiently, feeds the attributes to a multi-grained ML-based scheme that classifies the assets in real time, and reactively collects packet-level telemetry of suspicious hosts for forensic analysis. My method identifies hundreds of typical servers and thousands of less common assets (e.g., LDAP servers and Redis proxies) across the two organizations. It additionally highlights instances of atypical behavior, giving IT staff advance warning of potentially anomalous assets. The third contribution detects DNS-based network attacks on enterprise hosts. To this end, I analyze incoming DNS traffic to the two organizations and develop a hierarchical anomaly detection method that profiles incoming DNS traffic at various levels of hierarchy (e.g., host, subnet, and AS) to isolate DNS attackers that could be stealthy and distributed. The models I train detect DNS attacks in lab data with over 99% accuracy at each level of the hierarchy, and in a 1-month trial in the wild they reveal hundreds of attacks that were missed by the organizational firewalls. My fourth contribution expands the attack detection from DNS to all network traffic.
To achieve both detection effectiveness and operational practicality, I develop a multi-stage progressive inference architecture that detects network attacks through a series of stages (e.g., active enterprise hosts, victims, distributed attackers, and malicious flows), each with increasing telemetry cost but narrowing focus. Evaluations using real distributed denial-of-service (DDoS) attacks and large-scale enterprise traffic traces demonstrate that my system can detect distributed network attacks down to the individual flow level at a practically low computational cost (around 30% CPU and 8% RAM usage on a typical blade server), which counterpart solutions cannot achieve. Taken together, my contributions apply Programmable Networks and Machine Learning to develop new practical and effective ways to give enterprise IT departments continuous visibility of their assets, advance warning of the threat surface they expose, and real-time alarms when network attacks unfold.
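The idea behind profiling DNS traffic at several levels of hierarchy can be illustrated with a small sketch. This is not the thesis's trained models: it assumes a plain list of query source IPs, a hypothetical `asn_of` lookup table from subnet to AS label, a fixed /24 subnet size, and a simple "count exceeds a multiple of the median" rule standing in for the anomaly detector. All function and parameter names are invented for illustration.

```python
from collections import Counter
from ipaddress import ip_network
from statistics import median


def subnet_of(ip: str, prefix: int = 24) -> str:
    """Return the enclosing subnet of an IPv4 address (assumed /24 here)."""
    return str(ip_network(f"{ip}/{prefix}", strict=False))


def aggregate(sources, asn_of):
    """Count DNS queries per host, per subnet, and per AS.

    sources: iterable of source IPs, one entry per incoming query.
    asn_of:  hypothetical mapping from subnet string to an AS label.
    """
    levels = {"host": Counter(), "subnet": Counter(), "as": Counter()}
    for src in sources:
        net = subnet_of(src)
        levels["host"][src] += 1
        levels["subnet"][net] += 1
        levels["as"][asn_of.get(net, "unknown")] += 1
    return levels


def flag(counts, factor: float = 5.0):
    """Flag keys whose count exceeds factor x the median count.

    A stand-in threshold rule, not the detection models of the thesis.
    """
    if not counts:
        return set()
    baseline = median(counts.values())
    return {key for key, n in counts.items() if n > factor * baseline}


# Ten benign hosts in ten different subnets, four queries each.
benign = [f"10.0.{i}.1" for i in range(10) for _ in range(4)]
# Twenty hosts in one subnet, five queries each: low per-host volume,
# but a large total once aggregated at the subnet level.
distributed = [f"203.0.113.{h}" for h in range(1, 21) for _ in range(5)]

levels = aggregate(benign + distributed, {"203.0.113.0/24": "AS64500"})
host_alerts = flag(levels["host"])
subnet_alerts = flag(levels["subnet"])
print("host alerts:", host_alerts)
print("subnet alerts:", subnet_alerts)
```

In this toy input no single host stands out, while the aggregate for the shared subnet does, which is the kind of stealthy, distributed source that per-host profiling alone would miss.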
Degree Type
PhD Doctorate