Microkernel mechanisms for improving the trustworthiness of commodity hardware

dc.contributor.advisor Elphinstone, Kevin en_US
dc.contributor.advisor Heiser, Gernot en_US
dc.contributor.author Shen, Yanyan en_US
dc.date.accessioned 2022-03-15T12:17:59Z
dc.date.available 2022-03-15T12:17:59Z
dc.date.issued 2019 en_US
dc.description.abstract This thesis presents microkernel-based, software-implemented mechanisms for improving the trustworthiness of computer systems built on commercial off-the-shelf (COTS) hardware, which can malfunction when struck by transient hardware faults. If undetected, such faults can cause data corruption, system crashes, and security vulnerabilities, significantly undermining system dependability. Specifically, we adopt the single event upset (SEU) fault model and address transient CPU or memory faults. We take advantage of the functional correctness and isolation guarantees provided by the formally verified seL4 microkernel, and of the hardware redundancy provided by multicore processors, to design the redundant co-execution (RCoE) architecture, which replicates a whole software system (including the microkernel) onto different CPU cores. We implement two variants, loosely-coupled redundant co-execution (LC-RCoE) and closely-coupled redundant co-execution (CC-RCoE), for the ARM and x86 architectures. RCoE treats each replica of the software system as a state machine and ensures that the replicas start from the same initial state, observe consistent inputs, perform equivalent state transitions, and thus produce consistent outputs during error-free executions. Compared with other software-based error detection approaches, the distinguishing feature of RCoE is that the microkernel and device drivers are also included in redundant co-execution, significantly extending the sphere of replication (SoR). Based on RCoE, we introduce two kernel mechanisms, fingerprint validation and kernel barrier timeout, which detect fault-induced execution divergences between the replicated systems and allow the error detection latency and coverage to be tuned. The kernel error-masking mechanisms built on RCoE enable downgrading from triple modular redundancy (TMR) to dual modular redundancy (DMR) without service interruption.
We run synthetic and system benchmarks to evaluate the performance overhead of the approach and observe that the overhead varies with the characteristics of the workload and with the variant (LC-RCoE or CC-RCoE); we conclude that the approach is applicable to real-world applications. We assess the effectiveness of the error detection mechanisms through fault injection campaigns on real hardware, and the results demonstrate a compelling improvement. en_US
dc.language English
dc.language.iso EN en_US
dc.publisher UNSW, Sydney en_US
dc.rights CC BY-NC-ND 3.0 en_US
dc.rights.uri en_US
dc.subject.other Fault tolerance en_US
dc.subject.other Microkernel en_US
dc.subject.other Dependability en_US
dc.title Microkernel mechanisms for improving the trustworthiness of commodity hardware en_US
dc.type Thesis en_US
dcterms.accessRights open access
dcterms.rightsHolder Shen, Yanyan
dspace.entity.type Publication en_US
unsw.accessRights.uri 2019-08-01 en_US
unsw.description.embargoNote Embargoed until 2019-08-01
unsw.relation.faculty Engineering
unsw.relation.originalPublicationAffiliation Shen, Yanyan, Computer Science & Engineering, Faculty of Engineering, UNSW en_US
unsw.relation.originalPublicationAffiliation Elphinstone, Kevin, Computer Science & Engineering, Faculty of Engineering, UNSW en_US
unsw.relation.originalPublicationAffiliation Heiser, Gernot, Computer Science & Engineering, Faculty of Engineering, UNSW en_US
unsw.relation.school School of Computer Science and Engineering *
unsw.thesis.degreetype PhD Doctorate en_US
Original bundle
public version.pdf
1.83 MB