Repairing FPGA configuration memory errors using dynamic partial reconfiguration

open access
Copyright: Nguyen, Tran Huu Nguyen
Abstract
The configuration memory of SRAM-based Field-Programmable Gate Arrays (FPGAs) is susceptible to radiation-induced Single Event Upsets (SEUs). This susceptibility has limited their adoption for space applications and has led to intensive research into techniques for mitigating radiation effects in such devices. The reliability of FPGA user circuits is commonly improved by applying Triple Modular Redundancy (TMR), whereas configuration memory errors are corrected by reloading a golden bitstream for the design. Two approaches have emerged for doing so. The first, known as scrubbing, periodically refreshes the configuration memory of the entire device. The second uses dynamic partial reconfiguration to reload the configuration of an individual circuit module that has been found to be in error. This latter approach, which we refer to as Module-based Error Recovery (MER), promises to be more responsive and to need less energy than scrubbing, at the cost of greater implementation complexity. The research reported in this thesis aims to clarify the design and improve the reliability of FPGA systems that employ TMR with MER. The work has involved studying and contributing to several aspects of TMR-MER infrastructure, most notably the design of reliable Reconfiguration Control Networks (RCNs) for conveying reconfiguration requests to a central reconfiguration controller, new reliability models for TMR-MER systems, and improved scheduling techniques for checking for faulty modules. The thesis evaluates the impact of RCNs on system reliability and performance. Results show that a "hard RCN" is the most reliable despite having the highest network latency. Because the order in which voters are checked for errors over the RCN affects overall system reliability, the thesis then proposes a Voter Scheduling Engine (VSE) that dynamically prioritizes the TMR component to be checked next.
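As a minimal sketch of the majority voting that underlies TMR (not the voter design used in the thesis), the Python function below computes the bitwise majority of three replica outputs and reports which replicas disagree with it. In a TMR-MER system, such a mismatch is the kind of event that would prompt a reconfiguration request for the offending module. The function name `tmr_vote` and the encoding of module outputs as integers are illustrative assumptions.

```python
def tmr_vote(a: int, b: int, c: int) -> tuple[int, list[int]]:
    """Bitwise majority vote over three replica outputs (hypothetical helper).

    Returns the voted value and the indices (0, 1, 2) of any replicas whose
    output differs from it, i.e. candidates for module-based error recovery.
    """
    # Each output bit is 1 when at least two of the three replicas drive a 1.
    majority = (a & b) | (a & c) | (b & c)
    # A replica that disagrees with the voted value is flagged as suspect.
    mismatches = [i for i, value in enumerate((a, b, c)) if value != majority]
    return majority, mismatches


# Replica 2 has one upset bit; the vote masks it and flags replica 2.
voted, suspects = tmr_vote(0b1010, 0b1010, 0b1110)
print(bin(voted), suspects)  # prints: 0b1010 [2]
```

Because the vote is taken bit by bit, the output stays correct only while no bit position is corrupted in more than one replica; a second upset in the same bit of another replica defeats the vote, which is why a flagged module should be repaired promptly.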
The thesis also proposes reliability models for TMR-MER systems that suffer multiple SEUs and employ either round-robin or Variable-Rate Voter Checking (VRVC), and proposes using a genetic algorithm to determine a static checking schedule that maximizes system reliability. Simulation results indicate that the mean time to failure of TMR-MER systems employing VRVC is up to 400% greater than that of systems using conventional round-robin checking. The thesis concludes with directions for further study.
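The contrast between round-robin and variable-rate checking can be illustrated with a static check schedule in which each voter appears in proportion to an integer weight. The sketch below uses smooth weighted round-robin interleaving, a generic scheduling technique swapped in for illustration; it is not the thesis's VRVC method or its genetic algorithm, and the per-voter weights are hypothetical inputs. With equal weights it reduces to ordinary round robin.

```python
def build_schedule(weights: list[int]) -> list[int]:
    """Return one cycle of voter indices, voter i appearing weights[i] times.

    Smooth weighted round robin: every step each voter's credit grows by its
    weight, the voter with the most credit is checked next, and its credit
    drops by the total weight. This spreads repeated checks across the cycle.
    """
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive integers")
    total = sum(weights)
    credit = [0] * len(weights)
    schedule = []
    for _ in range(total):
        for i, w in enumerate(weights):
            credit[i] += w
        # max() returns the first index on ties, so ordering is deterministic.
        chosen = max(range(len(weights)), key=lambda i: credit[i])
        credit[chosen] -= total
        schedule.append(chosen)
    return schedule


print(build_schedule([1, 1, 1]))  # round robin: [0, 1, 2]
print(build_schedule([3, 1, 1]))  # voter 0 checked three times per cycle: [0, 1, 0, 2, 0]
```

In a fuller model, the weights (or the whole static schedule, which the thesis determines with a genetic algorithm) would be chosen to maximize a reliability measure such as mean time to failure; here they are simply given.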
Author(s)
Nguyen, Tran Huu Nguyen
Supervisor(s)
Diessel, Oliver
Cetin, Ediz
Publication Year
2017
Resource Type
Thesis
Degree Type
PhD Doctorate
UNSW Faculty
Files
public version.pdf (2.47 MB, Adobe Portable Document Format)