Publication Search Results

Now showing 1 - 10 of 293
  • (2006) Koh, Shannon; Diessel, Oliver
    Conference Paper
    On-going improvements in the scaling of FPGA device sizes and time-to-market pressures encourage the use of module-oriented design flows [3], while economic factors favour the reuse of smaller devices for high performance computational tasks. One of the core problems in proposing dynamic modular reconfiguration approaches is supporting the differing communications needs of the sequence of modules configured over time [2]. Proposals to date have not focussed on communications issues. Moreover, they have advocated the use of specific protocols [4], or they cannot be readily implemented [1], or they suffer from high overheads [5], or rely upon deprecated features such as tri-state lines [7]. In contrast, we propose a methodology for the rapid deployment of a communications infrastructure that provides the wires required by dynamic modules and allows users to implement the protocols they want. Our aim is to support new tiled dynamically reconfigurable architectures such as Virtex-4, as well as mature device families.

  • (2006) Malik, Usama; Diessel, Oliver
    Conference Paper
    In line with Shannon's ideas, we define the entropy of FPGA reconfiguration to be the amount of information needed to configure a given circuit onto a given device. We propose using entropy as a gauge of the maximum configuration compression that can be achieved and determine the entropy of a set of 24 benchmark circuits for the Virtex device family. We demonstrate that simple off-the-shelf compression techniques such as Golomb encoding and hierarchical vector compression achieve compression results that are within 1-10% of the theoretical bound. We present an enhanced configuration memory system based on the hierarchical vector compression technique that accelerates reconfiguration in proportion to the amount of compression achieved. The proposed system demands little additional chip area and can be clocked at the same rate as the Virtex configuration clock.

  • (2006) Koh, Lih; Diessel, Oliver
    Conference Paper
    Bypass delays are expected to grow beyond 1ns as technology scales. These delays necessitate pipelining of bypass paths at processor frequencies above 1GHz and thus affect the performance of sequential code sequences. We propose dealing with these delays through a dynamic functional unit chaining approach. We study the performance benefits of a superscalar, out-of-order processor augmented with a two-by-two array of ALUs interconnected by a fast, partial bypass network. An online profiler guides the automatic configuration of the network to accelerate specific patterns of dependent instructions. A detailed study of benchmark simulations demonstrates these first steps towards mapping binaries to a small coarse-grained array at runtime can improve instruction throughput by over 18% and 25% when the microarchitecure includes bypass delays of one cycle and two cycles, respectively.

  • (2005) Malik, Usama; Diessel, Oliver
    Conference Paper
    This paper presents a configuration memory architecture that offers fast FPGA reconfiguration. The underlying principle behind the design is the use of fine-grained partial reconfiguration that allows significant configuration re-use while switching from one circuit to another. The proposed configuration memory works by reading on-chip configuration data into a buffer, modifying them based on the externally supplied data and writing them back to their original registers. A prototype implementation of the proposed design in a 90nm cell library indicates that the new memory adds less than 1% area to a commercially available FPGA implemented using the same library. The proposed design reduces the reconfiguration time for a wide set of benchmark circuits by 63%. However, power consumption during reconfiguration increases by a factor of 2.5 because the read-modify-write strategy results in more switching in the memory array.

  • (2005) Della Torre, Marco; Malik, Usama; Diessel, Oliver
    Conference Paper
    This paper presents an investigation and design of an enhanced on-chip configuration memory system that can reduce the time to (re)configure an FPGA. The proposed system accepts configuration data in a compressed form and performs decompression internally, The resulting FPCA can be (re)configured in time proportional to the size of the compressed bit-stream. The compression technique exploits the redundancy present in typical configuration data. An analysis of configurations corresponding to a set of benchmark circuits reveals that data that controls the same types of configurable elements have a common byte that occurs at a significantly higher frequency. This common byte is simply broadcast to all instances of that element. This step is followed by byte updates if required. The new configuration system has modest hardware requirements and was observed to reduce reconfiguration time for the benchmark set by two-thirds on average.

  • (2007) Snowdon, David; Petters, Stefan; Heiser, Gernot
    Conference Paper
    Minimising energy use is an important factor in the operation of many classes of embedded systems - in particular, battery-powered devices. Dynamic voltage and frequency scaling (DVFS) provides some control over a processor's performance and energy consumption. In order to employ DVFS for managing a system's energy use, it is necessary to predict the effect this scaling has on the system's total energy consumption. Simple (yet widely-used) energy models lead to dramatically incorrect results for important classes of application programs. Predicting the energy used under scaling requires (i) a prediction of the dependency of program performance (and hence duration of execution) on the frequencies and (ii) a prediction of the power drawn by the execution as a function of the frequencies and voltages. As both of these characteristics are workload-specific our approach builds a model that, given a workload execution at one frequency setpoint, will predict the run-time and power at any other frequency setpoint. We assume temporal locality (which is valid for the vast majority of applications) so predicting the characteristics of one time slice, frame, or other instance of a task, will imply the characteristics of subsequent time slices, frames or instances (e.g. MPEG video decoding). We present a systematic approach to building these models for a hardware platform, determining the best performance counters and weights. This characterisation, done once for a particular platform, produces platform-specific but workload-independent performance and power models. We implemented the model on a real system and evaluated it under a comprehensive benchmark suite against measurements of the actual energy consumption. The results show that the model can accurately predict the energy use of a wide class of applications and is highly responsive to changes in the application behaviour.

  • (2007) Ryzhyk, Leonid; Kuz, Ihor; Heiser, Gernot
    Conference Paper
    The lack of well-defined protocols for interaction with the operating system is a common source of defects in device drivers. In this paper we investigate the use of a formal language to define these protocols unambiguously. We present a language that allows us to convey all important requirements for driver behaviour in a compact specification and that can be readily understood by software engineers. It is intended to close the communication gap between OS and driver developers and enable more reliable device drivers.

  • (2007) Roscoe, Timothy; Elphinstone, Kevin; Heiser, Gernot
    Conference Paper
    In this paper, we question whether hypervisors are really acting as a disruptive force in OS research, instead arguing that they have so far changed very little at a technical level. Essentially, we have retained the conventional Unix-like OS interface and added a new ABI based on PC hardware which is highly unsuitable for most purposes. Despite commercial excitement, focus on hypervisor design may be leading OS research astray. However, adopting a different approach to virtualization and recognizing its value to academic research holds the prospect of opening up kernel research to new directions.

  • (2007) van Schaik, Carl; Heiser, Gernot
    Conference Paper
    This paper describes the techniques used to achieve high context-switching performance on ARM processors for the L4 microkernel and a para-virtualised Linux running on top. We examine how the previously-published techniques can be used in L4 with minimal changes to the kernel API. We also propose future API changes which make it easier to maximise memory-management performance, not only on ARM but also on architectures supporting a segmented memory model.