Engineering

Publication Search Results

  • (2006) Koh, Shannon; Diessel, Oliver
    Conference Paper
    On-going improvements in the scaling of FPGA device sizes and time-to-market pressures encourage the use of module-oriented design flows [3], while economic factors favour the reuse of smaller devices for high performance computational tasks. One of the core problems in proposing dynamic modular reconfiguration approaches is supporting the differing communications needs of the sequence of modules configured over time [2]. Proposals to date have not focussed on communications issues. Moreover, they have advocated the use of specific protocols [4], or they cannot be readily implemented [1], or they suffer from high overheads [5], or rely upon deprecated features such as tri-state lines [7]. In contrast, we propose a methodology for the rapid deployment of a communications infrastructure that provides the wires required by dynamic modules and allows users to implement the protocols they want. Our aim is to support new tiled dynamically reconfigurable architectures such as Virtex-4, as well as mature device families.

  • (2006) Malik, Usama; Diessel, Oliver
    Conference Paper
    In line with Shannon's ideas, we define the entropy of FPGA reconfiguration to be the amount of information needed to configure a given circuit onto a given device. We propose using entropy as a gauge of the maximum configuration compression that can be achieved and determine the entropy of a set of 24 benchmark circuits for the Virtex device family. We demonstrate that simple off-the-shelf compression techniques such as Golomb encoding and hierarchical vector compression achieve compression results that are within 1-10% of the theoretical bound. We present an enhanced configuration memory system based on the hierarchical vector compression technique that accelerates reconfiguration in proportion to the amount of compression achieved. The proposed system demands little additional chip area and can be clocked at the same rate as the Virtex configuration clock.
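
A minimal sketch of the two ideas named in this abstract, under stated assumptions. The entropy function uses a simple zeroth-order (independent bytes) model, which is an illustrative stand-in and not the entropy definition used in the paper. The encoder is a Golomb-Rice code, the special case of Golomb coding whose divisor is a power of two. The function names are hypothetical.

```python
import math
from collections import Counter


def zeroth_order_entropy_bits(data: bytes) -> float:
    """Lower bound, in bits, for coding `data` under an i.i.d. byte model."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    # Shannon entropy per byte: H = -sum(p * log2(p))
    per_byte = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return per_byte * n


def rice_encode(value: int, k: int) -> str:
    """Golomb-Rice code of a non-negative integer with divisor 2**k.

    The quotient is written in unary (q ones, then a zero) and the
    remainder in k binary digits.
    """
    if value < 0 or k < 0:
        raise ValueError("value and k must be non-negative")
    quotient = value >> k
    remainder = value & ((1 << k) - 1)
    unary = "1" * quotient + "0"
    binary = format(remainder, f"0{k}b") if k > 0 else ""
    return unary + binary
```

Comparing the size of a compressed stream against `zeroth_order_entropy_bits` gives a rough sense of how close a coder is to the bound for this simplified model. Codes of this family are commonly applied to run lengths, for example the gaps between set bits in sparse data.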

  • (2006) Koh, Lih; Diessel, Oliver
    Conference Paper
    Bypass delays are expected to grow beyond 1ns as technology scales. These delays necessitate pipelining of bypass paths at processor frequencies above 1GHz and thus affect the performance of sequential code sequences. We propose dealing with these delays through a dynamic functional unit chaining approach. We study the performance benefits of a superscalar, out-of-order processor augmented with a two-by-two array of ALUs interconnected by a fast, partial bypass network. An online profiler guides the automatic configuration of the network to accelerate specific patterns of dependent instructions. A detailed study of benchmark simulations demonstrates these first steps towards mapping binaries to a small coarse-grained array at runtime can improve instruction throughput by over 18% and 25% when the microarchitecture includes bypass delays of one cycle and two cycles, respectively.


  • (2005) Malik, Usama; Diessel, Oliver
    Conference Paper
    This paper presents a configuration memory architecture that offers fast FPGA reconfiguration. The underlying principle behind the design is the use of fine-grained partial reconfiguration that allows significant configuration re-use while switching from one circuit to another. The proposed configuration memory works by reading on-chip configuration data into a buffer, modifying them based on the externally supplied data and writing them back to their original registers. A prototype implementation of the proposed design in a 90nm cell library indicates that the new memory adds less than 1% area to a commercially available FPGA implemented using the same library. The proposed design reduces the reconfiguration time for a wide set of benchmark circuits by 63%. However, power consumption during reconfiguration increases by a factor of 2.5 because the read-modify-write strategy results in more switching in the memory array.
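
The read-modify-write cycle described in this abstract can be sketched in software as follows. This is only an illustration of the idea: the frame layout, the `updates` format and the function name are assumptions made here, not details taken from the paper.

```python
def read_modify_write(frames, updates):
    """Apply fine-grained updates to a list of configuration frames.

    frames:  list of bytearray, one per (hypothetical) configuration frame
    updates: dict mapping frame index -> list of (byte_offset, new_value)

    Only the frames named in `updates` are touched; all others keep
    their current contents, which is the source of configuration re-use.
    """
    for index, edits in updates.items():
        buffer = bytearray(frames[index])   # read the on-chip frame into a buffer
        for offset, value in edits:
            buffer[offset] = value          # modify using externally supplied data
        frames[index] = buffer              # write back to the original location
    return frames
```

In this sketch the externally supplied data grows with the number of edited bytes rather than with the size of the whole configuration.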

  • (2005) Della Torre, Marco; Malik, Usama; Diessel, Oliver
    Conference Paper
    This paper presents an investigation and design of an enhanced on-chip configuration memory system that can reduce the time to (re)configure an FPGA. The proposed system accepts configuration data in a compressed form and performs decompression internally. The resulting FPGA can be (re)configured in time proportional to the size of the compressed bit-stream. The compression technique exploits the redundancy present in typical configuration data. An analysis of configurations corresponding to a set of benchmark circuits reveals that data controlling the same type of configurable element share a common byte that occurs at a significantly higher frequency. This common byte is simply broadcast to all instances of that element. This step is followed by byte updates if required. The new configuration system has modest hardware requirements and was observed to reduce reconfiguration time for the benchmark set by two-thirds on average.
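
A minimal software sketch of the broadcast-then-update scheme described in this abstract, assuming each block holds the configuration bytes for one type of configurable element. The tuple format and function names are hypothetical, and the paper's hardware design is not reproduced here.

```python
from collections import Counter


def encode_block(block: bytes):
    """Return (common_byte, length, updates) for one block of configuration bytes.

    `updates` lists (index, value) for every byte that differs from the
    most frequent byte in the block.
    """
    if not block:
        raise ValueError("block must not be empty")
    common = Counter(block).most_common(1)[0][0]
    updates = [(i, b) for i, b in enumerate(block) if b != common]
    return common, len(block), updates


def decode_block(common: int, length: int, updates) -> bytes:
    """Broadcast the common byte to every position, then apply byte updates."""
    out = bytearray([common]) * length
    for index, value in updates:
        out[index] = value
    return bytes(out)
```

The fewer bytes that deviate from the common value, the shorter the update list, so the encoded form shrinks as the redundancy in the block grows.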

  • (2007) Snowdon, David; Petters, Stefan; Heiser, Gernot
    Conference Paper
    Minimising energy use is an important factor in the operation of many classes of embedded systems - in particular, battery-powered devices. Dynamic voltage and frequency scaling (DVFS) provides some control over a processor's performance and energy consumption. In order to employ DVFS for managing a system's energy use, it is necessary to predict the effect this scaling has on the system's total energy consumption. Simple (yet widely used) energy models lead to dramatically incorrect results for important classes of application programs. Predicting the energy used under scaling requires (i) a prediction of the dependency of program performance (and hence duration of execution) on the frequencies and (ii) a prediction of the power drawn by the execution as a function of the frequencies and voltages. As both of these characteristics are workload-specific, our approach builds a model that, given a workload execution at one frequency setpoint, will predict the run-time and power at any other frequency setpoint. We assume temporal locality (which is valid for the vast majority of applications), so predicting the characteristics of one time slice, frame, or other instance of a task will imply the characteristics of subsequent time slices, frames or instances (e.g. MPEG video decoding). We present a systematic approach to building these models for a hardware platform, determining the best performance counters and weights. This characterisation, done once for a particular platform, produces platform-specific but workload-independent performance and power models. We implemented the model on a real system and evaluated it under a comprehensive benchmark suite against measurements of the actual energy consumption. The results show that the model can accurately predict the energy use of a wide class of applications and is highly responsive to changes in the application behaviour.
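
The two predictions listed in this abstract, run-time and power as functions of the setpoint, combine into energy as power multiplied by time. The sketch below uses a common textbook-style model rather than the counter-based model of the paper: run-time is split into a part that scales with CPU frequency and a part that does not (for example, time spent waiting on memory), and power is a switching term plus a constant static term. All parameter names and values are assumptions for illustration.

```python
def predicted_runtime(cpu_cycles: float, freq_hz: float, fixed_time_s: float) -> float:
    """Run-time = frequency-dependent part + frequency-independent part."""
    return cpu_cycles / freq_hz + fixed_time_s


def predicted_power(c_eff: float, volts: float, freq_hz: float, static_w: float) -> float:
    """Power = switching power (C * V^2 * f) + static power."""
    return c_eff * volts * volts * freq_hz + static_w


def predicted_energy(cpu_cycles, fixed_time_s, c_eff, static_w, volts, freq_hz) -> float:
    """Energy in joules at one (voltage, frequency) setpoint."""
    runtime = predicted_runtime(cpu_cycles, freq_hz, fixed_time_s)
    power = predicted_power(c_eff, volts, freq_hz, static_w)
    return power * runtime
```

Under this model, lowering the frequency stretches the run-time, so the static term is paid for longer. Whether a lower setpoint saves energy therefore depends on the workload, which is the kind of effect a model that ignores workload behaviour would miss.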

  • (2006) Altermatt, Pietro; Schenk, Andreas; Heiser, Gernot
    Journal Article
    A parametrization of the density of states (DOS) near the band edge of phosphorus-doped crystalline silicon is derived from photoluminescence and conductance measurements, using a recently developed theory of band gap narrowing. It is shown that the dopant band only "touches" the conduction band at the Mott (metal-insulator) transition and that it merges with the conduction band at considerably higher dopant densities. This resolves well-known contradictions between conclusions drawn from various measurement techniques. With the proposed DOS, incomplete ionization of phosphorus dopants is calculated and compared with measurements in the temperature range from 300 to 30 K. We conclude that (a) up to 25% of dopants are nonionized at room temperature near the Mott transition and (b) there exists no significant amount of incomplete ionization at dopant densities far above the Mott transition. In a forthcoming part II of this paper, equations of incomplete ionization will be derived that are suitable for implementation in device simulators. © 2006 American Institute of Physics.

  • (2006) Altermatt, Pietro; Schenk, Andreas; Schmithuesen, B; Heiser, Gernot
    Journal Article
    Building on Part I of this paper [Altermatt et al., J. Appl. Phys. 100, 113714 (2006)], the parametrization of the density of states and of incomplete ionization (ii) is extended to arsenic- and boron-doped crystalline silicon. The amount of ii is significantly larger in Si:As than in Si:P. Boron and phosphorus cause a similar amount of ii although the boron energy level has a distinctly different behavior as a function of dopant density than the phosphorus level. This is so because the boron ground state is fourfold degenerate, while the phosphorus ground state is twofold degenerate. Finally, equations of ii are derived that are suitable for implementation in device simulators. Simulations demonstrate that ii increases the current gain of bipolar transistors by up to 25% and that it decreases the open-circuit voltage of thin-film solar cells by up to 10 mV. The simulation model therefore improves the predictive capabilities of device modeling of p-n-junction devices.

  • (2007) Ryzhyk, Leonid; Kuz, Ihor; Heiser, Gernot
    Conference Paper
    The lack of well-defined protocols for interaction with the operating system is a common source of defects in device drivers. In this paper we investigate the use of a formal language to define these protocols unambiguously. We present a language that allows us to convey all important requirements for driver behaviour in a compact specification and that can be readily understood by software engineers. It is intended to close the communication gap between OS and driver developers and enable more reliable device drivers.