Publication Search Results

Now showing 1 - 10 of 218
  • (2006) Koh, Shannon; Diessel, Oliver
    Conference Paper
    On-going improvements in the scaling of FPGA device sizes and time-to-market pressures encourage the use of module-oriented design flows [3], while economic factors favour the reuse of smaller devices for high performance computational tasks. One of the core problems in proposing dynamic modular reconfiguration approaches is supporting the differing communications needs of the sequence of modules configured over time [2]. Proposals to date have not focussed on communications issues. Moreover, they have advocated the use of specific protocols [4], or they cannot be readily implemented [1], or they suffer from high overheads [5], or rely upon deprecated features such as tri-state lines [7]. In contrast, we propose a methodology for the rapid deployment of a communications infrastructure that provides the wires required by dynamic modules and allows users to implement the protocols they want. Our aim is to support new tiled dynamically reconfigurable architectures such as Virtex-4, as well as mature device families.

  • (2006) Malik, Usama; Diessel, Oliver
    Conference Paper
    In line with Shannon's ideas, we define the entropy of FPGA reconfiguration to be the amount of information needed to configure a given circuit onto a given device. We propose using entropy as a gauge of the maximum configuration compression that can be achieved and determine the entropy of a set of 24 benchmark circuits for the Virtex device family. We demonstrate that simple off-the-shelf compression techniques such as Golomb encoding and hierarchical vector compression achieve compression results that are within 1-10% of the theoretical bound. We present an enhanced configuration memory system based on the hierarchical vector compression technique that accelerates reconfiguration in proportion to the amount of compression achieved. The proposed system demands little additional chip area and can be clocked at the same rate as the Virtex configuration clock.

  • (2006) Koh, Lih; Diessel, Oliver
    Conference Paper
    Bypass delays are expected to grow beyond 1ns as technology scales. These delays necessitate pipelining of bypass paths at processor frequencies above 1GHz and thus affect the performance of sequential code sequences. We propose dealing with these delays through a dynamic functional unit chaining approach. We study the performance benefits of a superscalar, out-of-order processor augmented with a two-by-two array of ALUs interconnected by a fast, partial bypass network. An online profiler guides the automatic configuration of the network to accelerate specific patterns of dependent instructions. A detailed study of benchmark simulations demonstrates these first steps towards mapping binaries to a small coarse-grained array at runtime can improve instruction throughput by over 18% and 25% when the microarchitecure includes bypass delays of one cycle and two cycles, respectively.

  • (2005) Malik, Usama; Diessel, Oliver
    Conference Paper
    This paper presents a configuration memory architecture that offers fast FPGA reconfiguration. The underlying principle behind the design is the use of fine-grained partial reconfiguration that allows significant configuration re-use while switching from one circuit to another. The proposed configuration memory works by reading on-chip configuration data into a buffer, modifying them based on the externally supplied data and writing them back to their original registers. A prototype implementation of the proposed design in a 90nm cell library indicates that the new memory adds less than 1% area to a commercially available FPGA implemented using the same library. The proposed design reduces the reconfiguration time for a wide set of benchmark circuits by 63%. However, power consumption during reconfiguration increases by a factor of 2.5 because the read-modify-write strategy results in more switching in the memory array.

  • (2005) Della Torre, Marco; Malik, Usama; Diessel, Oliver
    Conference Paper
    This paper presents an investigation and design of an enhanced on-chip configuration memory system that can reduce the time to (re)configure an FPGA. The proposed system accepts configuration data in a compressed form and performs decompression internally, The resulting FPCA can be (re)configured in time proportional to the size of the compressed bit-stream. The compression technique exploits the redundancy present in typical configuration data. An analysis of configurations corresponding to a set of benchmark circuits reveals that data that controls the same types of configurable elements have a common byte that occurs at a significantly higher frequency. This common byte is simply broadcast to all instances of that element. This step is followed by byte updates if required. The new configuration system has modest hardware requirements and was observed to reduce reconfiguration time for the benchmark set by two-thirds on average.

  • (2004) Malik, Usama; Diessel, Oliver
    Conference Paper
    Dynamic FPGA reconfiguration represents an overhead that can be critical to the performance of a realised circuit. To address this problem, This work presents a technique that is applicable at the times of loading the configuration data on the device. The technique involves reusing the on-chip configuration fragments to implement the next configuration thereby reducing the amount of data that must be externally transferred to the configuration memory. This work provides an analysis of the effect of circuit placement and configuration granularity on configuration reuse. The problem of finding placements of each circuit in a sequence of circuits so as to maximize configuration re-use is considered in detail. A greedy solution to this NP complete problem was found to reduce configuration overheads by less than 5% for a benchmark set. The effect of configuration granularity on configuration reuse was also considered and it was found that reducing the size of the unit of configuration allowed us to reduce the size of the benchmark configurations by 41%.

  • (2002) Diessel, Oliver; Malik, Usama; So, Keith
    Conference Paper
    Current FPGA design flows do not readily support high-level, behavioural design or the use of run-time reconfiguration. Designers are thus discouraged from taking a high-level view of their systems and cannot fully exploit the benefits of programmable hardware. This paper reports on our advances towards the development of design technology that supports behavioural specification and compilation of FPGA designs and automatically manages FPGA chip virtualization.

  • (2002) Guntsch, M; Middendorf, M; Scheuermann, B; Diessel, Oliver; ElGindy, Hossam; Schmeck, H; So, K
    Conference Paper
    We propose to modify a type of ant algorithm called Population based Ant Colony Optimization (P-ACO) to allow implementation on an FPGA architecture. Ant algorithms are adapted from the natural behavior of ants and used to find good solutions to combinatorial optimization problems. General layout on the FPGA and algorithmic description are covered. The most notable achievements featured in this paper are a runtime reduction and including the approximation of the heuristic function by a small set of favored decisions which changes over time.

  • (2002) Malik, Usama; So, Keith; Diessel, Oliver
    Conference Paper
    The Circal process algebra is being used to explore the behavioural specification of systems that are mapped to field programmable logic circuits. In this paper we report on the implementation and performance of an interpreter for system specifications given in the Circal language. In contrast to the typical design flow for field programmable technology in which designs are statically partitioned, synthesised, and mapped to pre-allocated resources, in this system the specified circuits are extracted from behavioural specifications that are partitioned, elaborated, mapped, and configured at run time as control passes through them. We report on the details of a design that targets the Celoxica RC1000 co-processor and assess preliminary performance results for this implementation. The results clearly demonstrate our method is a practical approach to overcome resource constraints, particularly in applications where these change at run time. The results also establish a benchmark against which to measure future improvements and alternative methods.