Engineering

Publication Search Results

  • (2006) Koh, Shannon; Diessel, Oliver
    Conference Paper
    On-going improvements in the scaling of FPGA device sizes and time-to-market pressures encourage the use of module-oriented design flows [3], while economic factors favour the reuse of smaller devices for high performance computational tasks. One of the core problems in proposing dynamic modular reconfiguration approaches is supporting the differing communications needs of the sequence of modules configured over time [2]. Proposals to date have not focussed on communications issues. Moreover, they have advocated the use of specific protocols [4], or they cannot be readily implemented [1], or they suffer from high overheads [5], or rely upon deprecated features such as tri-state lines [7]. In contrast, we propose a methodology for the rapid deployment of a communications infrastructure that provides the wires required by dynamic modules and allows users to implement the protocols they want. Our aim is to support new tiled dynamically reconfigurable architectures such as Virtex-4, as well as mature device families.

  • (2006) Malik, Usama; Diessel, Oliver
    Conference Paper
    In line with Shannon's ideas, we define the entropy of FPGA reconfiguration to be the amount of information needed to configure a given circuit onto a given device. We propose using entropy as a gauge of the maximum configuration compression that can be achieved and determine the entropy of a set of 24 benchmark circuits for the Virtex device family. We demonstrate that simple off-the-shelf compression techniques such as Golomb encoding and hierarchical vector compression achieve compression results that are within 1-10% of the theoretical bound. We present an enhanced configuration memory system based on the hierarchical vector compression technique that accelerates reconfiguration in proportion to the amount of compression achieved. The proposed system demands little additional chip area and can be clocked at the same rate as the Virtex configuration clock.

  • (2006) Koh, Lih; Diessel, Oliver
    Conference Paper
    Bypass delays are expected to grow beyond 1 ns as technology scales. These delays necessitate pipelining of bypass paths at processor frequencies above 1 GHz and thus affect the performance of sequential code sequences. We propose dealing with these delays through a dynamic functional unit chaining approach. We study the performance benefits of a superscalar, out-of-order processor augmented with a two-by-two array of ALUs interconnected by a fast, partial bypass network. An online profiler guides the automatic configuration of the network to accelerate specific patterns of dependent instructions. A detailed study of benchmark simulations demonstrates that these first steps towards mapping binaries to a small coarse-grained array at runtime can improve instruction throughput by over 18% and 25% when the microarchitecture includes bypass delays of one cycle and two cycles, respectively.


  • (2006) Zhu, Liming; Gorton, Ian; Liu, Yan; Bui, Bao
    Conference Paper
    Web services solutions are being increasingly adopted in enterprise systems. However, ensuring the quality of service of Web services applications remains a costly and complicated performance engineering task. Some of the new challenges include limited control over consumers of a service, unforeseeable operational scenarios and vastly different XML payloads. These challenges make existing manual performance analysis and benchmarking methods difficult to use effectively. This paper describes an approach for generating customized benchmark suites for Web services applications from a software architecture description following a Model Driven Architecture (MDA) approach. We have provided a performance-tailored version of the UML 2.0 Testing Profile so architects can model a flexible and reusable load testing architecture, including test data, in a standards-compatible way. We extended our MDABench [27] tool to provide a Web service performance testing “cartridge” associated with the tailored testing profile. A load testing suite and automatic performance measurement infrastructure are generated using the new cartridge. Best practices in Web service testing are embodied in the cartridge and inherited by the generated code. This greatly reduces the effort needed for Web service performance benchmarking while being fully MDA-compatible. We illustrate the approach using a case study on the Apache Axis platform.

  • (2006) Bain, Michael; Ahsan, Nasir; Potter, John; Gaeta, Bruno; Temple, Mark; Dawes, Ian
    Conference Paper

  • (2006) Zhao, Xin; Chou, Chun; Guo, Jun; Jha, Sanjay
    Conference Paper
    To support reliable multicast routing in wireless mesh networks, it is important to protect multicast sessions against link or node failures. To the best of our knowledge, protecting multicast sessions in wireless mesh networks is a new problem. In this paper, we propose a resilient forwarding mesh approach for protecting a multicast session in wireless mesh networks. Utilizing the wireless broadcast advantage, a resilient forwarding mesh effectively establishes two node-disjoint paths for each source-destination pair. This allows a multicast session to be immune from any single link or intermediate node failure. We introduce four heuristic algorithms to obtain approximate solutions that seek to minimize the number of required broadcast transmissions. We evaluate the performance of these heuristic algorithms against the optimal resilient forwarding mesh (ORFM) obtained by solving an integer linear programming (ILP) formulation of the problem. Experimental results demonstrate that one of these heuristic algorithms, which we call the minimal disjoint mesh algorithm (MDM), performs sufficiently close to ORFM. In addition, we find that the resilient forwarding mesh approach provides efficient 1+1 protection [8] to the multicast session without incurring much additional overhead on a single minimal cost multicast tree.

  • (2006) Zhao, Xin; Guo, Jun; Chou, Chun; Jha, Sanjay
    Conference Paper
    To support reliable multicast routing in wireless mesh networks, it is important to protect multicast sessions against link or node failures. In this paper, we propose a resilient forwarding mesh approach for protecting a multicast session. Utilizing the wireless broadcast advantage, a resilient forwarding mesh effectively establishes two node-disjoint paths for each source-destination pair. This allows a multicast session to be immune from any single link or intermediate node failure. An integer linear programming (ILP) formulation is presented to find the optimal resilient forwarding mesh (ORFM) that minimizes the number of broadcast transmissions. In comparison with the existing optimal path-pair (OPP) approach proposed in [1] for wired mesh networks, our experimental results demonstrate that ORFM outperforms OPP in wireless scenarios.

  • (2006) Janapsatya, Andhi; Ignjatovic, Aleksandar; Parameswaran, Sri
    Conference Paper
    Modern embedded systems execute a single application or a class of applications repeatedly. An emerging methodology for designing embedded systems utilizes configurable processors in which the cache size, associativity, and line size can be chosen by the designer. In this paper, a method is given to rapidly find the L1 cache miss rate of an application. An energy model and an execution time model are developed to find the best cache configuration for the given embedded application. Using benchmarks from Mediabench, we find that our method explores the design space on average 45 times faster than Dinero IV while still having 100% accuracy.

  • (2006) Janapsatya, Andhi; Ignjatovic, Aleksandar; Parameswaran, Sri
    Conference Paper
    Scratchpad memory has been introduced as a replacement for cache memory as it improves the performance of certain embedded systems. It has also been demonstrated that scratchpad memory can significantly reduce the energy consumption of the memory hierarchy of embedded systems. This is significant, as the memory hierarchy consumes a substantial proportion of the total energy of an embedded system. This paper deals with optimization of the instruction memory scratchpad based on a novel methodology that uses a metric which we call the concomitance. This metric is used to find basic blocks which are executed frequently and in close proximity in time. Once such blocks are found, they are copied into the scratchpad memory at appropriate times; this is achieved using a special instruction inserted into the code at appropriate places. For a set of benchmarks taken from Mediabench, our scratchpad system consumed just 59% (avg) of the energy of the cache system, and 73% (avg) of the energy of the state-of-the-art scratchpad system, while improving the overall performance. Compared to the state-of-the-art method, the number of instructions copied into the scratchpad memory from the main memory is reduced by 88%.