Abstract
Heterogeneous Multiprocessor Systems-on-Chip (MPSoCs) are viable implementation
platforms for multimedia applications. However, optimising such platforms for performance,
area footprint and energy consumption is challenging. This thesis explores
the paradigm of pipelined MPSoCs and introduces design-time and run-time optimisations
in the form of an optimisation framework. This is the first framework proposed
to optimise both the area footprint and the energy consumption of a pipelined MPSoC.
In a pipelined MPSoC, processors are organised into pipeline stages and are
connected through First In First Out (FIFO) buffers. Application-Specific Instruction-set
Processors (ASIPs) are used so that their customisation can be exploited
to optimise the area footprint of a pipelined MPSoC. Each processor has a number
of configurations, each comprising a different combination of custom instructions and cache
configuration, thereby enabling a performance-area trade-off.
This thesis proposes analytical models and estimation methods to enable rapid
design space exploration of pipelined MPSoCs whose design spaces contain billions of design
points. Three analytical models are proposed to estimate the execution time, latency
and throughput of a pipelined MPSoC, and two estimation methods are proposed
to reduce the number of slow, full-system, cycle-accurate simulations. Researchers
have evaluated estimation models using absolute accuracy and graphically assessed fidelity.
Since no metric exists to quantify fidelity, this thesis also proposes
fidelity metrics, enabling estimation models to be evaluated in terms of both
absolute accuracy and fidelity.
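As a hedged illustration of what a fidelity metric can quantify, the sketch below computes a pairwise-ordering fidelity of the kind used in the estimation literature: the fraction of design-point pairs that the estimate ranks in the same order as the cycle-accurate simulation. A value of 1.0 means the estimate never reorders design points, which is what design space exploration relies on even when the absolute error is non-zero. The maximum absolute percentage error is computed alongside for comparison. The function names and data are hypothetical, and this is not necessarily the fidelity metric proposed in this thesis.

```python
from itertools import combinations


def _sign(x):
    return (x > 0) - (x < 0)


def pairwise_fidelity(estimated, measured):
    """Fraction of design-point pairs ordered identically by estimate and simulation."""
    if len(estimated) != len(measured) or len(estimated) < 2:
        raise ValueError("need two equal-length sequences of at least two values")
    pairs = list(combinations(range(len(estimated)), 2))
    agree = sum(
        1
        for i, j in pairs
        if _sign(estimated[i] - estimated[j]) == _sign(measured[i] - measured[j])
    )
    return agree / len(pairs)


def max_abs_error_pct(estimated, measured):
    """Largest absolute error, as a percentage of the simulated value."""
    return max(abs(e - m) / abs(m) * 100.0 for e, m in zip(estimated, measured))


# Hypothetical execution times (cycles) for four design points.
simulated = [12.0, 19.0, 35.0, 40.0]
estimate = [10.0, 30.0, 20.0, 45.0]
print(pairwise_fidelity(estimate, simulated))
print(max_abs_error_pct(estimate, simulated))
```

Here only the pair (design point 1, design point 2) is ordered differently by the estimate, so five of the six pairs agree, even though one estimate is off by more than 50%.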
For design space exploration, two algorithms are proposed to select one configuration
per processor so as to minimise the area footprint of a pipelined MPSoC
under a latency or a throughput constraint. Experiments with a number of pipelined
MPSoCs executing JPEG encoder, JPEG decoder, MP3 encoder and H.264 encoder
applications showed that the analytical models and the estimation methods had a
maximum absolute error of 18.67% and a minimum fidelity of 0.88. With the proposed
analytical models and estimation methods, simulation took only several
hours for design spaces containing up to 10^18 design points, and the proposed
exploration algorithms found Pareto fronts in such large design spaces in less
than seven minutes.
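When only a throughput constraint applies, and under the simplified model in which throughput is set by the slowest stage and each stage's time depends only on its own processor's configuration, area minimisation decouples: each processor can independently take its smallest-area configuration whose per-item time meets the required period. The sketch below implements this decoupled selection and is exact only under those assumptions; it is not necessarily either of the algorithms proposed in this thesis. A latency constraint couples the stages through a sum and does not decompose this way. The `Config` type, its fields and all values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """A hypothetical processor configuration (custom instructions + cache)."""
    name: str
    cycles_per_item: float
    area: float


def min_area_under_throughput(processors, min_throughput):
    """Pick one Config per processor minimising total area such that
    1 / max(cycles_per_item) >= min_throughput. Returns None if infeasible."""
    max_cycles = 1.0 / min_throughput  # required period of the bottleneck stage
    chosen = []
    for options in processors:
        feasible = [c for c in options if c.cycles_per_item <= max_cycles]
        if not feasible:
            return None  # this processor cannot meet the period in any configuration
        chosen.append(min(feasible, key=lambda c: c.area))
    return chosen


processors = [
    [Config("base", 300, 1.0), Config("ci", 180, 1.6), Config("ci+cache", 120, 2.4)],
    [Config("base", 150, 1.0), Config("cache", 110, 1.3)],
]
selection = min_area_under_throughput(processors, 1 / 200)
print([c.name for c in selection])  # ['ci', 'base']
print(sum(c.area for c in selection))
```

The cost is linear in the total number of configurations rather than in the size of the design space, which is why throughput-constrained selection can be far cheaper than exhaustive search under this model.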
Next, this thesis proposes a novel adaptive pipelined MPSoC architecture in which
idle processors are transitioned into low-power states at run-time to reduce energy
consumption. Two run-time managers are proposed for the adaptive pipelined MPSoC.
Firstly, a run-time processor manager manages the idle processors
by either clock-gating or power-gating them. Secondly, a run-time power manager
selects the most beneficial low-power state for each idle processor. Experiments
with an H.264 video encoder designed for HD720p at 30 fps showed that,
compared to a pipelined MPSoC without run-time adaptability, the processor manager
reduced energy consumption by up to 34% with clock-gating and up to 39% with power-gating,
while sustaining a minimum throughput of 28.75 fps (which is within the specifications).
Compared to using only the processor manager, the power manager reduced energy
consumption by up to a further 40%, with only an additional 0.5% degradation in throughput.
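The choice made by a run-time power manager can be illustrated with a simple energy break-even comparison. The sketch below assumes that the length of an upcoming idle period is known (in practice it would have to be predicted), that each low-power state has a fixed residency power plus a fixed energy and time cost for entering and leaving it, and that a state is eligible only if its transition time fits within the idle period, so that throughput is not degraded. The state names and all power, energy and timing values are hypothetical, and this is not necessarily the policy used by the power manager in this thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerState:
    """A hypothetical processor power state."""
    name: str
    power_mw: float       # power drawn while resident in the state
    transition_uj: float  # energy to enter and later leave the state
    transition_us: float  # time to enter and later leave the state


def idle_energy_uj(state, idle_us):
    # mW x us = nJ, so divide by 1000 to get uJ; add the transition overhead.
    return state.power_mw * idle_us / 1000.0 + state.transition_uj


def select_state(states, idle_us):
    """Lowest-energy state whose transition time fits inside the idle period."""
    eligible = [s for s in states if s.transition_us <= idle_us]
    return min(eligible, key=lambda s: idle_energy_uj(s, idle_us))


STATES = [
    PowerState("idle", 50.0, 0.0, 0.0),  # stay on, no overhead
    PowerState("clock_gated", 20.0, 0.1, 1.0),
    PowerState("power_gated", 2.0, 5.0, 50.0),
]

for idle_us in (0.5, 10.0, 100.0, 1000.0):
    print(idle_us, select_state(STATES, idle_us).name)
```

With these numbers, clock-gating wins for short and medium idle periods, while power-gating only pays off once the idle period is long enough to amortise its larger transition energy, which is why a manager that picks the state per idle period can save more than always applying one technique.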
Lastly, this thesis proposes multi-mode pipelined MPSoCs, in which multiple separately
designed pipelined MPSoCs are merged into a single pipelined MPSoC with multiple modes.
A multi-mode pipelined MPSoC further reduces the area footprint by
sharing processors and FIFO buffers between modes. Three merging heuristics are proposed to
find the maximal overlap between the individual pipelined MPSoCs, offering different
trade-offs between optimality and running time. The results indicated
significant area footprint reductions (up to 62% in processor area, 57% in FIFO area and
59% in the number of processor/FIFO ports) compared to the individual pipelined MPSoCs.
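To make the notion of overlap concrete, the sketch below computes an optimistic processor-area bound for a multi-mode design. It assumes a processor can be shared between modes only when both modes use an identical configuration, and it ignores pipeline-stage order and FIFO connectivity, which real merging heuristics must respect. Under that assumption each configuration needs only as many instances as the most demanding mode, which `collections.Counter` union (`|`, element-wise maximum) expresses directly. The configuration names, counts and areas are hypothetical and do not reproduce the heuristics or results of this thesis.

```python
from collections import Counter
from functools import reduce


def merged_processor_area(modes, area_of):
    """modes: one Counter per mode mapping configuration name -> processor count.
    With sharing, each configuration needs its maximum count over all modes."""
    shared = reduce(lambda a, b: a | b, modes)
    return sum(area_of[cfg] * n for cfg, n in shared.items())


def separate_processor_area(modes, area_of):
    """Area when every mode is built as its own pipelined MPSoC."""
    return sum(area_of[cfg] * n for mode in modes for cfg, n in mode.items())


area_of = {"ci_jpeg": 2.0, "base": 1.0, "ci_mp3": 2.5}
modes = [
    Counter({"ci_jpeg": 2, "base": 3}),  # hypothetical JPEG encoder mode
    Counter({"ci_mp3": 1, "base": 4}),   # hypothetical MP3 encoder mode
]
merged = merged_processor_area(modes, area_of)
separate = separate_processor_area(modes, area_of)
print(merged, separate, 1 - merged / separate)
```

Because the bound ignores structural constraints, a practical merging heuristic can only approach it; the gap between the two is one way to judge how close a heuristic comes to the maximal overlap.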