ASIC Multimedia Chips and a
Short Review of the Low
Power Multimedia Session in ISSCC
2006
Mentor: Dr. Fakhraii
By: Masoud Rostami,
Agenda
1. PRAMs
2. Multimedia ASIC Chips
3. Multimedia Processors in ISSCC06
4. Summary
5. References
Phase-Change Memory
PRAM (Phase-Change Random Access Memory) is attracting great interest
as the candidate for the next generation of non-volatile memory devices.
The cell material used in PRAM is a chalcogenide alloy (Ge2Sb2Te5, or GST),
which takes either a low-resistivity polycrystalline phase (SET state, ‘0’) or
a high-resistivity amorphous phase (RESET state, ‘1’).
Conversion between the two phases is realized by resistive heating.
To write a GST cell to the RESET state, the GST compound is heated above the
melting point and quenched rapidly. To write a GST cell to the SET state, the
GST is heated to a temperature between the crystallization and melting points
for a period long enough to crystallize it.
Note: Chalcogenide is the same material utilized in re-writable optical media
(such as CD-RW and DVD-RW).
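The SET and RESET write rules above can be sketched as a small decision function. This is only an illustrative model: the temperature thresholds and the minimum crystallization time are placeholder values, not figures from the slides or from [1], and the names `Pulse` and `phase_after_write` are hypothetical.

```python
from dataclasses import dataclass

# Placeholder values for illustration only (assumptions, not from the source).
T_CRYSTALLIZE_C = 150.0   # assumed crystallization temperature
T_MELT_C = 600.0          # assumed melting point
MIN_SET_TIME_NS = 100.0   # assumed time needed to crystallize the cell


@dataclass
class Pulse:
    peak_temp_c: float   # peak cell temperature reached by resistive heating
    hold_ns: float       # time spent at the peak temperature
    fast_quench: bool    # True if the cell is cooled rapidly afterwards


def phase_after_write(pulse: Pulse, current_phase: str) -> str:
    """Return 'amorphous' (RESET, '1') or 'crystalline' (SET, '0')."""
    # RESET: heat above the melting point, then quench rapidly.
    if pulse.peak_temp_c > T_MELT_C and pulse.fast_quench:
        return "amorphous"
    # SET: hold between crystallization and melting point long enough.
    if (T_CRYSTALLIZE_C < pulse.peak_temp_c < T_MELT_C
            and pulse.hold_ns >= MIN_SET_TIME_NS):
        return "crystalline"
    # Simplification: any other pulse leaves the phase unchanged.
    return current_phase
```

For example, `phase_after_write(Pulse(650.0, 10.0, True), "crystalline")` models a RESET write, while a longer, cooler pulse such as `Pulse(300.0, 200.0, False)` models a SET write.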
A 0.1um 1.8V 256 Mb 66MHz
Synchronous Burst PRAM
[2]
[1]
Multimedia ASIC Chips
• Standards and technologies change rapidly.
• Adapting to new standards is the key factor in success.
• The life span of each hardware generation is getting shorter and shorter.
• It might be a feast this year but a famine next year.
• When a new processor is released for a new standard, it
makes huge profits.
• Later, the product (the old processor) is out of date while a new
processor for the latest standards is still under development.
• Therefore, flexible and versatile hardware is required.
Example
SAA7215, SAA7216, SAA7221, SAA7214
by Philips Semiconductors (QFB208):
It was announced in January 2001 and
it was discontinued in March 2002.
Solution: Configurable Video
Processors
Since the standards keep changing:
– The solution might be a powerful core DSP together with flexible
parts.
– Look for a stable core and flexible components so that they can
survive minor or major revisions.
– The ideal goal is to survive digital video revisions (DVB-S,
DVB-T, DVB-S2, HDTV,…).
We should give the consumer access to as many kinds of peripherals
as possible: Ethernet, USB, IDE, UART, IrDA,…
The chip should support as many standards and streams as
possible.
Philips: PNX8526
[3]
Continued..
It was designed in 0.12 um
technology.
TriMedia is a Philips internal
microprocessor core with a
proprietary architecture. It is a
VLIW with an instruction set
optimized for digital media
processing. One implementation is
the Philips-internal TM3270
synthesizable RTL core.
Nexperia is a product line of chips
based on a TriMedia processor
with application-specific
peripherals. Nexperia chips have
part numbers beginning with PNX.
[3]
Philips Chips
PNX8526: analog/digital television chip including a 266 MHz MIPS
CPU processor core and a 240 MHz TriMedia processor core
supporting demux and decoding of SDTV MPEG-2 Main profile and
Main level and HDTV MPEG-2 Main profile and High level, with
scaling and de-interlacing up to 1920x1080 resolution at 60
interlaced fields/second or 1368x720 resolution at 60 progressive
scan frames per second.
PNX010x: portable audio and multimedia player chip based on
ARM7 or ARM9 processors with a NAND flash memory and hard
drive interfaces.
PNX1500: media processor based on the TriMedia TM2360 VLIW
processor core running at 300 MHz with an LCD display controller
and Ethernet interface.
Continued
PNX1700: with features similar to the PNX1500 but based on
the TriMedia TM5250 CPU core with software support for
H.264, MPEG-4 (SP, MVP, ASP), WMV9, DivX, and MPEG-2
with support for HDTV resolution decode of MPEG-2, WMV9,
and DivX (but not H.264).
PNX4103: software programmable mobile multimedia
processor, capable of H.264 (unspecified Profile) decode at
D1 (SDTV) resolution with stacked DRAM and support for
direct and RAM-buffered display interfaces.
PNX7100: DVD recorder chip with MPEG-2 encoding and
decoding for interlaced video. It includes a MIPS Technologies
133 MHz MIPS32 system controller processor core with
additional support for progressive scan video, and is fabricated
in a Philips 0.12 um process.
Others
NEC:
– uPD61126: MPEG-2 decoder supporting multiple streams at
standard television resolution with noise filters and a range of
standard video interfaces based on 2 MIPS Technologies 4Kc
cores with enhanced security features for set-top boxes
Broadcom:
– BCM2722: VideoCore II Multimedia Processor, used in the
Apple Video iPod, is capable of MPEG-4 video encode and
decode with design for low power consumption for battery
powered devices. The package contains a stacked 32 megabit
SDRAM, a USB 1.1 slave interface, a camera interface for up to
5M pixels, and an LCD controller interface among other
interfaces. The BCM2722 is manufactured in a 0.13um process
technology.
– BCM3560:
Low Power Multimedia Session of
ISSCC06
With the availability of increasing data bandwidth, there is a
greater demand for much more advanced multimedia
processing capabilities, which in turn translates to higher
computational and storage requirements on these devices.
Compounding this challenge is the ever increasing demand
for mobility, dictating that these multimedia functions be
performed at the lowest levels of power consumption.
The seven papers in this session focus on recent advances in
low power multimedia processing integrated circuits that
deliver advanced functionality, such as 3D graphics, high
resolution still and video encoding/decoding, and high fidelity
audio playback. Results from these papers demonstrate that
smart architecture design and implementation techniques, in
conjunction with advanced process technology, can deliver
very high performance multimedia functionalities at very low
power consumption levels.
6.33mW MPEG Audio Decoding
on a Multimedia Processor in 0.18u
Technology
Techniques to realize a low-power
multimedia processor:
– A parallel processing DSP for low voltage
operation
– Multi-Power Domain
– A conditional pre-charge FF.
Pipelining => Low-frequency =>
Low-voltage
By making use of hardwired functional blocks and parallel and pipelined
processing, the required operating frequency for MPEG decoding can be
lowered to 30MHz. As a result, the supply voltage for MPEG decoding can be
reduced to 1.1V from 1.8V and 1.3V, which is especially effective at
reducing the dynamic power dissipation. The dynamic power is reduced by
62.7%.
[4]
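One way to read the 62.7% figure is through the first-order model in which dynamic power scales as C·V²·f. Under the assumption that frequency and switched capacitance are unchanged, lowering the supply from 1.8V to 1.1V alone gives the same number. Pairing the figure with the 1.8V baseline is our assumption; the slide does not say how it was derived, and `dynamic_power_ratio` is a hypothetical helper name.

```python
def dynamic_power_ratio(v_new: float, v_old: float,
                        f_new: float = 1.0, f_old: float = 1.0) -> float:
    """Ratio P_new / P_old under the first-order model P = a * C * V^2 * f.

    Activity factor and capacitance are assumed equal in both cases,
    so they cancel out of the ratio.
    """
    return (v_new / v_old) ** 2 * (f_new / f_old)


# Same frequency, supply lowered from 1.8V to 1.1V.
saving = 1.0 - dynamic_power_ratio(1.1, 1.8)
print(f"{saving:.1%}")  # prints 62.7%
```

Because the voltage term is squared, a supply reduction that is made possible by a lower clock frequency saves more than the frequency reduction by itself would.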
Multi-Bus Architecture
To obtain the high-bandwidth data flow necessary for
multimedia signal processing, a multiple-bus architecture
is applied. It comprises one high-speed main bus and three
peripheral buses. The main bus (288 MB/s) connects
data-transfer-intensive blocks, such as the hardwired
dedicated DSP, memory card IF, USB2 PHY, etc. The peripheral
buses (72 MB/s) connect serial ports, timers, ADC, etc. With
this multi-bus architecture, high-volume data can be
transferred effectively without conflicting with slow
data. External memories are connected via an external
memory controller.
A conditional pre-charge FF
In this circuit structure, the clock signal (CLK) is gated by the input
signal (D, Db) so that only a minimum number of nodes change even when the
data changes, as shown in Fig. 22.7.3. With a conventional flip-flop, many
nodes change uniformly whenever the clock signal toggles, and as a result
a large amount of power is dissipated. The proposed conditional pre-charge
flip-flop therefore reduces the dynamic power dissipation associated with
the clock signal compared with the conventional flip-flop.
[4]
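The effect described above can be illustrated with a purely behavioral activity count. The model below assumes that the clocked internal nodes of the conditional flip-flop are activated only on cycles where the input differs from the stored value, while a conventional flip-flop activates them on every clock edge. It is not the circuit of [4], and `clocked_activations` is a hypothetical name.

```python
from typing import Iterable, Tuple


def clocked_activations(data: Iterable[int], q_init: int = 0) -> Tuple[int, int]:
    """Return (conventional, conditional) internal activation counts.

    Each element of `data` is the D input sampled at one rising clock edge.
    """
    q = q_init
    conventional = 0
    conditional = 0
    for d in data:
        conventional += 1      # internal nodes toggle on every clock edge
        if d != q:             # clock reaches the internal nodes only
            conditional += 1   # when the input differs from the stored bit
            q = d
    return conventional, conditional


stream = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
print(clocked_activations(stream))  # prints (10, 2)
```

In this model the saving depends entirely on how often the data changes: for a stream that toggles on every cycle, both counts are equal.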
A conditional pre-charge FF
The power dissipation of the flip-flop consists of two parts:
1. the power dissipation owing to transitions of the clock signal (CK)
2. the power dissipation owing to transitions of the data signal (D)
[4]
A conditional pre-charge FF
[4]
Multi-Power Domain
This processor is divided into six power domains. Each domain is
connected to an individual 1.1V power supply that can be turned off.
For example, in the case of AAC decoding, three power domains are
turned off.
[4]
Chip Micrograph
[4]
A 5mW MPEG4 SP Encoder with 2D
Bandwidth-Sharing Motion Estimation
for Mobile Applications
MPEG-4 codec designs [1-2] have been reported that
address the low power requirements demanded by
mobile devices.
Three sources consume most of the power in an MPEG-4 encoder:
– Motion estimation (ME) generally consumes more than half of the total
power because of its high memory access requirements.
– The discrete cosine transform/inverse discrete cosine
transform (DCT/IDCT) consumes power because of its complex
computations.
– Data buffering between motion estimation/motion compensation
(ME/MC) and quantization/variable length coding (Q/VLC)
consumes power because of the SRAM accesses.
System Architecture
At the module level, the design focuses on the ME and DCT blocks to
reduce power consumption. At the system level, the design reduces
the amount of data buffering between the Q and VLC stages.
[5]
DCT Architecture
Most DCT coefficients become zero after quantization, so the
precision of these coefficients is less important. They can be
calculated with less precision to save power, ideally with little drop in
quality. A DCT design is adopted that decides the required precision
from the content. It consumes less power for lower-precision
calculations, reducing the total power consumption.
[5]
DCT & Zero Marker Scheme
A classifier circuit decides the allocation of calculation resources. It
is based on the value of the pixel-to-pixel amplitude (PPA) and the
quantization parameter (QP). After classification, the number of
calculation bits is decided. Both clock and combinational circuits are
shut down for any unused additional bits. The quality degradation
due to reduced precision is less than 0.1dB compared with a normal
DCT.
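A minimal sketch of such a classifier follows. Everything numeric in it is hypothetical: the slides give neither the thresholds nor the datapath width, and the definition of PPA as the maximum minus the minimum pixel value of the block is an assumption made for this example.

```python
from typing import Sequence

FULL_PRECISION_BITS = 12  # assumed full datapath width (hypothetical)


def ppa(block: Sequence[int]) -> int:
    """Amplitude of a block, assumed here to be max minus min pixel value."""
    return max(block) - min(block)


def calculation_bits(block: Sequence[int], qp: int) -> int:
    """Pick a datapath width from PPA and QP (thresholds are hypothetical).

    With a small amplitude or coarse quantization, most coefficients
    quantize to zero, so fewer bits are computed and the clock and logic
    of the unused bits could be shut down.
    """
    score = ppa(block) // max(qp, 1)
    if score < 4:
        return FULL_PRECISION_BITS - 6
    if score < 16:
        return FULL_PRECISION_BITS - 3
    return FULL_PRECISION_BITS
```

A flat block such as `[128] * 64` gets the narrowest datapath regardless of QP, while a high-contrast block at a small QP keeps full precision.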
A zero-marker scheme is adopted to reduce accesses to the
SRAM buffer between stages. The data buffered for VLC is
quantized, so it is mostly zero. For every four entries stored in
SRAM, a one-bit register records whether they are all zero. If they
are, no reading or writing is required. This mechanism avoids
most buffer accesses between the Q stage and the VLC stage. It can
save 86% of data buffering in low bit rate mode and 62% in high bit
rate mode, depending on the sequence.
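The zero-marker idea can be sketched in software as follows. This is an illustration under stated assumptions, not the design of [5]: the class name `ZeroMarkerBuffer`, the dictionary standing in for SRAM, and the per-entry access counter are all hypothetical.

```python
from typing import Dict, List, Sequence

GROUP = 4  # entries covered by one marker bit, as described above


class ZeroMarkerBuffer:
    """Buffer that skips SRAM accesses for all-zero groups of four entries."""

    def __init__(self) -> None:
        self.markers: List[bool] = []         # one bit per group: True = all zero
        self.sram: Dict[int, List[int]] = {}  # only non-zero groups are stored
        self.sram_accesses = 0                # entries written plus entries read

    def write(self, coeffs: Sequence[int]) -> None:
        if len(coeffs) % GROUP != 0:
            raise ValueError("length must be a multiple of the group size")
        for start in range(0, len(coeffs), GROUP):
            group = list(coeffs[start:start + GROUP])
            all_zero = not any(group)
            self.markers.append(all_zero)
            if not all_zero:                  # all-zero groups skip the SRAM write
                self.sram[len(self.markers) - 1] = group
                self.sram_accesses += GROUP

    def read(self) -> List[int]:
        out: List[int] = []
        for index, all_zero in enumerate(self.markers):
            if all_zero:
                out.extend([0] * GROUP)       # regenerated from the marker bit
            else:
                out.extend(self.sram[index])
                self.sram_accesses += GROUP
        return out


coeffs = [37, -5, 2, 0] + [0] * 60  # one quantized 8x8 block, mostly zero
buf = ZeroMarkerBuffer()
buf.write(coeffs)
assert buf.read() == coeffs
print(buf.sram_accesses)  # prints 8, versus 128 entry accesses without markers
```

The saving in this sketch grows with the share of all-zero groups, which fits the larger saving reported for the low bit rate mode, where more coefficients quantize to zero.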
Characteristics
[5]
Die Micrograph
[5]
A 125μW, Fully Scalable MPEG-2
and H.264/AVC Video Decoder for
Mobile Applications
[6]
A 120Mvertices/s Multi-threaded
VLIW Vertex Processor for Mobile
Multimedia Applications
[7]
Summary
• PRAM seems to be a promising candidate for non-volatile memory.
• To survive in the multimedia ASIC industry, we must give the consumer
some flexibility (configurability) and also some versatility.
• At ISSCC06, the following techniques were used in the multimedia
session to lower power consumption without sacrificing performance:
– Multi-power domains
– Parallel processing for low-voltage operation
– Conditional pre-charge DFFs
– Multi-threading
– Zero-marker scheme
– Precision-aware DCT/IDCT block
– …
References
1. S. Kang, et al., “A 0.1um 1.8V 256 Mb 66MHz Synchronous Burst PRAM”,
ISSCC2006
2. H. R. Oh, et al., “Enhanced Write Performance of a 64Mb Phase-change Random
Access Memory”, ISSCC2005
3. “PNX8526 Datasheet”, Philips Semiconductors
4. Y. Ueda, et al., “6.33mW MPEG Audio Decoding on a Multimedia Processor”,
ISSCC2006
5. C. P. Lin, et al., “A 5mW MPEG4 SP Encoder with 2D Bandwidth-Sharing Motion
Estimation for Mobile Applications”, ISSCC2006
6. T. M. Liu, et al., “A 125μW, Fully Scalable MPEG-2 and H.264/AVC Video Decoder
for Mobile Applications”, ISSCC2006
7. C. H. Yu, et al., “A 120Mvertices/s Multi-threaded VLIW Vertex Processor for
Mobile Multimedia Applications”, ISSCC2006
Thank You for Your Attention