Configurable Processors: A New Age of (Not So Hard)

Advanced Processor Architectures for
Embedded Systems
Witawas Srisa-an
CSCE 496: Embedded Systems Design and
Implementation
Objectives
• Discuss ASIC, FPGA-based systems, and
general purpose processors
• Analyze the operating requirements for today’s
embedded processors
• Observe the architectural differences between
state-of-the-art processors for embedded
systems and high-performance general purpose
processors
– Tensilica Xtensa
– Stretch S5000
Embedded Processors
Requirements
• operate in memory-constrained environments
• must be energy efficient
• must be low cost
• may have to be good at a common set of
tasks
– matrix multiplication,
– encryption,
– filtering (FIR),
– network packet processing, etc.
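As a concrete instance of one of these common tasks, a direct-form FIR filter can be sketched as below. This is a minimal illustration, not code from the lecture; the function name `fir_filter` and the sample values are assumptions chosen for the example. The inner multiply-accumulate loop is the kind of work a built-in DSP unit or a custom instruction would target.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * x[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0
        # Multiply-accumulate over the taps; inputs before x[0] are treated as zero.
        for k, coeff in enumerate(taps):
            if n - k >= 0:
                acc += coeff * samples[n - k]
        out.append(acc)
    return out

# Hypothetical 3-tap moving-sum filter applied to a short integer signal.
smoothed = fir_filter([1, 2, 3, 4], [1, 1, 1])
```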
Implications
• low memory footprint
– simplified instruction set
• 16-bit, 24-bit
– may not need support for virtual memory (VM)
• may lack hardware MMUs
• energy efficient
– less complex (smaller number of transistors)
– simple pipeline stages
– less cache memory on chips
– simple floating point units
– larger transistors and slower clocks
– integrated function specific components for common
tasks
Implications (cont.)
• low cost
– share IP cores to reduce development cost
• ARM, MIPS, etc.
– use older semiconductor process technologies (e.g.
250 nm instead of 90 nm)
• task specific
– built-in DSP unit
– wide data bus (more data per movement)
– may need support for adding functions to the cores
– may need field-reconfigurability
Rationales
from “The Death of Micro-Processors”, Nick Tredennick and Brion Shimamoto,
Embedded Systems Programming, http://www.embedded.com/showArticle.jhtml?articleID=26807160
Rationales (cont.)
“Studies have shown that custom hardware
components often require much less energy to
complete their tasks than the same tasks
running on general purpose processors.” [1]
“An ASIC is custom logic for a particular
application. Custom logic can be orders of
magnitude more efficient than microprocessor-based solutions.” [2]
[1] Lach et al., “Power-Efficient Adaptable Wireless Sensor Networks”, Proceedings of International Conference
on Military and Aerospace Programmable Logic Devices (MAPLD), September 2003.
[2] Tredennick and Shimamoto, “The Death of Micro-Processors”, Embedded Systems Programming,
http://www.embedded.com/showArticle.jhtml?articleID=26807160
Application Specific ICs (ASICs)
• provide custom design solutions for
particular problems
– fixed solutions that need broad market acceptance
(high volume) to bring the cost down
– require extensive knowledge of hardware
design
– not field-reconfigurable
– can have large non-recurring engineering
(NRE) cost
ASICs (cont.)
Technology    Mask cost
90 nm         $1,000,000
180 nm        $250,000
250 nm        $120,000
350 nm        $60,000
Wayne Wolf, FPGA-Based System Designs, Prentice Hall, 2004
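To see why volume matters for NRE cost, the mask costs above can be spread over a production run. The sketch below is only an illustration: the production volumes are hypothetical, the helper name `mask_cost_per_chip` is an assumption, and mask cost is just one part of the total NRE.

```python
# Mask costs in US dollars, taken from the table above.
MASK_COST = {"90 nm": 1_000_000, "180 nm": 250_000, "250 nm": 120_000, "350 nm": 60_000}

def mask_cost_per_chip(technology, volume):
    """Spread the one-time mask cost evenly over `volume` chips."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return MASK_COST[technology] / volume

# Hypothetical production volumes, chosen only for illustration.
for volume in (1_000, 100_000, 10_000_000):
    for tech in MASK_COST:
        print(f"{tech:>6} at {volume:>10,} units: ${mask_cost_per_chip(tech, volume):,.2f} per chip")
```

At a thousand units the 90 nm mask set alone adds $1,000 to every chip; at ten million units it adds ten cents.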
FPGA Based Systems
• Field-programmable gate arrays (FPGAs)
– are slower and require more power than
custom design
– are more expensive
– but provide no wait time from completing a
design to making a chip
• great for prototyping
– are also reusable
FPGAs
• SRAM based--volatile
– Altera Flex, Stratix, Cyclone, Apex
• Antifuse--one-time programmable
– Actel
• EEPROM--non-volatile
– Altera Max
ASIC Design Approaches
• Custom VLSI designs
– are fabricated on manufacturing line
• takes months
• masking cost is also expensive
– operate much faster and consume less power
than FPGA equivalents
– can be cheaper if manufactured in large
volume
ASIC Design Approaches (cont.)
• Structured ASIC
– is based on pre-designed logic fabric
structurally embedded in the platform
– fills the market gap between high-density
FPGAs and standard cell ASICs
• can greatly reduce development time and cost
• reduce non-recurring engineering (NRE) cost
http://www.amis.com/asics/structured_asics/
http://www.altera.com/b/hardcopyii.html?WT.mc_id=h2_sm_go_xx_tx_2_041&WT.srch=1
Integrating ASICs with GPPs
• Today’s embedded systems can
have complex software layers
– OS
– Virtual Machine
– Applications
• It is preferable to pair GPPs with ASICs
acting as co-processors
Integrating ASICs with GPPs (cont.)
• So, we can have GPPs perform basic tasks
and ASICs (co-processors) speed up
compute-intensive functions
– sounds simple but in reality, it is quite complex
– basic hand-shaking is needed between the ASICs
and the main processors
• data exchange
– shared memory
– requires OS and architecture support
• synchronous or asynchronous calls
• cache coherency issue
ASICs and GPPs (cont.)
• An example is using a hardware co-processor for cryptography
– should the co-processor calls be synchronous?
• main processor blocks on the call and waits for the
response
– or asynchronous?
• calling process is blocked and swapped out
• needs interrupt support
• needs to maintain context
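The two calling styles can be sketched in software as follows. This is a simulation under stated assumptions, not a real driver: `crypto_coprocessor` is a hypothetical stand-in that uses XOR as a placeholder (it is not real cryptography), a worker thread plays the role of the hardware unit, and the done-callback plays the role of the completion interrupt.

```python
from concurrent.futures import ThreadPoolExecutor

def crypto_coprocessor(data: bytes, key: int) -> bytes:
    """Stand-in for the hardware unit. XOR is a placeholder, not real cryptography."""
    return bytes(b ^ key for b in data)

def encrypt_sync(data: bytes, key: int) -> bytes:
    # Synchronous call: the caller is blocked until the co-processor responds.
    return crypto_coprocessor(data, key)

def encrypt_async(pool, data: bytes, key: int, on_done):
    # Asynchronous call: submit the job and return immediately.
    # The done-callback plays the role of the completion interrupt.
    future = pool.submit(crypto_coprocessor, data, key)
    future.add_done_callback(lambda f: on_done(f.result()))
    return future

results = []
with ThreadPoolExecutor(max_workers=1) as pool:
    pending = encrypt_async(pool, b"abc", 0x20, results.append)
    blocking_result = encrypt_sync(b"abc", 0x20)  # the caller can do other work meanwhile
    pending.result()  # wait for the asynchronous job to finish
```

The callback and the list it appends to stand for the context that must be kept so the result can be delivered to the right caller once the job completes.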
ASICs and GPPs (cont.)
• Co-processor
– shares bus with the main CPU
• is a source for bus contention
– can cause cache coherency issue
• data in the main CPU cache may be stale after the
co-processor updates the same locations in memory
– flush or invalidate the affected cache lines accordingly
– should be equipped with DMA to relieve the
main CPU from copying data
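The coherency problem can be reproduced with a toy model. The class below is purely illustrative (the names `ToySystem`, `cpu_read`, `dma_write`, and `invalidate` are assumptions, and real caches work on lines rather than single addresses), but it shows how a CPU read can return an old value after the co-processor has written memory through DMA.

```python
class ToySystem:
    """Toy model: one main memory, one CPU cache, and a co-processor that uses DMA."""

    def __init__(self):
        self.memory = {}
        self.cache = {}

    def cpu_read(self, addr):
        # A hit returns the cached copy without consulting memory.
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def dma_write(self, addr, value):
        # The co-processor writes memory directly and bypasses the CPU cache.
        self.memory[addr] = value

    def invalidate(self, addr):
        # Drop the cached copy so the next read goes back to memory.
        self.cache.pop(addr, None)

system = ToySystem()
system.memory[0x100] = 1
first = system.cpu_read(0x100)   # loads 1 into the cache
system.dma_write(0x100, 2)       # co-processor result lands in memory
stale = system.cpu_read(0x100)   # still the old cached value
system.invalidate(0x100)
fresh = system.cpu_read(0x100)   # now sees the co-processor's result
```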
Extending GPPs
• Tensilica Xtensa
– reconfigurable processor cores
• support native 16-bit and 24-bit instructions for higher code density
• users can add/remove components (MMU, multipliers, FPUs)
• users can reconfigure cache organization
• users can select bus width (32, 64, or 128 bits)
– user-defined instruction extension language (Tensilica Instruction Extension, TIE)
• users can create custom instructions to speed up commonly used
functions
• users can instantiate custom registers of different sizes
Tensilica Xtensa
from http://www.tensilica.com/html/tensilica_instruction_extensio.html
Tensilica Xtensa (cont.)
• We will not go into great detail about the
Xtensa.
• However, we will study Stretch S5000
engine which is based on the Xtensa core.
Design Time Solutions
• Up to now, we have only talked about design-time solutions!
– logic designs are done in house
– not very reconfigurable after the chip is made
– even with FPGAs, someone has to come up with a
new hardware design for it to change
– the Xtensa needs about 1 hour to synthesize the
instruction extension
• What if we want to configure on the fly?
– each application brings in CPU-intensive functions
• these functions are not known in advance
– Can we leave it up to the software developers to
design fast co-processors?
Run-Time Configuration
(R)evolution of Processors
Ice Hard, Rock Hard, Playdough Hard
• Hardwired, GPP
– performs well in most conditions but not in extreme conditions
• GPP with FPGAs
– custom designs perform well in some extreme conditions
– requires extensive knowledge of hardware design
• GPP with embedded programmable logic
– reconfiguration is triggered by software
(R)evolution of Processors
• Ice Hard
– Contains ASIC (Application Specific IC) designs
• Increases time-to-market
• Takes time to reconfigure
Software Hotspots
• In DSP
– 80% of the processing load is spent in 20%
of the code
• Hand-tuned assembly that can take thousands of
cycles to execute
• Less portable
– The remaining 80% of the code consists of complex
system functions
• Runs well on most GPPs
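The 80/20 split also bounds how much a co-processor can help. The sketch below applies Amdahl's law, a standard formula that the slides do not name, to the 80% figure; the acceleration factors and the function name `overall_speedup` are hypothetical.

```python
def overall_speedup(hotspot_fraction, hotspot_speedup):
    """Amdahl's law: 1 / ((1 - f) + f / s)."""
    if not 0.0 <= hotspot_fraction <= 1.0 or hotspot_speedup <= 0:
        raise ValueError("fraction must be in [0, 1] and speedup must be positive")
    return 1.0 / ((1.0 - hotspot_fraction) + hotspot_fraction / hotspot_speedup)

# 80% of the load sits in the hotspots; the acceleration factors are hypothetical.
for s in (2, 10, 100):
    print(f"hotspots {s}x faster -> whole program {overall_speedup(0.8, s):.2f}x faster")
```

However fast the hotspots become, the overall gain stays below 1 / (1 - 0.8) = 5x, because the remaining 20% of the load still runs at its original speed.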
Software Hotspots Example
• a 16-QAM modem (19.2 kbaud) implemented
entirely in software
– takes 177,000 instruction cycles to execute on a
TI C6711
– takes only a few cycles on an FPGA co-processor
Solving Hotspots
[Figure: block diagrams labeled PROCESSOR + FPGA, MULTIPLE DSPs, RISC PROCESSOR, DSP ENABLED PROCESSORS, FPGA, and PROGRAMMABLE LOGIC]
Solving Hotspots
[Figure: PERFORMANCE plotted against FLEXIBILITY & TTM for ASIC, SCP, FPGA, DSP, and CPU]
SCP = Software Configurable Processor
TTM = time to market