ID 112C: MCU Architecture Evolution – Now Better than Ever – So Who’s the Best?
Renesas Electronics America Inc.
Mark Rootz
Sr. Marketing Manager
12 October 2010
Version: 1.2
© 2010 Renesas Electronics America Inc. All rights reserved.
Mark Rootz
Renesas Sr. Marketing Manager, 32-bit MCUs
BSEE and MSEE from University of Missouri – Rolla
Seven years at STMicroelectronics
Definition and Promotion of 32-bit MCUs, N. America
Marketing Manager, STR9 32-bit ARM9 MCU line (France)
Product Marketing Manager, uPSD 8-bit 8051 MCU (San Jose CA)
Product definition, technical marketing, business mgt, infrastructure
Three years at Waferscale Inc
Applications Manager, uPSD MCUs
Tools, software, training, documentation, solutions, silicon validation
Three years at Hypertech Inc
Project Manager and engineering
Automotive powertrain controller software and hardware
Twelve years at McDonnell Aircraft (now Boeing)
Project Manager and engineering
F15/F18 fighter avionics systems engineering (weapons, radar, navigation)
Real-time simulation/test environment for complete avionics suite
Embedded MCUs, MPUs, PLDs software and hardware design
Renesas Technology and Solution Portfolio
Solutions for Innovation
• Microcontrollers & Microprocessors: #1 market share worldwide*
• ASIC, ASSP & Memory: advanced and proven technologies
• Analog and Power Devices: #1 market share in low-voltage MOSFET**
* MCU: 31% revenue basis, from Gartner "Semiconductor Applications Worldwide Annual Market Share: Database", 25 March 2010
** Power MOSFET: 17.1% on unit basis, from Marketing Eye 2009
Microcontroller and Microprocessor Line-up
Superscalar, MMU, Multimedia
• Up to 1200 DMIPS, 45, 65 & 90nm process
• Video and audio processing on Linux
• Server, Industrial & Automotive
High Performance CPU, Low Power
• Up to 500 DMIPS, 150 & 90nm process
• 600uA/MHz, 1.5uA standby
• Medical, Automotive & Industrial
High Performance CPU, FPU, DSC
• Up to 165 DMIPS, 90nm process
• 500uA/MHz, 2.5uA standby
• Ethernet, CAN, USB, Motor Control, TFT Display
Legacy Cores
• Next-generation migration to RX
General Purpose
• Up to 10 DMIPS, 130nm process
• 350uA/MHz, 1uA standby
• Capacitive touch
Ultra Low Power
• Up to 25 DMIPS, 150nm process
• 190uA/MHz, 0.3uA standby
• Application-specific integration
Embedded Security
• Up to 25 DMIPS, 180, 90nm process
• 1mA/MHz, 100uA standby
• Crypto engine, Hardware security
RX: Performance without Sacrifice
Superscalar, MMU, Multimedia
High Performance CPU, Low Power
High Performance CPU, FPU, DSC
Up to 1200 DMIPS, 45, 65 & 90nm process
Video and audio processing on Linux
Server, Industrial & Automotive
Up to 500 DMIPS, 150 & 90nm process
600uA/MHz, 1.5 uA standby
Medical, Automotive & Industrial
Up to 165 DMIPS, 90nm process
500uA/MHz, 2.5 uA standby
Ethernet, CAN, USB, Motor Control, TFT Display
Legacy Cores
Next-generation migration to RX
Key Attributes
RX Innovation – Single Chip Enablement
There are many 32-bit MCU/DSP architectures covering varied capabilities:
• Coldfire
• CortexM3/M4
• Kinetis
• TMS320
• ARM7/9
• PIC32
• AVR32
In a single family of devices, RX will encompass / exceed these capabilities
RX Innovation – Single Chip Enablement
A single RX MCU can:
• Interpret a multitude of analog and digital input sources
• Generate precision analog and digital outputs in real time
RX Innovation – Single Chip Enablement
One MCU family for many applications
* Photos are examples of end-products that
could use an RX600 MCU. RX600 MCUs not
necessarily used in these products.
RX Microcontrollers … Best of the Best
RX MCUs were conceived and designed from the best CPU
architecture and technology available in the industry today
delivering the perfect blend of:
• CPU and Memory Performance
• Analog and DSP Capability
• Power and Memory Efficiency
• Scalability
• Connectivity
• System Cost
Agenda
Traditional Architectures
32-bit Choices
RX Architecture
Memory Speed vs. Performance
Comparing with Other 32-bit MCUs
Who’s the Best?
Q&A
Key Takeaways
By the end of this session you will be able to:
Understand Key MCU Architectural Elements
Understand RX Architecture
Compare RX with Other Architectures
Make an Informed Decision
MCU, DSP, Digital Signal Controller … What’s the Difference?
Traditional MCUs
• Single-Chip Device
• Interrupt Management System
• Fast Interrupt Response
• Efficient General Instructions
• Fine Power Management
• Wide Connectivity Choice
• Rich Supervisory Functions
• Easily Programmed in C
• Simple Low-Cost Tools
• Broad Ecosystem
• Simple Integer Math
Traditional DSPs
• Multi-Chip Solution
• Single-Task Oriented
• Slower Interrupt Response
• Very Specific Instructions
• High Power Consumption
• Limited Connectivity Choice
• Few Supervisory Functions
• Complex Software
• More Expensive Special Tools
• Narrow Selection of 3rd Parties
• Hardware Multiply and Divide
• Saturating Math
• 1-Cycle, wide Multiply-Accumulate
• Barrel Shifters
• Simultaneous Code/Data Access
• Floating Point Unit
DSC: Optimum Blend of MCU and DSP
The Evolved DSC, Many Practical Uses
More MCUs are gaining DSC Features
MCUs now have better analog capabilities
Signal processing is a must
Pushes bandwidth limits of traditional MCUs
DSC Applications
Motor Control
Digital Power Management
Audio Codecs
Medical Monitoring
Factory Automation
Even benefits traditional MCU applications
More work in less time
16/32-bit MCUs and DSCs in the Market
MCUs

| Core | Vendor | CPU Width (bits) | DMIPS/MHz of CPU Core | Available Frequency (MHz) | Flash Speed (MHz) | Max Flash Size (KB) |
|---|---|---|---|---|---|---|
| V850ES | Renesas | 32 | 1.90 | 20 - 50 | 32 | 1024 |
| ARM CortexM3 | Various | 32 | 1.25 (7) | 60 - 150 | <=50 (2) | 1024 |
| PIC32 (6) | Microchip | 32 | 1.56 | 40 - 80 | 30 | 512 |
| ARM7TDMI (Flash) | Various | 32 | 0.95 (7) | 24 - 60 | <=30 (8) | 1024 |

DSCs

| Core | Vendor | CPU Width (bits) | DMIPS/MHz of CPU Core | Available Frequency (MHz) | Flash Speed (MHz) | Max Flash Size (KB) | MAC (result width bits) | FPU (width bits) |
|---|---|---|---|---|---|---|---|---|
| SH-2A (Flash) | Renesas | 32 | 2.00 | 100 - 200 | 100 | 1024 | 32 and 64 | 64 |
| RX600 | Renesas | 32 | 1.65 | 80 - 100 | 100 | 2048 | 48 and 80 | 32 |
| AVR32 (9, 10, 11) | Atmel | 32 | 1.50 | 40 - 66 | 33 | 512 | 32, 48, and 64 | - |
| ARM CortexM4 (12, 13) | Various | 32 | 1.25 | 150 (1) | <=50 (2) | 1024 | 32 and 64 | 32 (3) |
| STR9 ARM966E (14) | ST | 32 | 1.10 | 96 | 33 | 2048 | 32 and 64 | - |
| TMS320 Delfino (Flash) (15) | TI | 32 | n/a | 100 - 150 | 27 | 512 | 64 | 32 |
| TMS320 Piccolo (16) | TI | 32 | n/a | 40 - 60 | 25 | 128 | 64 | - |
| 56F8000/8300 (17) | Freescale | 16 | 1.00 (4) | 32 - 60 | No spec | 512 | 36 | - |
| dsPIC (18) | Microchip | 16 | 0.50 (5) | 60 - 80 | No spec | 256 | 40 | - |

Notes and references:
(1) Core is capable of, no released product yet
(2) Based on existing CM3 and CM4-based MCUs in mass production today
(3) Optional FPU
(4) MIPS, not DMIPS
(5) MIPS, not DMIPS. 80MHz external clock yields 40MIPS
(6) Microchip, PIC32MX3XX/4XX Family Data Sheet, DS61143E
(7) ARM, “An Introduction to the ARM Cortex-M3 Processor”, Oct 2006
(8) Renesas 32-bit Flash MCU market assessment
(9) Atmel, AVR32 brochure 7919F-AVR32-07/09/5K
(10) Atmel, AVR32 Architecture Document 32000B-AVR32-11/07
(11) Atmel, AT32UC3A datasheet 32058G-AVR32-01/09
(12) ARM, CortexM4 Features Summary, www.arm.com
(13) ARM, Cortex-M4 Technical Reference Manual r0p0
(14) ST, STR91xFAxxx datasheet 13495 rev 6
(15) TI, Data Manual, TMS320F283xx & TMS320F282xx DSCs, SPRS439H, March 2010
(16) TI, Data Manual, TMS320F280xx MCUs, SPRS584D, June 2010
(17) Freescale, Data Sheet, 56F8323/56F8123 16-bit DSCs, MC56F8323 rev 17, May 2007
(18) Microchip, Data Sheet, dsPIC33FJXXXMCX06A/X08A/X10A, 16-bit DSCs, DS70594B, 2009
CISC and RISC
| Traditional CISC (Complex Instruction Set Computer) | RX is Best of Both | Traditional RISC (Reduced Instruction Set Computer) |
|---|---|---|
| GOAL: Small Memory Footprint | | GOAL: 1 Clock per Instruction |
| Any inst accesses memory | Mem-to-Mem instructions | Only load/store mem access |
| Many rich instructions | 73 Inst + DSP + FPU | Few instructions |
| Many addressing modes | 10 addressing modes | Few addressing modes |
| Variable instruction formats | 1 to 8 byte instructions | Fixed instruction formats |
| Smaller code size in memory | Up to 28% smaller code | Larger code size in memory |
| Single register set | 16 x 32-bit registers | Multiple register sets |
| Multi-clock instructions | One clock per instruction | Single-clock instructions |
| Less to no pipelining | 5-stage pipeline | Highly pipelined |
| Longer interrupt response | 5-clock interrupt response | Faster interrupt response |

Plus it has an FPU.
Let’s Build an RX…
RX Architecture … CPU Core and Pipeline
RX600 CISC CPU
• 100MHz CPU core, 1.65 DMIPS/MHz
• Enhanced Harvard architecture
• 16 x 32bit general purpose registers
• 9 x 32bit control registers
• 32bit floating point unit
• 16x16 or 32x32 MAC, 48bit or 80bit result
• 32 x 32 DIV or MULT, 32bit or 64bit result
• Interrupt control, on-chip debug, memory protect unit
Memory interface
• 64bit instruction path, typically to Flash memory. RX Flash is 10 nsec, or 100 MHz zero-wait
• Pre-fetch queue (PFQ), 64 bits wide: holds 4 to 32 instructions for slower memory
• 32bit operand (data) path, typically to SRAM. RX SRAM is also 10 nsec
• Write buffer: only for writes, for slow memory
5-stage pipeline
• F = fetch instruction
• D = decode instruction
• E = execute instruction
• M = read or write memory
• W = write back to register
• Achieves one clock-per-instruction (CPI)
[Figure: pipeline timing diagram, one clock tick per column, showing overlapping F D E M W sequences for successive instructions]
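The one clock-per-instruction figure follows from overlapping the five stages. The Python sketch below is an idealized model rather than a description of the RX hardware: it assumes no stalls and no multi-cycle stages, and the function names are hypothetical.

```python
def pipeline_cycles(num_instructions: int, stages: int = 5) -> int:
    """Clock ticks needed to retire num_instructions in an ideal, stall-free pipeline."""
    if num_instructions <= 0:
        return 0
    # The first instruction takes `stages` ticks to flow through F, D, E, M, W.
    # Every later instruction retires one tick after the one before it.
    return stages + (num_instructions - 1)


def cycles_per_instruction(num_instructions: int, stages: int = 5) -> float:
    """Average CPI, including the one-time cost of filling the pipeline."""
    return pipeline_cycles(num_instructions, stages) / num_instructions


if __name__ == "__main__":
    for n in (1, 5, 100, 10_000):
        print(n, pipeline_cycles(n), round(cycles_per_instruction(n), 4))
```

For long instruction streams the fill cost is amortized and the average approaches one clock per instruction; any stall (slow memory, dependencies) would add to these idealized counts.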
RX Architecture … Memory Interface
[Block diagram: RX600 MCU]
• RX600 CPU, 100MHz, with pipeline, PFQ (64b INST), and buffer (32b DATA)
• Flash memory, 100MHz access, on a 64-bit path
• SRAM, 100MHz access
• Bus matrix connecting the CPU to Internal Main Bus 1 (32 bits)
• External Bus Controller (BSC), 32 bits, with external bus pins for the CPU and another external device
• Bus bridge to peripherals
100 MHz Flash and SRAM means zero wait-state code and data access
PFQ minimizes stalls from slower memory, such as external memory
Bus master of Internal Bus 1 is the CPU
Next we look at Internal Bus 2…
RX Architecture … System Interface
[Block diagram: RX600 MCU]
• RX600 CPU, 100MHz, with pipeline, PFQ (64b INST), and buffer (32b DATA)
• Flash memory, 100MHz access, and SRAM, 100MHz access
• Bus matrix; the CPU is bus master of Internal Main Bus 1 (32 bits)
• Internal Main Bus 2 (32 bits) bus masters: DTC, DMAC, Ethernet DMAC (with 2K FIFOs to the Ethernet MAC)
• EXDMA (external bus master)
• External Bus Controller (BSC), 32 bits, with external bus pins for the CPU and one external device
• Bus bridges to the peripheral busses
Multiple Peripheral Busses to Spread Bandwidth Loading
• System Control (DMA, E2P, ICU, LVD, RTC, WDG, CLKS)
• Communication (USB, CAN, SCI, SPI, I2C)
• Timers (MTU, TPU, TMR, CMT)
• Analog (DAC, ADC, PGA)
• GPIO
RX CPU Core Performance
[Chart: DMIPS per MHz (scale 1.0 to 1.5) for ARM7, ARM9, Cortex-M3, Cortex-M4, and RX; RX delivers 1.65 DMIPS/MHz]
Note: Dhrystone 2.1 numbers for ARM processors taken from www.arm.com
Up to 43% Power Reduction
[Chart: milliwatts* per DMIPS (scale 1.0 to 2.0), RX600 versus a Cortex-M3 based MCU; RX600 is 43% less]
Note: Derived from IDD specifications stated in product datasheets
Low power modes
• 500uA* per MHz in Run Mode
  • All peripherals ON
• Four Low-Power Modes
  • Sleep
  • All-Module Stop
  • Standby
  • Deep Standby
• 2.5uA* in Deep Standby
  • RX63x, RTC ON
Low power design techniques
• Clock gating
• Low power HVT transistors in slower paths
• Power gating
* Typical Conditions, 3.3V and 25°C, all peripheral clocks on
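The milliwatts-per-DMIPS metric can be derived from figures quoted in this deck: 500uA/MHz run current, a 3.3V supply, and 1.65 DMIPS/MHz. The Python sketch below shows the arithmetic; the function name is hypothetical, and it assumes supply power is simply current times voltage.

```python
def mw_per_dmips(ua_per_mhz: float, volts: float, dmips_per_mhz: float) -> float:
    """Milliwatts consumed per DMIPS, from run current, supply voltage, and CPU efficiency."""
    ma_per_mhz = ua_per_mhz / 1000.0      # microamps -> milliamps
    mw_per_mhz = ma_per_mhz * volts       # mA * V = mW, per MHz of clock
    return mw_per_mhz / dmips_per_mhz     # (mW/MHz) / (DMIPS/MHz) = mW/DMIPS


# RX600 figures as quoted in the deck (typical conditions, all peripherals on).
rx600 = mw_per_dmips(ua_per_mhz=500.0, volts=3.3, dmips_per_mhz=1.65)

if __name__ == "__main__":
    print(f"RX600: {rx600:.2f} mW/DMIPS")
```

With these inputs the result is 1 mW/DMIPS, which agrees with the figure given in the closing Questions slide.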
RX600 Instruction Set
= Single clock instruction
RX Instruction Set Summary and Size
| Instruction Length (bytes) | Number of Instructions | List of Instructions |
|---|---|---|
| 1 | 3 | NOP, RTS, BRK |
| 1-3 | 1 | BCnd |
| 1-4 | 1 | BRA |
| 2 | 24 | RMPA, ROLC, RORC, SAT, SATR, POP, POPC, POPM, PUSHC, PUSHM, JMP, JSR, SCMPU, SMOVB, SMOVF, SMOVU, SSTR, SUNTIL, SWHILE, CLRPSW, RTE, RTFI, SETPSW, WAIT |
| 2-3 | 7 | ABS, NEG, NOT, SHAR, SHLL, SHLR, RTSD |
| 2-4 | 3 | MOVU, PUSH, BSR |
| 2-5 | 4 | SUB, BCLR, BSET, BTST |
| 2-6 | 5 | ADD, AND, CMP, MUL, OR |
| 2-8 | 1 | MOV |
| 3 | 15 | ROTL, ROTR, REVL, REVW, INT, MVFC, MACHI, MACLO, MULHI, MULLO, MVFACHI, MVFACMI, MVTACHI, MVTACLO, RACW |
| 3-5 | 5 | FTOI, ROUND, SCCnd, BMCnd, BNOT |
| 3-6 | 3 | SBB, ITOF, XCHG |
| 3-7 | 14 | DIV, DIVU, EMUL, EMULU, MAX, MIN, TST, XOR, FADD, FCMP, FDIV, FMUL, FSUB, MVTC |
| 4-6 | 1 | ADC |
| 4-7 | 2 | STNZ, STZ |

MOV instruction length is 2-8 bytes
• 6% have minimum instruction length of 1 byte
• 49% have minimum instruction length of 2 bytes
• 42% have minimum instruction length of 3 bytes
Total = 89 instructions
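The percentages on this slide can be checked from the per-length instruction counts. The Python sketch below does that; the pairing of counts to length ranges is read from the slide's flattened table and should be treated as an assumption, and the names are hypothetical.

```python
# Length range in bytes -> number of instructions, as read from the slide.
INSTRUCTIONS_BY_LENGTH = {
    "1": 3, "1-3": 1, "1-4": 1,
    "2": 24, "2-3": 7, "2-4": 3, "2-5": 4, "2-6": 5, "2-8": 1,
    "3": 15, "3-5": 5, "3-6": 3, "3-7": 14,
    "4-6": 1, "4-7": 2,
}


def share_by_minimum_length(table):
    """Return (total, {minimum length in bytes: rounded percent of all instructions})."""
    total = sum(table.values())
    counts = {}
    for length_range, count in table.items():
        minimum = int(length_range.split("-")[0])  # "2-8" -> 2
        counts[minimum] = counts.get(minimum, 0) + count
    shares = {m: round(100 * c / total) for m, c in sorted(counts.items())}
    return total, shares


if __name__ == "__main__":
    total, shares = share_by_minimum_length(INSTRUCTIONS_BY_LENGTH)
    print(total, shares)
```

The remaining instructions, those with a 4-byte minimum, account for the few percent not listed on the slide.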
MOV instruction example
| Function | Source | Destination |
|---|---|---|
| IMMREG | #IMM:32 | Rd |
| IMMMEM | #IMM:32, #IMM:16, or #IMM:8 | [Rd] |
| IMMMEM | #IMM:32 | dsp:16[Rd] |
| REGREG | Rs | Rd |
| REGMEM | Rs | [Rd] |
| MEMREG | [Rs] | Rd |
| MEMMEM | [Rs] | [Rd] |

[Figure: encoding of each MOV form (opcode, Rd, Rs, dsp:16, and #IMM fields) plotted against instruction length, 1 to 8 bytes]
The MEMMEM form is a direct memory-to-memory operation
Example: Moving data in memory
| | Traditional RISC | RX |
|---|---|---|
| Code | LDR r3, [r1] (2 bytes); STR r3, [r2] (2 bytes) | MOV [r1], [r2] (2 bytes) |
| Number of cycles | 4 | 3 |
| Code size | 4 bytes | 2 bytes |

Direct Memory-to-Memory operation allows RX to avoid lengthy load/store operations and results in smaller code size
Up to 28% Code Size Reduction
Code size (relative), RX600 versus a Cortex-M3 based MCU (= 1.0)

| Application | RX600 code size |
|---|---|
| Motor control | 28% less |
| Data communication | 19% less |
| Data conversion | 17% less |
| Real-time control | 25% less |
| System control | 25% less |

Note: Internal benchmark test, your results may vary
RX makes Out-of-Order Instruction Decisions
Legend: F = Fetch, D = Decode, E = Execute, M = Memory, WB = Write Back, S = Stall
Without out-of-order completion (one column per CPU clock):
1) MOV [R1], R2    F D E M M WB
2) ADD R4, R5        F D S S E WB
3) SUB R4, R5          F S S D E WB
Instructions 2) and 3) delayed, waiting on 1)
With out-of-order completion:
[Figure: the same three instructions with the stall cycles removed]
Delay is Eliminated
• Possible when there are no dependencies
• Multiple WB within the same clock cycle is OK if the destination is different
Interrupt Handling
Legend: each step is either automatic by CPU or done by firmware
RX Normal Interrupt
IRQ → Resolve interrupt, PC & PSW to stack (7 clks typ.) → Optional push gen regs to stack → ISR → Optional pop gen regs from stack → Pop PC & PSW from stack, return (6 clks)
RX Fast Interrupt
IRQ → Resolve interrupt, PC & PSW to backup regs (5 clks typ.) → Optional push gen regs to stack → ISR → Optional pop gen regs from stack → PC & PSW from backup regs, return (3 clks)
Save 5 clocks
RX Fast Interrupt plus Gen Register Usage
IRQ → Resolve interrupt, PC & PSW to backup regs (5 clks typ.) → ISR → Return (3 clks)
Save many clocks
General CPU Registers: R0 to R15
* ARM, Technical Reference Manuals: CortexM3 r1p1, CortexM4 r0p0
Interrupt Handling
Legend: each step is either automatic by CPU or done by firmware
ARM Cortex M3 or M4*
IRQ → Resolve interrupt, and push CPU state and 5 regs to stack (12 clks) → ISR → Pop CPU state and 5 regs from stack, and return (12 clks)
RX Typical Interrupt
IRQ → Resolve interrupt, PC & PSW to stack (7 clks typ.) → Optional push gen regs to stack → ISR → Optional pop gen regs from stack → Pop PC & PSW from stack, return (6 clks)
RX Fast Interrupt
IRQ → Resolve interrupt, PC & PSW to backup regs (5 clks typ.) → Optional push gen regs to stack → ISR → Optional pop gen regs from stack → PC & PSW from backup regs, return (3 clks)
RX Fast Interrupt plus Gen Register Usage
IRQ → Resolve interrupt, PC & PSW to backup regs (5 clks typ.) → ISR → Return (3 clks)
Save up to 16 clocks
* ARM, Technical Reference Manuals: CortexM3 r1p1, CortexM4 r0p0
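The "save 5 clocks" and "save up to 16 clocks" figures follow from adding the entry and exit overhead quoted for each scheme. The Python sketch below reproduces that arithmetic; it uses only the typical clock counts from these slides, ignores the optional register push and pop, and its names are hypothetical.

```python
# (entry clocks, exit clocks) per interrupt scheme, as quoted on the slides.
OVERHEAD = {
    "Cortex-M3/M4": (12, 12),
    "RX normal": (7, 6),
    "RX fast": (5, 3),
}


def total_overhead(scheme: str) -> int:
    """Entry plus exit clocks for one interrupt, excluding the ISR body."""
    entry, exit_ = OVERHEAD[scheme]
    return entry + exit_


def clocks_saved(baseline: str, candidate: str) -> int:
    """How many clocks the candidate scheme saves per interrupt versus the baseline."""
    return total_overhead(baseline) - total_overhead(candidate)


def overhead_ns(scheme: str, cpu_mhz: float) -> float:
    """Overhead in nanoseconds at a given CPU clock."""
    return total_overhead(scheme) * 1000.0 / cpu_mhz


if __name__ == "__main__":
    print(clocks_saved("RX normal", "RX fast"))
    print(clocks_saved("Cortex-M3/M4", "RX fast"))
    print(overhead_ns("RX fast", 100.0))
```

At 100MHz each clock is 10 nsec, so the fast interrupt's 8 clocks of overhead correspond to 80 nsec under these assumptions.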
FPU directly accesses General Registers
Typical Operation
• General registers exchange data with dedicated data registers through load/store instructions
• The floating-point unit works on the dedicated data registers
RX Operation
• The floating-point unit works directly on the general registers
• No load/store instructions needed
• Smaller code size
• Higher FPU performance
FPU Applications
• Pump control
• Digital filtering
• Pressure regulator
• Thermocouple conversion
• Motor Control
• Flow Control
• Motion Control
FPU benefits: Two examples
1- Motor Control
Sensorless vector motor control compiled for Fixed Integer vs Floating Point FPU
• FPU removes limitations due to scaling or saturation
• Improves accuracy for motor position and speed
• Increases motor efficiency
• Easy code development and maintenance. Write formulas directly into C code
• Reduces CPU loading
• Reduces code size
2- Thermocouple Conversion
FPU provides the best combined execution time and code size
[Chart: code size in bytes (0 to 500) for Look Up Table, Fixed Point Math, Software Library, and FPU]
FPU Comparison
Example: Conversion of thermocouple reading to temperature
Thermocouple formula:

$$\text{Temperature} = \sum_{n=0}^{5} a_n x^n$$

where $a_0$ to $a_5$ are constants and $x$ is the A/D reading.

| MCU | Operating Frequency (MHz) | CPU Cycles (count) | Actual Execution Time (usec) | Execution Time with Ideal Memory (usec) | Code Size (bytes) |
|---|---|---|---|---|---|
| RX600 | 100 | 94 | 0.94 | 0.94 | 48 |
| A CM3-based MCU | 72 | 1130 | 15.7 | 14.7 | 892 |

RX600 is > 16x faster and > 18x smaller.
The FPU provides a dramatic increase in performance and code efficiency over math libraries.
• RX610 MCU: Renesas Compiler v0.02 Alpha, Size Max
• A CM3-based MCU: IAR Compiler v4.42A, Size Max
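The thermocouple conversion is a fifth-order polynomial in the A/D reading. The Python sketch below evaluates such a polynomial with Horner's rule, which needs five multiplies and five adds; the deck does not say how the benchmark code was written, and the coefficients used here are placeholders for illustration, not real thermocouple constants.

```python
def thermocouple_temperature(x: float, coeffs: list) -> float:
    """Evaluate sum(a_n * x**n) for coeffs = [a0, a1, ..., a5] using Horner's rule."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a   # one multiply and one add per coefficient
    return result


def direct_sum(x: float, coeffs: list) -> float:
    """Reference evaluation, term by term, used only to cross-check Horner's rule."""
    return sum(a * x ** n for n, a in enumerate(coeffs))


if __name__ == "__main__":
    # Placeholder coefficients a0..a5 (hypothetical values).
    coeffs = [1.0, 2.0, 3.0, 0.0, 0.0, 0.0]
    print(thermocouple_temperature(2.0, coeffs), direct_sum(2.0, coeffs))
```

On a device with a hardware FPU each step of the loop maps to a floating-point multiply and add; without one, the same steps go through a software math library, which is the comparison the table draws.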
DSP Arithmetic Functions
Repeated Multiply and Accumulate (RMPA)
• Inputs: memory (ADC samples), 32-bit, and memory (coefficients), 32-bit
• The Multiply-Accumulate unit accumulates into an 80-bit result
Multiply and Accumulate (MAC)
• Inputs: two general registers, 16-bit each
• The Multiply-Accumulate unit accumulates into a 48-bit result
[Figure: processing performance versus MCU frequency, with pipeline timing (IF, D, E, M, WB, and W = wait) for each case]
• Competing MCU with 30 MHz Flash: no wait at 30 MHz, 1 wait cycle at 60 MHz, 2 wait cycles at 100 MHz
• RX with 100 MHz Flash: no wait at 100 MHz
DSP and Benefit of 10nsec Flash
FIR Filter, RX600 and a CM3-based MCU
[Chart: completion time for 100 iterations of the FIR algorithm (usec, 0.000 to 5.000) versus MCU operating frequency (16 to 100 MHz); lower is better. The CM3 MCU curve is marked with 1 wait state and 2 wait state regions.]
Series:
• A CM3 MCU Theoretical (73 CPU cycles per iteration)
• A CM3 MCU Actual w/ Memory Acceleration
• A CM3 MCU Actual w/o Memory Acceleration
• RX600 Theoretical (46 CPU cycles per iteration)
• RX600 Actual
Observations:
• Theoretical performance with “No-Wait Memory” for this CM3 MCU
• Performance loss due to Flash slower than CPU demand on a CM3 MCU
• Mitigation effect of Memory Acceleration on a CM3 MCU: better, but delay remains
• Theoretical performance with “No-Wait Memory” for RX600
• Theoretical is identical to actual performance for RX600 because of 10nsec Flash
• RX has 63% better performance
Test conditions:
• 8 Tap FIR Filter, 16 x 16 to 32bit accumulate
• RX610 MCU: Renesas compiler v1.0, Speed 2, macro used for RMPA
• A CM3-based MCU: IAR Compiler v5.40.0.315, Speed Max
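For readers unfamiliar with the benchmark kernel, the Python sketch below shows a direct-form FIR filter of the kind described (a fixed set of taps, each output a sum of coefficient-times-sample products). It is an illustrative reference model, not the benchmark source; Python integers are unbounded, so the 16 x 16 to 32-bit accumulate width is not modeled, and the function name is hypothetical.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * samples[n - k].

    Samples before the start of the input are treated as zero.
    """
    taps = len(coeffs)
    output = []
    for n in range(len(samples)):
        acc = 0
        for k in range(taps):
            if n - k >= 0:                 # skip history that does not exist yet
                acc += coeffs[k] * samples[n - k]
        output.append(acc)
    return output


if __name__ == "__main__":
    eight_taps = [1, 2, 3, 4, 4, 3, 2, 1]   # hypothetical coefficients
    impulse = [1] + [0] * 9
    print(fir_filter(impulse, eight_taps))
```

Feeding an impulse returns the coefficients themselves, which is a quick sanity check. The inner loop is the multiply-accumulate work that a repeated multiply-accumulate instruction or a wide MAC unit is meant to speed up.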
Flash-MCU History and Speed
[Chart: operating frequency (MHz, 10 to 100) versus year (1990 to 2010) for MCU Freq., Renesas Flash Freq., and General Flash Freq. (competitors), with process nodes 0.8um, 0.5um, 0.35um, 0.18um, 0.15um, 90nm, and 40nm marked; MONOS for EEPROM & IC-card, followed by Flash-MONOS]
Renesas MONOS reaches 100MHz single cycle access
Source: Renesas
RX Family Roadmap
[Chart: max MHz (50, 100, 200) versus year (2010, 2011, 2012)]
• RX600, 40 nm, 100MHz+
• RX600 Series: 32 Bit, 90nm, 100MHz; Extreme High Performance, High Efficiency
• RX200 Series: 32 Bit, 130 nm; High Performance, Low Power / Low Voltage
• Existing MCUs: H8SX (32 Bit), H8S (16 Bit), R32C (32 Bit), M16C (16 Bit)
RX600 System On A Chip
RX600 Series Portfolio
| Package | Body size | Pitch |
|---|---|---|
| LGA64 | 5x5mm | 0.5mm |
| LQFP64 | 10x10mm | 0.5mm |
| LQFP80 | 14x14mm | 0.65mm |
| LGA85 | 7x7mm | 0.65mm |
| LQFP100 | 14x14mm | 0.5mm |
| LQFP112 | 20x20mm | 0.65mm |
| LQFP144 | 20x20mm | 0.5mm |
| LGA145 | 9x9mm | 0.65mm |
| BGA176 | 13x13mm | 0.8mm |
RX Migration Between Series
[Chart: Flash size (32KB, 64KB, 128KB, 256KB, 384KB, 512KB, 1MB, 2MB) versus pins (32, 48, 64, 80/85, 100, 112, 144/145, 176)]
RX600 Series - 100MHz Extreme Performance
RX200 Series - 50MHz Low Power / Low Voltage
Migration within RX Family: common CPU & peripherals
RX600: 500uA/MHz (all peripherals on), 2.5uA RTC Deep Standby, 2.7V to 3.6V
RX200: 200uA/MHz (all peripherals on), <1uA RTC Deep Standby, 1.62V to 3.6V
RX Solutions
Motor Control, RX62T
• Drive Sensorless PMAC Motor
• Field Oriented Control, 3-phase
• High integration, low system cost
Direct Drive TFT-LCD, RX62N
• Drive 4.3” Color WQVGA TFT-LCD by RGB
• Full basic graphic library and demo
• Source code included
Connectivity, RX62N RDK
• Ethernet, USB Host/Device/USB, CAN
• Many surrounding functions/features
• Source code, built-in JTAG debugger
802.11b/g/n WiFi, RX62N
• Simple SPI connection to WiFi module
• Kit contains driver and examples
• Very low power 802.11b/g/n connectivity
See www.am.renesas.com/rx for details
RX Tools for Solutions
See www.am.renesas.com/rx for details
Single Integrated Development & Debugging Environment
• HEW4 plus Renesas C/C++, $1200*
• HEW4 also supports GNU-RX C/C++ compiler, all at $0
On-Chip Debug: E1, $99*
• JTAG and USB-HS connection
• Program Flash
• Single step execution
• 256 Software breakpoints
• 12 Hardware breakpoints
• PC and data breakpoints
• On-chip Trace, 256 branches/cycles
• Read/Write SRAM
• Read/Write C variables
• Performance monitoring
• Non-intrusive
• Hot-plug capable
Hi-Speed Trace: E20, $995*
• JTAG, USB-HS, plus 6 lines connection
• Trace depth: 2M branches/cycles
• SRAM monitor, 4 KB
Wide 3rd Party Support for IDE, Compilers, Middleware, RTOS:
• Micrium, IAR, Segger, CMX, KPIT Cummins, FreeRTOS, and more
* Suggested resale price when sold individually
Comparing other 32-bit CPU Architectures
| Feature | Unit | RX600 | CortexM3 (1) | CortexM4 (2) | AVR32A (3) | PIC32 (4) |
|---|---|---|---|---|---|---|
| CPU Type | - | CISC, DSC | RISC, MCU | RISC, DSC | RISC, DSC | RISC, MCU |
| Performance | DMIPS/MHz | 1.65 | 1.25 | 1.25 | 1.50 | 1.50 |
| Pipeline Length | Stages | 5 | 3 | 3 | 3 | 5 |
| Inst Lengths | Bytes | 1 to 8 | 2 and 4 | 2 and 4 | 2 and 4 | 2 and 4 |
| # of Instructions | For CPU, DSP | 80, 9 | 97, 3 | 97, 83 | 115, 8 | 129, 2 |
| FPU | # of instructions | Yes, 8 | No, 0 | Option, 25 | No, 0 | No, 0 |
| General Regs | # of regs, bits | 15 x 32 | 12 x 32 | 12 x 32 | 13 x 32 | 27 x 32 |
| Min Intr Latency | CPU Clocks | 7 or 5 | 12 or 6 | 12 or 6 | 12 or 2 | 12 instructions |
| MPU | - | Option | Option | Option | Option | No |
| Bit Manipulation | - | Yes | Yes | Yes | Yes | Yes |
| Debug | Connection | JTAG or 2-wire | JTAG or 2-wire | JTAG or 2-wire | JTAG | JTAG |
| Hi-Speed Trace | Connection | 6-wire | 6-wire | 6-wire | 12-wire | 4, 8, or 16-wire |

References:
(1) ARM, CortexM3 Technical Reference Manual Revision: r1p1, ARMv7-M Architecture Reference Manual DDI 0403C_errata_v3
(2) ARM, CortexM4 Technical Reference Manual Revision: r0p0, ARMv7-M Architecture Reference Manual DDI 0403C_errata_v3
(3) Atmel, AVR32C Technical Reference Manual 32002A-AVR32-03/07
(4) Microchip, PIC32MX Family Reference Manual DS611271C. MIPS Technology, MIPS32 Architecture for Programmers Vol II: MIPS32 Instruction Set, rev 2.5, MIPS32 M4K Processor Core Datasheet, Rev 02.01
Who’s the Best?
You decide, based on what you have seen.
To help your decision, here are publicly released benchmark results based on the widely acknowledged CoreMark™ from EEMBC.
Sorted by CoreMark/MHz

| *Vendor | *Processor | Type | *CPU Freq (MHz) | *CoreMark/MHz | *CoreMark | *Compiler | Comment |
|---|---|---|---|---|---|---|---|
| Microchip | PIC32MX360F512L | MCU | 30 | 2.599 | 78 | GCC 4.3.2 | Only 30 MHz operation |
| Microchip | PIC32MX360F512L | MCU | 80 | 2.297 | 184 | GCC 4.3.2 | Negative effect of slow Flash |
| Renesas | RX610 | DSC | 100 | 2.240 | 224 | GNURX 201009 | Full speed with no loss of performance |
| TI | Stellaris LM3S9B96 CortexM3 | MCU | 50 | 1.921 | 96 | Keil V4.0.0.524 | |
| ST | STM32 CortexM3 120MHz, 90nm | MCU | 120 | 1.905 | 229 | KEIL 4.0.0.524 | Has new “ART” memory accelerator |
| Microchip | PIC24HJ128GP202 | MCU | 40 | 1.862 | 74 | GCC4.0.3 | |
| ST | STM32F103RB CortexM3 | MCU | 24 | 1.797 | 43 | GCC 4.4.1 | |
| NXP | LPC1768 | MCU | 100 | 1.753 | 175 | ARMCC 4.0 | |
| TI | Stellaris LM3S9B96 CortexM3 | MCU | 80 | 1.596 | 127 | Keil V4.0.0.524 | Negative effect of slow Flash |
| ST | STM32F103RB CortexM3 | MCU | 72 | 1.504 | 108 | GCC 4.4.1 | Negative effect of slow Flash |
| Freescale | ColdFire MCF52233 | MCU | 60 | 1.038 | 62 | IAR EW 1.20 | |
| Freescale | ColdFire MCF5274 | MCU | 150 | 0.773 | 115 | GCC4.1.1 | |

*Source: www.coremark.org as of 1 Sep 2010
Who’s the Best?
Now sorted by raw CoreMark, not CoreMark/MHz

| *Vendor | *Processor | Type | *CPU Freq (MHz) | *CoreMark/MHz | *CoreMark | *Compiler | Comment |
|---|---|---|---|---|---|---|---|
| ST | STM32 CortexM3 120MHz, 90nm | MCU | 120 | 1.905 | 229 | KEIL 4.0.0.524 | Much higher CPU freq needed for same result |
| Renesas | RX610 | DSC | 100 | 2.240 | 224 | GNURX 201009 | Positive effect of efficient CPU and fast Flash |
| Microchip | PIC32MX360F512L | MCU | 80 | 2.297 | 184 | GCC 4.3.2 | |
| NXP | LPC1768 | MCU | 100 | 1.753 | 175 | ARMCC 4.0 | |
| TI | Stellaris LM3S9B96 CortexM3 | MCU | 80 | 1.596 | 127 | Keil V4.0.0.524 | |
| Freescale | ColdFire MCF5274 | MCU | 150 | 0.773 | 115 | GCC4.1.1 | |
| ST | STM32F103RB CortexM3 | MCU | 72 | 1.504 | 108 | GCC 4.4.1 | |
| TI | Stellaris LM3S9B96 CortexM3 | MCU | 50 | 1.921 | 96 | Keil V4.0.0.524 | |
| Microchip | PIC32MX360F512L | MCU | 30 | 2.599 | 78 | GCC 4.3.2 | |
| Microchip | PIC24HJ128GP202 | MCU | 40 | 1.862 | 74 | GCC4.0.3 | |
| Freescale | ColdFire MCF52233 | MCU | 60 | 1.038 | 62 | IAR EW 1.20 | |
| ST | STM32F103RB CortexM3 | MCU | 24 | 1.797 | 43 | GCC 4.4.1 | |
*Source: www.coremark.org as of 1 Sep 2010
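The two rankings differ only in the sort key: efficiency per MHz versus raw score at the tested clock. The Python sketch below re-sorts the published figures both ways; the numbers are copied from the slides, while the tuple layout and function names are hypothetical.

```python
# (vendor, processor, CPU MHz, CoreMark/MHz, CoreMark), copied from the slides.
RESULTS = [
    ("Microchip", "PIC32MX360F512L", 30, 2.599, 78),
    ("Microchip", "PIC32MX360F512L", 80, 2.297, 184),
    ("Renesas", "RX610", 100, 2.240, 224),
    ("TI", "Stellaris LM3S9B96", 50, 1.921, 96),
    ("ST", "STM32 CortexM3 120MHz", 120, 1.905, 229),
    ("Microchip", "PIC24HJ128GP202", 40, 1.862, 74),
    ("ST", "STM32F103RB", 24, 1.797, 43),
    ("NXP", "LPC1768", 100, 1.753, 175),
    ("TI", "Stellaris LM3S9B96", 80, 1.596, 127),
    ("ST", "STM32F103RB", 72, 1.504, 108),
    ("Freescale", "ColdFire MCF52233", 60, 1.038, 62),
    ("Freescale", "ColdFire MCF5274", 150, 0.773, 115),
]


def sorted_by_efficiency(results):
    """Highest CoreMark/MHz first."""
    return sorted(results, key=lambda row: row[3], reverse=True)


def sorted_by_raw_score(results):
    """Highest raw CoreMark first."""
    return sorted(results, key=lambda row: row[4], reverse=True)


def rank_of(processor, mhz, ordered):
    """1-based position of a (processor, MHz) entry in an ordered list."""
    for position, row in enumerate(ordered, start=1):
        if row[1] == processor and row[2] == mhz:
            return position
    raise ValueError("entry not found")


if __name__ == "__main__":
    print(rank_of("RX610", 100, sorted_by_efficiency(RESULTS)))
    print(rank_of("RX610", 100, sorted_by_raw_score(RESULTS)))
```

Devices that appear twice show the same part at two clock speeds; where CoreMark/MHz drops at the higher clock, the slides attribute the loss to Flash that is slower than the CPU.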
Questions
1: What is the read access time of RX600 Flash Memory?
10 nsec (100MHz) across entire voltage range 2.7V to 3.6V
2: How many DMIPS/MHz does RX600 produce, and how many mW/DMIPS does it consume?
1.65 DMIPS/MHz, and 1mW/DMIPS
3: What does the RMPA instruction do?
Repeat Multiply Accumulate. One instruction automatically multiplies data from two different memory arrays, adds the result to an 80-bit accumulator, then post-increments to the next two values. It repeats until the specified array length is met. DSP!!
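The behavior described in answer 3 can be modeled in a few lines. The Python sketch below is a behavioral illustration only: it does not reflect the instruction's register conventions or flags, the function name is hypothetical, and wrapping the result to a signed 80-bit value is an assumption made here to mimic a fixed-width accumulator.

```python
ACC_BITS = 80  # accumulator width quoted in the deck


def rmpa(array_a, array_b, accumulator=0):
    """Multiply matching elements of two arrays and add each product to the accumulator."""
    if len(array_a) != len(array_b):
        raise ValueError("both arrays must have the same length")
    for a, b in zip(array_a, array_b):   # zip steps both arrays together, like post-increment
        accumulator += a * b
    # Assumption: keep only 80 bits and interpret them as a signed two's complement value.
    accumulator &= (1 << ACC_BITS) - 1
    if accumulator >> (ACC_BITS - 1):
        accumulator -= 1 << ACC_BITS
    return accumulator


if __name__ == "__main__":
    samples = [1, 2, 3]
    coefficients = [4, 5, 6]
    print(rmpa(samples, coefficients))
```

A single call stands in for what would otherwise be a software loop of load, multiply, add, and pointer increment, which is why the deck highlights the instruction for filter kernels.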
See www.am.renesas.com/rx for details
Innovation – Single Chip Enablement
One MCU Family for many applications
www.am.renesas.com/rx
Thank You!