Embedded Systems Architecture
Introduction
Sensors
Actuators
A/D and D/A Converters
Communication
Processing Units
Conclusion
CDA 4630/5636 – Spring 2017
Copyright © 2017 Prabhat Mishra
Components of Embedded Systems
Display
Analog Digital
Converter
Embedded Computing
(Processors, Memories, …)
Digital Analog
Converter
Actuators
Sensors
Environment
Sensors
Processing of physical data starts with
capturing this data.
Sensors can be designed for virtually every
physical stimulus
heat, light, sound, weight, velocity, acceleration,
electrical current, voltage, pressure, ...
Many physical effects are used for constructing
sensors:
law of induction (generation of voltages in a
changing magnetic field),
light-electric effects.
Artificial Leg
Artificial Hand
Prosthetic Hand
Prosthetic Hand with Sensors
Artificial Eyes
© Dobelle Institute
(www.dobelle.com)
Charge-Coupled Devices (CCD)
Image Sensors: Based on charge transfer to next pixel cell
CMOS Image Sensors
Based on standard production process for CMOS
chips, allows integration with other components.
Source: B. Diericks: CMOS image sensor
concepts. Photonics West 2000 Short course
Comparison CCD/CMOS sensors
Biometrical Sensors
Example: Fingerprint sensor (© Siemens, VDE):
Matrix of 256 x 256 elements.
Voltage ~ distance.
Resistance also computed.
No fooling by photos and wax copies.
Carbon dust?
Integrated into ID mouse.
Discretization of Time
Sampling: how often is the signal converted?
Quantization: how many bits are used per sample?
Aliasing
Plot: signal frequency 5.6 Hz, sampling frequency 9 Hz (amplitude axis from -1.5 to 1.5)
Analog to Digital Conversion
Sampling: how often is the signal converted?
More than twice as often as the highest frequency
present in the input (Nyquist rate)
Quantization: how many bits are used to
represent a sample?
Sufficient to provide the required dynamic range
16-bit A/D: 20 log10(2^16) ≈ 96 dB (human ear limit)
Under-loading: dynamic range not used properly
Clipping: input signal beyond the dynamic range
Aliasing: erroneous signals, not present in the
analog domain, but present in the digital domain
Use anti-aliasing filters
Sample at a higher rate than necessary
Analog-to-Digital Converter
Proportionality (4-bit encoding, Vmax = 7.5 V, 0.5 V per step):
0000 = 0 V, 0001 = 0.5 V, 0010 = 1.0 V, 0011 = 1.5 V, 0100 = 2.0 V, 0101 = 2.5 V, 0110 = 3.0 V, 0111 = 3.5 V,
1000 = 4.0 V, 1001 = 4.5 V, 1010 = 5.0 V, 1011 = 5.5 V, 1100 = 6.0 V, 1101 = 6.5 V, 1110 = 7.0 V, 1111 = 7.5 V
Analog to digital: analog input (V) sampled at t1, t2, t3, t4 gives digital output 0100, 1000, 0110, 0101
Digital to analog: digital input 0100, 1000, 0110, 0101 at t1, t2, t3, t4 gives the corresponding analog output (V)
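The proportionality in the figure can be sketched as a pair of conversion functions. This assumes an ideal converter with a 0 V lower bound and rounding to the nearest code; the names `adc_encode` and `dac_decode` are illustrative.

```c
/* Ideal n-bit conversion between a voltage in [0, vmax] and a code in
 * [0, 2^bits - 1], using the ratio v / vmax = code / (2^bits - 1).
 * Illustrative helpers; rounding to the nearest code is an assumption. */
unsigned adc_encode(double v, double vmax, int bits)
{
    unsigned full_scale = (1u << bits) - 1u;
    if (v <= 0.0)  return 0u;            /* clip below the range */
    if (v >= vmax) return full_scale;    /* clip above the range */
    return (unsigned)(v / vmax * full_scale + 0.5);
}

double dac_decode(unsigned code, double vmax, int bits)
{
    unsigned full_scale = (1u << bits) - 1u;
    return vmax * code / full_scale;
}
```

With vmax = 7.5 V and 4 bits, inputs of 2.0 V, 4.0 V, 3.0 V and 2.5 V map to 0100, 1000, 0110 and 0101, matching the sample sequence in the figure.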
Flash A/D Converter
Parallel comparison
with reference voltage
Speed:
O(1)
HW complexity: O(n)
n= # of distinguished
voltage levels
Successive Approximation
Key idea: binary search:
Set MSB='1'
if too large: reset MSB
Set MSB-1='1'
if too large: reset MSB-1
…..
Speed:
O(log(n))
Hardware complexity:
O(log(n))
with n= # of distinguished
voltage levels;
slow, but high precision possible.
Successive Approximation
Given an analog input signal whose voltage should range
from 0 to 15 volts, and an 8-bit digital encoding, calculate
the correct encoding for 5 volts.
Step 1: ½(15 + 0) = 7.5 volts; 5 < 7.5, so Vmax = 7.5 volts. Encoding: 00000000
Step 2: ½(7.5 + 0) = 3.75 volts; 5 > 3.75, so Vmin = 3.75 volts. Encoding: 01000000
Step 3: ½(7.5 + 3.75) = 5.63 volts; 5 < 5.63, so Vmax = 5.63 volts. Encoding: 01000000
Step 4: ½(5.63 + 3.75) = 4.69 volts; 5 > 4.69, so Vmin = 4.69 volts. Encoding: 01010000
Step 5: ½(5.63 + 4.69) = 5.16 volts; 5 < 5.16, so Vmax = 5.16 volts. Encoding: 01010000
Step 6: ½(5.16 + 4.69) = 4.93 volts; 5 > 4.93, so Vmin = 4.93 volts. Encoding: 01010100
Step 7: ½(5.16 + 4.93) = 5.05 volts; 5 < 5.05, so Vmax = 5.05 volts. Encoding: 01010100
Step 8: ½(5.05 + 4.93) = 4.99 volts; 5 > 4.99, so the last bit is set. Encoding: 01010101
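The binary search in the worked example can be sketched in a few lines. This is a software model of the idea, not a description of any particular converter; `sar_encode` is an illustrative name.

```c
/* Successive approximation as a binary search over [vmin, vmax]:
 * one comparison per bit, MSB first. Illustrative software model. */
unsigned sar_encode(double vin, double vmin, double vmax, int bits)
{
    unsigned code = 0u;
    for (int i = bits - 1; i >= 0; i--) {
        double mid = 0.5 * (vmax + vmin);
        if (vin >= mid) {
            code |= 1u << i;   /* keep this bit set, search the upper half */
            vmin = mid;
        } else {
            vmax = mid;        /* reset this bit, search the lower half */
        }
    }
    return code;
}
```

Calling `sar_encode(5.0, 0.0, 15.0, 8)` follows the eight steps above and returns 0x55, that is 01010101.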
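Digital-to-Analog (D/A) Converters
Various types, can be quite simple, e.g. a network of weighted resistors (R, 2R, 4R, 8R) feeding an op-amp with feedback resistor R1:
Output voltage ∝ number represented by x
Due to Kirchhoff's laws: $I = x_3 \frac{V_{ref}}{R} + x_2 \frac{V_{ref}}{2R} + x_1 \frac{V_{ref}}{4R} + x_0 \frac{V_{ref}}{8R} = \frac{V_{ref}}{R} \sum_{i=0}^{3} x_i 2^{i-3}$
Due to Kirchhoff's laws: $V + R_1 I' = 0$
Current into Op-Amp = 0: $I = I'$
Hence: $V + R_1 I = 0$
Finally: $-V = V_{ref} \frac{R_1}{R} \sum_{i=0}^{3} x_i 2^{i-3} = V_{ref} \frac{R_1}{8R} \, nat(x)$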
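The final equation can be evaluated numerically to see how the 4-bit input scales the output. The sketch assumes ideal components; `dac_output` and its parameter names are illustrative.

```c
#include <math.h>

/* Output voltage of the 4-bit weighted-resistor converter:
 * V = -Vref * (R1 / R) * sum_{i=0..3} x_i * 2^(i-3).
 * Ideal components are assumed; the function name is illustrative. */
double dac_output(unsigned x, double vref, double r1, double r)
{
    double sum = 0.0;
    for (int i = 0; i <= 3; i++) {
        unsigned xi = (x >> i) & 1u;       /* bit x_i of the input */
        sum += xi * ldexp(1.0, i - 3);     /* x_i * 2^(i-3) */
    }
    return -vref * (r1 / r) * sum;
}
```

For x = 0101 (5), Vref = 2 V, R1 = 4R the result is -5 V, which agrees with -Vref * R1/(8R) * nat(x).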
Stepper Motor Controller
Stepper motor: rotates a fixed number of
degrees when given a “step” signal
In contrast, a DC motor just rotates when power
is applied.
Rotation is achieved by applying a specific voltage
sequence to the coils
A controller greatly simplifies this
Stepper Motor Controller
Sequence   A   B   A'  B'
1          +   +   -   -
2          -   +   +   -
3          -   -   +   +
4          +   -   -   +
5          +   +   -   -
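sbit SM_A, SM_B, SM_AP, SM_BP; // ports for coils A, B, A', B' (pin assignment is target-specific)
int curr_pos; // tells us the current step position (0..3)
void ms_delay(int ms); // delay routine, assumed to be provided elsewhere
void move_one_step(int dir); // declared before use in reset()
void reset() { // must be called to synchronize
curr_pos = 0;
for(int i=0; i<4; i++) {
move_one_step(0);
}
}
void move_one_step(int dir/*0=CW,1=CCW*/) {
// one row per step of the sequence; columns are A, B, A', B'
const int SM_TBL[4][4] = {
{1, 1, 0, 0}, {0, 1, 1, 0}, {0, 0, 1, 1}, {1, 0, 0, 1} };
curr_pos = (curr_pos + (dir == 0 ? +1 : +3)) % 4; // +3 mod 4 is one step back
SM_A = SM_TBL[curr_pos][0];
SM_B = SM_TBL[curr_pos][1];
SM_AP = SM_TBL[curr_pos][2];
SM_BP = SM_TBL[curr_pos][3];
ms_delay(50);
}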
Actuators
Huge variety of actuators and output devices.
Microsystems motors as examples (© MCNC):
Micro-array of Mirrors
LCD
Liquid Crystal Display
N rows by M columns
Controller built into the
LCD module
Simple microprocessor
interface using ports
Software controlled
Communication: Hierarchy
Inverse relation between volume and
urgency quite common:
Sensor/actuator busses
Communication: Requirements
Real-time behavior
Efficient, economical
Bandwidth and communication delay
Robustness
Fault tolerance
Maintainability
Diagnosability
Security
Basic Techniques: Electrical Robustness
Single-ended vs. differential signals
Single-ended: voltage measured against a common ground
Differential: each side uses its local ground; voltage at the input of the Op-Amp positive: '1'; otherwise '0'
Combined with twisted pairs; most noise is added to both wires.
Evaluation (Twisted Pairs)
Advantages:
Subtraction removes most of the noise
Changes of voltage levels have no effect
Reduced importance of ground wiring
Higher speed
Disadvantages:
Requires negative voltages
Increased number of wires and connectors
Applications:
USB, FireWire, ISDN
Ethernet (STP/UTP CAT 5 cables)
Differential SCSI
High-quality analog audio signals
Real-time behavior
Carrier-sense multiple-access/collision-detection (CSMA/CD, Standard Ethernet): no
guaranteed response time.
Alternatives:
Token rings, token busses
Carrier-sense multiple-access/collision-avoidance
(CSMA/CA)
WLAN techniques with request preceding transmission
Each partner gets an ID (priority). After each bus transfer, all
partners try setting their ID on the bus; partners detecting a higher ID
disconnect themselves from the bus. The highest-priority partner gets a
guaranteed response time; others only if they are given a chance.
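The priority scheme described above can be sketched as a bit-by-bit arbitration. The model below assumes a wired-OR bus on which a '1' overrides a '0' and IDs are sent most significant bit first; these electrical details, the name `arbitrate` and the limit `MAX_PARTNERS` are assumptions for illustration, not taken from the slides.

```c
#define MAX_PARTNERS 32   /* illustrative limit */

/* Bitwise bus arbitration: all partners put their ID on the bus MSB first;
 * a partner that sends '0' while the bus carries '1' has seen a higher ID
 * and disconnects. Returns the index of the winner, or -1 on bad input.
 * Wired-OR behaviour of the bus is an assumption of this sketch. */
int arbitrate(const unsigned ids[], int n, int id_bits)
{
    int active[MAX_PARTNERS];
    if (n <= 0 || n > MAX_PARTNERS) return -1;
    for (int p = 0; p < n; p++) active[p] = 1;

    for (int bit = id_bits - 1; bit >= 0; bit--) {
        unsigned bus = 0u;
        for (int p = 0; p < n; p++)            /* bus level = OR of all senders */
            if (active[p]) bus |= (ids[p] >> bit) & 1u;
        for (int p = 0; p < n; p++)            /* losers disconnect */
            if (active[p] && ((ids[p] >> bit) & 1u) < bus) active[p] = 0;
    }
    for (int p = 0; p < n; p++)
        if (active[p]) return p;
    return -1;
}
```

With IDs 5, 12 and 9 the partner holding 12 wins every time, which is why only the highest-priority partner has a guaranteed response time.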
Example1: Sensor/Actuator Bus
Real-time behavior is very important
Different techniques:
Many wires
Fewer wires
CNC: Computerized Numerical Control
expensive & flexible
Example2: Field bus
More powerful/expensive than sensor
interfaces; serial busses preferred.
Examples:
Process Field Bus (Profibus)
http://www.profibus.com
Token passing;
9.6 kbit/s (1200 m) to 500 kbit/s (200 m);
too slow to be used for hard time constraints.
Field Buses
Controller area network (CAN)
Designed by Bosch and Intel in 1981;
Used in cars and other equipment;
Differential signaling with twisted pairs,
Arbitration using CSMA/CA,
Throughput between 10kbit/s and 1 Mbit/s,
Low and high-priority signals,
Max. latency of 134 µs for high priority signals,
Coding similar to that of serial (RS-232) lines of
PCs, with modifications for differential signaling.
http://www.can.bosch.com
Field Buses
The Time-Triggered-Protocol (TTP) [Kopetz et al.]
for fault-tolerant safety systems like airbags in cars.
FlexRay: TDMA (Time Division Multiple Access) protocol,
developed by the FlexRay consortium (BMW, Ford, Bosch,
DaimlerChrysler, General Motors, Motorola, Philips).
Combination of a variant of the TTP and the byteflight
[Byteflight Consortium, 2003] protocol.
Designed to meet key automotive requirements
Complements the major in-vehicle networking standards
A high data rate can be achieved: initially targeted for a
data rate of approximately 10 Mbit/s; however, the
design of the protocol allows much higher data rates to be
achieved.
Example3: Wireless Communication
IEEE 802.11 a/b/g
UMTS (Universal Mobile Telecommunications System)
Bandwidth is becoming a scarce resource.
DECT (Digital Enhanced Cordless Telecommunications)
Standard used for wireless phones in Europe
Bluetooth
Connect devices e.g., mobile phone and headset
Global Energy Consumption
Quadrillion British thermal units (Btu)
OECD - Organization for Economic Co-operation and Development
Source: U.S. Energy Information Administration, International Energy Outlook 2011.
Global Greenhouse Gas Emissions
Improved efficiency through information
technology could reduce greenhouse gas
emissions by 29% across the transport, industrial,
residential and commercial sectors.
2009 U.S. Greenhouse Gas Inventory Report
Power Consumption in Data Centers
In 2006, data centers used 1.5% (60 billion kWh/year) of all the electricity
produced in the US … if nothing significant is done about the situation,
this consumption will rise to 2.9% by 2011.
Report to Congress on Server and Data Center Energy Efficiency, EPA 2007.
The Energy/Flexibility Conflict
Figure: operations/Watt [MOps/mW] (log scale, 0.01 to 10) versus technology (1.0µ, 0.5µ, 0.25µ, 0.13µ, 0.07µ); curves labeled DSP-ASIPs and µPs; annotation: "poor design generation techniques"
Power and Energy
Figure: power P plotted over time t; the energy E is the area under the curve: $E = \int P \, dt$
In many cases, faster execution also means less
energy, but the opposite may be true if power has to
be increased to allow faster execution.
Power and Energy
Power is drawn from a voltage source
Power: $P(t) = i_{DD}(t) \, V_{DD}$
Energy: $E = \int_0^T P(t) \, dt = \int_0^T i_{DD}(t) \, V_{DD} \, dt$
Average Power: $P_{avg} = \frac{E}{T} = \frac{1}{T} \int_0^T i_{DD}(t) \, V_{DD} \, dt$
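Dynamic Power
Power needed to charge and discharge load
capacitances when transistors switch.
The capacitor needs to charge for the output to be ‘1’
For the output to be ‘0’, the capacitor needs to discharge
This repeats $T \cdot f_{sw}$ times over an interval of T
(Circuit figure: supply $V_{DD}$ delivering current $i_{DD}(t)$ into load capacitance C.)
$P_{dynamic} = \frac{1}{T} \int_0^T i_{DD}(t) \, V_{DD} \, dt = \frac{V_{DD}}{T} \int_0^T i_{DD}(t) \, dt = \frac{V_{DD}}{T} \left[ T f_{sw} C V_{DD} \right] = C V_{DD}^2 f_{sw}$
With $f_{sw} = \alpha f$: $P_{dynamic} = \alpha C V_{DD}^2 f$
Here, $\alpha$ is the activity factor
and f is the clock frequency.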
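To get a feel for the magnitudes, the final formula can be evaluated directly. The function name and the sample values mentioned below are illustrative assumptions, not measurements.

```c
/* Dynamic power P = alpha * C * VDD^2 * f, in watts, for activity factor
 * alpha, switched capacitance c_load (farads), supply voltage vdd (volts)
 * and clock frequency f_clk (hertz). Illustrative helper. */
double dynamic_power(double alpha, double c_load, double vdd, double f_clk)
{
    return alpha * c_load * vdd * vdd * f_clk;
}
```

For example, 1 pF switched at 1 GHz from a 1 V supply with alpha = 1 dissipates 1 mW; halving the supply voltage alone cuts this to a quarter.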
Low Power vs. Low Energy
Minimize the power consumption
Design of the power supply
Design of voltage regulators
Dimensioning of interconnect
Short term cooling
Minimizing the energy consumption
Restricted availability of energy (mobile systems)
Limited battery capacities (only slowly improving)
Very high costs of energy (solar panels, in space)
Cooling
High costs
Limited space
Dependability
Long lifetimes, low temperatures
Information Processing
ASIC
Processor
Energy efficiency
Code-size efficiency
Run-time efficiency
Special features of DSP processors
Multimedia instructions
Very Long Instruction Word (VLIW) machines
Reconfigurable Hardware
Memory
Application Specific Circuits (ASIC)
Custom-designed circuits
necessary if ultimate speed
or energy efficiency is the
goal and large numbers can
be sold.
Approach suffers from long
design times and high costs.
Reducing Energy Consumption
Pentium
Crusoe
Running the same multimedia application. [www.transmeta.com]
Infrared Cameras (FLIR) can be used to detect thermal distribution.
Dynamic Power Management (DPM)
RUN: operational
IDLE: a SW routine may stop the CPU when
not in use, while monitoring interrupts
SLEEP: shutdown of on-chip activity
Example: StrongARM SA1100
State power: RUN 400 mW, IDLE 50 mW, SLEEP 160 µW
Transition times: RUN → IDLE 10 µs, IDLE → RUN 10 µs, RUN → SLEEP 90 µs, IDLE → SLEEP 90 µs, SLEEP → RUN 160 ms
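Dynamic Voltage Scaling (DVS)
$E = P \times T$
$P \propto V^2$
E (energy), P (power), T (time), V (voltage)
Example
A task is given with workload (W) and deadline
(D). Assume that idle energy is negligible.
Running at voltage V, the task finishes at time T (before D): $E_1 \propto V_1^2 \, T_1 = V^2 T$
Running at voltage V/2, the task finishes at time 2T (still before D): $E_2 \propto V_2^2 \, T_2 = \frac{V^2}{4} \cdot 2T = \frac{V^2 T}{2} = \frac{E_1}{2}$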
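The example can be reproduced with a tiny model. It uses the slide's proportionality E ∝ V²·T and, as in the example, assumes that halving the voltage doubles the execution time; `relative_energy` is an illustrative name and the result is in arbitrary units.

```c
/* Relative task energy under E ∝ V^2 * T (arbitrary units).
 * Idle energy is neglected, as in the example. Illustrative helper. */
double relative_energy(double voltage, double exec_time)
{
    return voltage * voltage * exec_time;
}
```

Comparing `relative_energy(1.0, 1.0)` with `relative_energy(0.5, 2.0)` gives 1.0 versus 0.5: the slower run at half the voltage needs half the energy, provided 2T still meets the deadline D.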
Code Size Efficiency
RISC machines designed for run-time, not for
code-size-efficiency
Compression techniques: key idea
Code-size Efficiency
Compression techniques (continued):
2nd instruction set, e.g., ARM Thumb instruction set
16-bit Thumb instr.: ADD Rd #constant
Fields: 001 (major opcode) | 10 (minor opcode) | Rd (source = destination) | Constant
Expanded to the 32-bit ARM instruction: 1110 | 001 | 01001 | 0 Rd | 0 Rd | 0000 | Constant (zero extended)
• Reduction to 65-70 % of original code size
• 130% of ARM performance with 8/16 bit memory
• 85% of ARM performance with 32-bit memory
Domain-oriented Architectures
Application: $y[j] = \sum_{i=0}^{n-1} x[j-i] \cdot a[i]$
$\forall i: 0 \le i \le n-1: \; y_i[j] = y_{i-1}[j] + x[j-i] \cdot a[i]$
Architecture: ADSP210x (analog.com)
Figure: memories P and D; address generation unit (AGU) with address registers A0, A1, A2, .. supplying i+1 and j-i+1; ALU (+,-,..) with registers AX, AY, AF, AR; multiplier (*) with registers MX (holding x[j-i]), MY (holding a[i]), MF, and accumulator MR (holding y_{i-1}[j])
- Parallelism
- Dedicated registers
MR:=0; A1:=1; A2:=n-2;
MX:=x[n-1]; MY:=a[0];
for ( j:=1 to n) {
MR:=MR+MX*MY;
MY:=a[A1];
MX:=x[A2];
A1++; A2--
}
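The same multiply/accumulate loop written in portable C makes the recurrence explicit. This is a plain software sketch of the filter equation, not ADSP210x code; the function name `fir_output` is illustrative and the caller must ensure j >= n-1 so that all indices are valid.

```c
/* y[j] = sum_{i=0}^{n-1} x[j-i] * a[i], computed with one
 * multiply/accumulate per coefficient (mr plays the role of MR).
 * Requires j >= n - 1. Illustrative helper. */
double fir_output(const double x[], const double a[], int n, int j)
{
    double mr = 0.0;
    for (int i = 0; i < n; i++) {
        mr += x[j - i] * a[i];   /* y_i[j] = y_{i-1}[j] + x[j-i]*a[i] */
    }
    return mr;
}
```

On a DSP the loads of the next x and a values, the address updates and the multiply/accumulate run in parallel, which is the point of the dedicated registers in the figure.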
Digital Signal Processing (DSP)
Multiply/accumulate (MAC) and zero-overhead loop (ZOL) instructions
Heterogeneous registers
Separate address generation units (AGUs)
Digital Signal Processing (DSP)
Modulo addressing
Am++ ≡ Am:=(Am+1) mod n
(implements ring or circular
buffer in memory)
Figure: sliding window over the signal x between t1 and t2
Memory, t=t1: .., x[n-2], x[n-1], x[0], x[1], ..
Memory, t2=t1+1: .., x[n-3], x[n-2], x[n-1], x[n], x[1] (x[n] has replaced x[0])
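In software, modulo addressing corresponds to a ring buffer whose write index wraps around. The sketch below uses an illustrative buffer size `RING_SIZE` and illustrative names; on a DSP the wrap-around is done by the address generation unit at no extra cost.

```c
#define RING_SIZE 4   /* illustrative window length n */

typedef struct {
    double data[RING_SIZE];
    unsigned am;      /* write index, plays the role of address register Am */
} ring_t;

/* Store the newest sample over the oldest one and advance the index
 * with wrap-around: Am := (Am + 1) mod n. */
void ring_push(ring_t *r, double sample)
{
    r->data[r->am] = sample;
    r->am = (r->am + 1u) % RING_SIZE;
}
```

After pushing RING_SIZE + 1 samples, the first slot holds the newest sample while the others are untouched, just as x[n] replaces x[0] in the memory figure.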
Multimedia Instructions
Many registers, adders etc. are very wide
32 or 64 bits
Most multimedia data types are narrow
e.g., 8 bits per color, 16 bits per audio sample
2 - 8 values can be stored per register and added in parallel.
Example: 4 additions per
instruction; carry disabled
at word boundaries.
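The effect of disabling the carry at word boundaries can be imitated in plain C on a 64-bit integer holding four 16-bit values. This is a software emulation with wrap-around arithmetic, offered as a sketch; the name `hadd4` is illustrative and does not refer to a real instruction.

```c
#include <stdint.h>

/* Add four 16-bit lanes packed into 64-bit words, with no carry crossing
 * a lane boundary (wrap-around arithmetic within each lane). */
uint64_t hadd4(uint64_t a, uint64_t b)
{
    const uint64_t msb = 0x8000800080008000ULL;   /* top bit of each lane */
    uint64_t low = (a & ~msb) + (b & ~msb);       /* add the lower 15 bits */
    return low ^ ((a ^ b) & msb);                 /* add top bits without carry out */
}
```

A lane holding 0xFFFF plus 0x0001 wraps to 0x0000 without disturbing its neighbour, which is exactly what a single wide adder with suppressed carries provides in hardware.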
HP precision architecture (hp PA)
Half word add instruction HADD:
Optional saturating arithmetic.
Up to 10 instructions can be replaced by HADD.
Application: scaled interpolation between two images.
Next word = next pixel, same color.
4 pixels processed at a time.
Pentium MMX Architecture
64-bit vectors represent 8 bytes, 4 words or 2
double word encoded numbers.
wrap around/saturating options.
Multimedia registers mm0 - mm7, consistent with
floating-point registers (OS unchanged).
Instruction                              Options                        Comments
Padd[b/w/d], PSub[b/w/d]                 wrap around, saturating        addition/subtraction of bytes, words, double words
Pcmpeq[b/w/d], Pcmpgt[b/w/d]                                            Result= "11..11" if true, "00..00" otherwise
Pmullw                                                                  multiplication, 4*16 bits, least significant word
Pmulhw                                                                  multiplication, 4*16 bits, most significant word
Psra[w/d], Psll[w/d/q], Psrl[w/d/q]      No. of positions in            Parallel shift of words, double words
                                         register or instruction        or 64 bit quad words
Punpckl[bw/wd/dq], Punpckh[bw/wd/dq]                                    Parallel unpack
Packss[wb/dw]                            saturating                     Parallel pack
Pand, Pandn, Por, Pxor                                                  Logical operations on 64 bit words
Mov[d/q]                                                                Move instruction
VLIW Processors
VLIW: Very Long Instruction Word
Detection of parallelism is done by compiler,
not by hardware at run-time (inefficient).
Parallel operations (instructions) encoded in
one long word (instruction packet), each
instruction controlling one functional unit.
Partitioned Register Files
Many memory ports are required to supply
enough operands per cycle.
Memories with many ports are expensive.
Registers are partitioned into sets, e.g. for TI
C60x:
Data path A: register file A with functional units L1, S1, M1, D1
Data path B: register file B with functional units D2, M2, S2, L2
The figure also shows the address bus and the data bus.
Microcontrollers: MHS 80C51
8-bit CPU optimised for control applications
Extensive Boolean processing capabilities
64 k Program Memory address space
64 k Data Memory address space
4 k bytes of on chip Program Memory
128 bytes of on chip data RAM
32 bi-directional and indiv. addressable I/O lines
Two 16-bit timers/counters
Full duplex UART
6 sources/5-vector interrupts with 2 priority levels
On chip clock oscillators
Very popular CPU with many different variations
Reconfigurable Logic
Full custom chips may be too expensive,
software may be too slow.
Use of configurable hardware
e.g., field programmable gate arrays (FPGAs)
Application areas
Fast prototyping
configuring mobile phone according to local standards
Low volume applications
Example: Xilinx Virtex II FPGAs
Floorplan of VIRTEX II FPGAs
CLB: Configurable Logic Block
Configurable Logic Block (CLB)
Access time will be a problem
Speed gap between processor and memory increases
early sixties (Atlas): page fault ~ 2500 instructions
2002 (2 GHz µP): access to DRAM ~ 500 instructions
penalty for a cache miss will soon be the same as for a page fault in Atlas
Figure: speed (1 to 8) over years (0 to 5); annotation: 2x every 2 years
[P. Machanik: Approaches to Addressing the Memory Wall, TR Nov. 2002, U. Brisbane]
Access times and energy consumption
Example (CACTI Model):
"Currently, the size of some applications is doubling every 10 months"
[STMicroelectronics, Medea+ Workshop, Stuttgart, Nov. 2003]
Energy consumption by Memory
Mobile PC system power (Note: Based on Actual Measurements) [Source: V. Tiwari]
Component          Thermal Design (TDP) System Power   Average System Power
600/500 MHz uP     37%                                 13%
Memory+Graphics    12%                                 15%
HDD                9%                                  19%
LCD 10"            19%                                 30%
Power Supply       10%                                 10%
Other              13%                                 13%
CPU Dominates Thermal Design Power
Multiple Platform Components Comprise Average Power
“CPU” Power Dissipation
Strong ARM (IEEE Journal of SSC, Nov. 96): Icache 26%, Ibox 18%, Dcache 16%, Clock 10%, IMMU 9%, EBOX 8%, DMMU 8%, Others 5%
Power PC (Proceedings of ISSCC 94): Cache 23%, Clock 19%, TLB 17%, Control L. 16%, Data Flow 11%, I/O 7%, PLA 5%, ROM 2%
42% / 40% memory-related! (Strong ARM: Icache + Dcache; Power PC: Cache + TLB)
Based on slide by and ©: Osman S. Unsal, Israel Koren, C.
Mani Krishna, Csaba Andras Moritz, University of
Massachusetts, Amherst, 2001
Real-time Capability
Timing behavior has to be predictable.
Features that cause problems:
Caches with difficult to predict replacement strategies
Unified caches (conflicts between instructions and data)
Pipelines with difficult to predict stall cycles ("bubbles")
Interrupts that are possible any time
Memory refreshes that are possible any time
Instructions that have data-dependent execution times
[Dagstuhl workshop on predictability, Nov. 17-19, 2003]
No caches, use Scratch Pad memories
Why not just use a Cache ?
1. Predictability?
Worst case execution time
(WCET) may be large
[P. Marwedel et al., ASPDAC, 2004]
Scratch pad Memory (SPM)
Hierarchy: processor, scratch pad memory (no tag memory), main memory
Address space: the SPM occupies part of the address range 0 .. FFF..
Example: ARM7TDMI cores, well-known for low power consumption
Conclusions
Embedded systems consist of a wide variety
of hardware (analog/digital) components
Sensors and actuators interact with the
physical world
Communication needs to be efficient/real-time
Processor design needs to be aware of
energy efficiency, performance, code size, etc.
Memory design must also satisfy many constraints
caches are often unsuitable in real-time systems
Reconfigurable systems provide a trade-off
between flexibility and efficiency