Embedded Systems Architecture
• Introduction
• Sensors
• Actuators
• A/D and D/A Converters
• Communication
• Processing Units
• Conclusion
CDA 4630/5636 – Spring 2016
Copyright © 2016 Prabhat Mishra
Components of Embedded Systems
[Figure: block diagram linking Environment → Sensors → Analog → Digital Converter → Embedded Computing (Processors, Memories, …) → Digital → Analog Converter → Actuators → Environment, plus a Display]
Sensors
• Processing of physical data starts with capturing this data.
• Sensors can be designed for virtually every physical stimulus:
  heat, light, sound, weight, velocity, acceleration, electrical current, voltage, pressure, ...
• Many physical effects are used for constructing sensors, e.g.:
  law of induction (generation of voltages in a magnetic field),
  light-electric effects.
Artificial Leg [figure]
Artificial Hand [figure]
Prosthetic Hand [figure]
Prosthetic Hand with Sensors [figure]
Artificial Eyes [figure © Dobelle Institute (www.dobelle.com)]
Charge-Coupled Devices (CCD)
Image sensors based on charge transfer to the next pixel cell.
CMOS Image Sensors
• Based on the standard production process for CMOS chips; allows integration with other components.
Source: B. Diericks: CMOS image sensor concepts. Photonics West 2000 Short course
Comparison CCD/CMOS sensors [figure]
Biometrical Sensors
Example: fingerprint sensor (© Siemens, VDE):
• Matrix of 256 × 256 elements.
• Voltage ~ distance.
• Resistance also computed.
• No fooling by photos and wax copies. Carbon dust?
• Integrated into ID mouse.
Discretization of Time
Sampling: how often is the signal converted?
Quantization: how many bits are used per sample?
Aliasing
[Figure: a signal of frequency 5.6 Hz sampled at 9 Hz; amplitude axis from -1.5 to 1.5]
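Why the 9 Hz samples misrepresent the 5.6 Hz signal can be checked numerically: with ideal sampling the spectrum repeats every fs, and anything above fs/2 = 4.5 Hz folds back, so the samples look like a 9 − 5.6 = 3.4 Hz signal. A minimal sketch, assuming an ideal sampler with no anti-aliasing filter and non-negative frequencies:

```c
/* Apparent (aliased) frequency of a sinusoid at f_signal Hz sampled at
   f_sample Hz. Assumes ideal sampling, f_signal >= 0 and f_sample > 0:
   whole multiples of f_sample are removed, then the result is folded into
   the observable band [0, f_sample / 2]. */
double alias_frequency(double f_signal, double f_sample) {
    double f = f_signal;
    while (f >= f_sample) {
        f -= f_sample;                /* spectrum repeats every f_sample */
    }
    if (f > f_sample / 2.0) {
        f = f_sample - f;             /* fold the upper half back down */
    }
    return f;
}
```

For example, `alias_frequency(5.6, 9.0)` yields about 3.4, while a 3 Hz signal (below 4.5 Hz) is reproduced unchanged.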
Analog to Digital Conversion
Sampling: how often is the signal converted?
  More than twice as often as the highest frequency present in the input
Quantization: how many bits are used to represent a sample?
  Sufficient to provide the required dynamic range:
  16-bit A/D: 20·log10(2^16) ≈ 96 dB (human ear limit)
  Under-loading: dynamic range not used properly
  Clipping: input signal beyond the dynamic range
Aliasing: erroneous signals, not present in the analog domain, but present in the digital domain
  Use anti-aliasing filters
  Sample at a higher rate than necessary
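The 96 dB figure follows from 20·log10(2^N) = N·20·log10(2), i.e. roughly 6.02 dB per bit. A small sketch of that arithmetic for an ideal N-bit converter (the precomputed constant avoids needing the math library):

```c
/* Theoretical dynamic range of an ideal N-bit converter: 20*log10(2^N) dB.
   Since log10(2^N) = N*log10(2), every extra bit adds 20*log10(2) dB. */
#define DB_PER_BIT 6.020599913279624   /* 20 * log10(2) */

double adc_dynamic_range_db(int bits) {
    return bits * DB_PER_BIT;
}
```

`adc_dynamic_range_db(16)` gives about 96.3 dB, matching the 16-bit example.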
Analog-to-Digital Converter
[Figure: proportional mapping of analog voltages from 0 V to Vmax = 7.5 V onto 4-bit codes 0000 to 1111 (0.5 V per step). Analog to digital: samples taken at t1, t2, t3, t4 give the digital output 0100, 1000, 0110, 0101. Digital to analog: the same digital input reproduces the signal at t1 to t4.]
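The proportionality in the figure can be written as e / Vmax = d / (2^n − 1), where e is the analog value and d the code. A minimal sketch, assuming an ideal converter that rounds to the nearest code and clamps out-of-range inputs (real converters may truncate instead):

```c
/* Ideal n-bit A/D and D/A conversion for voltages in [0, vmax], using the
   proportional mapping e / vmax = d / (2^n - 1). */
unsigned adc_convert(double e, double vmax, int n) {
    unsigned max_code = (1u << n) - 1u;
    if (e <= 0.0) return 0;                           /* below range: clamp */
    if (e >= vmax) return max_code;                   /* clipping */
    return (unsigned)(e / vmax * max_code + 0.5);     /* nearest code */
}

double dac_convert(unsigned d, double vmax, int n) {
    unsigned max_code = (1u << n) - 1u;
    return vmax * d / max_code;                       /* inverse mapping */
}
```

With Vmax = 7.5 V and n = 4, inputs of 2.0, 4.0, 3.0 and 2.5 V give 4, 8, 6 and 5, i.e. the codes 0100, 1000, 0110, 0101 in the figure.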
Flash A/D Converter
• Parallel comparison with reference voltages
• Speed: O(1)
• Hardware complexity: O(n), where n = number of distinguished voltage levels
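The flash principle can be modelled in software: one comparator per reference level produces a "thermometer code", and an encoder turns the number of active comparators into the output code. A sketch, assuming an ideal resistor ladder with equally spaced references and truncating conversion; in hardware all comparators switch simultaneously, whereas this loop only evaluates them one after another:

```c
/* Software model of a flash A/D converter with `levels` output codes,
   i.e. levels - 1 comparators. Comparator k compares vin against the
   ladder tap vref_k = k * vmax / (levels - 1). Hardware: O(1) time,
   O(n) comparators; this loop is only a sequential simulation. */
unsigned flash_adc(double vin, double vmax, unsigned levels) {
    unsigned ones = 0;                             /* 1s in thermometer code */
    for (unsigned k = 1; k < levels; k++) {
        double vref = vmax * k / (levels - 1);     /* k-th reference voltage */
        if (vin >= vref) {
            ones++;                                /* comparator k outputs 1 */
        }
    }
    return ones;                                   /* encoder output */
}
```

For a 4-bit converter with Vmax = 7.5 V (16 levels, 15 comparators), `flash_adc(2.3, 7.5, 16)` returns 4.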
Successive Approximation
Key idea: binary search:
  Set MSB = '1'; if too large: reset MSB
  Set MSB-1 = '1'; if too large: reset MSB-1
  …
Speed: O(log(n))
Hardware complexity: O(log(n))
with n = number of distinguished voltage levels; slow, but high precision possible.
Successive Approximation: Example
Given an analog input signal whose voltage should range from 0 to 15 volts, and an 8-bit digital encoding, calculate the correct encoding for 5 volts.
Step 1: ½(15 + 0) = 7.5 V; 5 < 7.5, so bit 7 = 0 and Vmax = 7.5 V. Encoding: 00000000
Step 2: ½(7.5 + 0) = 3.75 V; 5 ≥ 3.75, so bit 6 = 1 and Vmin = 3.75 V. Encoding: 01000000
Step 3: ½(7.5 + 3.75) = 5.63 V; 5 < 5.63, so bit 5 = 0 and Vmax = 5.63 V. Encoding: 01000000
Step 4: ½(5.63 + 3.75) = 4.69 V; 5 ≥ 4.69, so bit 4 = 1 and Vmin = 4.69 V. Encoding: 01010000
Step 5: ½(5.63 + 4.69) = 5.16 V; 5 < 5.16, so bit 3 = 0 and Vmax = 5.16 V. Encoding: 01010000
Step 6: ½(5.16 + 4.69) = 4.93 V; 5 ≥ 4.93, so bit 2 = 1 and Vmin = 4.93 V. Encoding: 01010100
Step 7: ½(5.16 + 4.93) = 5.05 V; 5 < 5.05, so bit 1 = 0 and Vmax = 5.05 V. Encoding: 01010100
Step 8: ½(5.05 + 4.93) = 4.99 V; 5 ≥ 4.99, so bit 0 = 1. Encoding: 01010101
Result: 5 V is encoded as 01010101.
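The binary search of successive approximation can be sketched in C following the same midpoint procedure as the 15 V / 8-bit example: one comparison per bit, MSB first. This is an idealized model (exact comparator, no noise), not a description of any particular converter chip:

```c
/* Successive-approximation conversion as a binary search over [vmin, vmax].
   Each step compares vin with the midpoint: if vin is not below it the bit
   stays 1 and the lower bound moves up, otherwise the bit is reset and the
   upper bound moves down. `bits` comparisons in total: O(log n) steps. */
unsigned sar_adc(double vin, double vmin, double vmax, int bits) {
    unsigned code = 0;
    for (int b = bits - 1; b >= 0; b--) {     /* MSB first */
        double mid = 0.5 * (vmax + vmin);
        if (vin >= mid) {
            code |= 1u << b;                  /* keep bit b set */
            vmin = mid;                       /* answer in upper half */
        } else {
            vmax = mid;                       /* reset bit b; lower half */
        }
    }
    return code;
}
```

`sar_adc(5.0, 0.0, 15.0, 8)` returns 85, i.e. 01010101, the same encoding as the worked example.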
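Digital-to-Analog (D/A) Converters
Various types, can be quite simple, e.g., a weighted-resistor network: bits x3, x2, x1, x0 connect Vref through resistors R, 2R, 4R, 8R to the input of an op-amp with feedback resistor R1.
Output voltage ~ number represented by x
Due to Kirchhoff's laws:
$$I = x_3\frac{V_{ref}}{R} + x_2\frac{V_{ref}}{2R} + x_1\frac{V_{ref}}{4R} + x_0\frac{V_{ref}}{8R} = \frac{V_{ref}}{R}\sum_{i=0}^{3} x_i\,2^{i-3}$$
Due to Kirchhoff's laws:
$$V + R_1 I' = 0$$
Current into op-amp = 0:
$$I = I'$$
Hence:
$$V + R_1 I = 0$$
Finally:
$$-V = V_{ref}\,\frac{R_1}{R}\sum_{i=0}^{3} x_i\,2^{i-3} = V_{ref}\,\frac{R_1}{8R}\,\mathrm{nat}(x)$$
where nat(x) is the natural number represented by x.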
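The derivation can be checked by adding the branch currents bit by bit and applying V = −R1·I. A sketch assuming ideal resistors and an ideal op-amp; the example values for Vref, R and R1 are arbitrary:

```c
/* Output voltage of the 4-bit weighted-resistor D/A converter: bit x_i
   connects Vref through a resistor 2^(3-i) * R (R, 2R, 4R, 8R for x3..x0),
   the branch currents add at the op-amp input, and V = -R1 * I. */
double dac_weighted(unsigned x, double vref, double r, double r1) {
    double current = 0.0;
    for (int i = 0; i < 4; i++) {
        if (x & (1u << i)) {
            current += vref / (r * (double)(1u << (3 - i)));  /* Vref/(2^(3-i) R) */
        }
    }
    return -r1 * current;                                     /* V = -R1 * I */
}
```

With Vref = 8 V and R1 = R, the formula gives −V = nat(x), so x = 0101 produces about −5 V and x = 1111 about −15 V.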
Stepper Motor Controller
• Stepper motor: rotates a fixed number of degrees when given a “step” signal. In contrast, a DC motor just rotates when power is applied.
• Rotation is achieved by applying a specific voltage sequence to the coils.
• A controller greatly simplifies this.
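Stepper Motor Controller
Sequence   A   B   A'  B'
1          +   +   -   -
2          -   +   +   -
3          -   -   +   +
4          +   -   -   +
5          +   +   -   -

/* Port pins driving coils A, B, A', B'. On an 8051 (Keil C51) these would be
   sbit declarations bound to port bits, e.g. sbit SM_A = P1^0;
   plain ints stand in for them here. */
int SM_A, SM_B, SM_AP, SM_BP;
int cur_pos;                        /* current step position, 0..3 */

void ms_delay(int ms);              /* platform delay routine, assumed to exist */

void move_one_step(int dir /* 0 = CW, 1 = CCW */) {
    /* One row per position: levels of A, B, A', B' (1 = +, 0 = -). */
    static const int SM_TBL[4][4] = {
        {1, 1, 0, 0}, {0, 1, 1, 0}, {0, 0, 1, 1}, {1, 0, 0, 1} };
    cur_pos = (cur_pos + (dir == 0 ? 1 : 3)) % 4;   /* +3 mod 4 == -1 */
    SM_A  = SM_TBL[cur_pos][0];
    SM_B  = SM_TBL[cur_pos][1];
    SM_AP = SM_TBL[cur_pos][2];
    SM_BP = SM_TBL[cur_pos][3];
    ms_delay(50);                   /* give the rotor time to settle */
}

void reset(void) {                  /* must be called to synchronize */
    cur_pos = 0;
    for (int i = 0; i < 4; i++) {
        move_one_step(0);
    }
}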
Actuators
Huge variety of actuators and output devices.
Microsystems motors as examples (© MCNC) [figures]
Micro-array of Mirrors [figure]
LCD
• Liquid Crystal Display
• N rows by M columns
• Controller built into the LCD module
• Simple microprocessor interface using ports
• Software controlled
Communication: Hierarchy
• An inverse relation between volume and urgency is quite common.
[Figure: communication hierarchy; lowest level: sensor/actuator busses]
Communication: Requirements
Real-time behavior
Efficient, economical
Bandwidth and communication delay
Robustness
Fault tolerance
Maintainability
Diagnosability
Security
Basic Techniques: Electrical Robustness
Single-ended vs. differential signals
[Figure: single-ended transmission referenced to ground vs. differential transmission over two wires with a local ground at each end]
Voltage at the input of the op-amp positive → '1'; otherwise → '0'.
Combined with twisted pairs: most noise is added to both wires.
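The benefit of differential signaling over a twisted pair can be illustrated by a small model: noise couples (approximately) equally into both wires, and the receiver's subtraction cancels it before the sign decision. The voltage levels and noise values below are illustrative assumptions:

```c
/* Differential transmission: the sender drives +v on one wire and -v on the
   other; common-mode noise is added to both wires. The receiver subtracts
   the wires and decides by sign, like the op-amp rule above. */
typedef struct { double pos, neg; } wire_pair;

wire_pair send_bit(int bit, double v, double common_noise) {
    wire_pair w;
    w.pos = (bit ? v : -v) + common_noise;
    w.neg = (bit ? -v : v) + common_noise;
    return w;
}

int receive_bit(wire_pair w) {
    return (w.pos - w.neg) > 0.0;   /* positive difference -> '1', else '0' */
}
```

Even with 3 V of common-mode noise on a ±0.4 V signal, `receive_bit(send_bit(1, 0.4, 3.0))` still yields 1; a single-ended receiver comparing one wire against ground would misread such a bit.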
Evaluation (Twisted Pairs)
Advantages:
• Subtraction removes most of the noise
• Changes of voltage levels have no effect
• Reduced importance of ground wiring
• Higher speed
Disadvantages:
• Requires negative voltages
• Increased number of wires and connectors
Applications:
• USB, FireWire, ISDN
• Ethernet (STP/UTP CAT 5 cables)
• Differential SCSI
• High-quality analog audio signals
Real-time behavior
• Carrier-sense multiple-access/collision-detection (CSMA/CD, standard Ethernet): no guaranteed response time.
• Alternatives:
  Token rings, token busses
  Carrier-sense multiple-access/collision-avoidance (CSMA/CA):
  - WLAN techniques with request preceding transmission
  - Each partner gets an ID (priority). After each bus transfer, all partners try setting their ID on the bus; partners detecting a higher ID disconnect themselves from the bus. The highest-priority partner gets a guaranteed response time; others only if they are given a chance.
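The ID-based collision avoidance described above can be simulated bit by bit. A sketch assuming 8-bit IDs sent MSB first over a wired-OR bus, where a 1 from any partner wins, so that, as in the text, the highest ID gets the bus. (CAN uses the opposite polarity: the dominant level is 0, so the lowest identifier wins.)

```c
/* Bitwise priority arbitration: all contending partners drive their ID
   onto the bus MSB first; the bus is modelled as a wired-OR. A partner that
   sent 0 but reads 1 has seen a higher ID and disconnects. Returns the index
   of the winning partner, or -1 for invalid input. At most 16 partners. */
#define ID_BITS 8

int arbitrate(const unsigned ids[], int n) {
    int active[16];
    if (n < 1 || n > 16) return -1;
    for (int k = 0; k < n; k++) active[k] = 1;

    for (int b = ID_BITS - 1; b >= 0; b--) {
        unsigned bus = 0;
        for (int k = 0; k < n; k++)          /* active partners drive bit b */
            if (active[k]) bus |= (ids[k] >> b) & 1u;
        for (int k = 0; k < n; k++)          /* partners seeing a higher ID drop out */
            if (active[k] && ((ids[k] >> b) & 1u) != bus) active[k] = 0;
    }
    for (int k = 0; k < n; k++)
        if (active[k]) return k;
    return -1;
}
```

With IDs {5, 12, 9}, partner 5 drops out at bit 3 and partner 9 at bit 2, so index 1 (ID 12) wins.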
Example 1: Sensor/Actuator Bus
• Real-time behavior is very important
• Different techniques: many wires vs. fewer wires [figure]
• CNC: Computerized Numerical Control; expensive & flexible
Example 2: Field Bus
• More powerful/expensive than sensor interfaces; serial busses preferred.
Examples:
• Process Field Bus (Profibus), http://www.profibus.com
  Token passing; 9.6 kbit/s (1200 m) to 500 kbit/s (200 m); too slow to be used for hard time constraints.
Field Buses
Controller area network (CAN)
• Developed by Bosch, starting in 1983, with Intel supplying the first controller chips;
• Used in cars and other equipment;
• Differential signaling with twisted pairs;
• Arbitration using CSMA/CA;
• Throughput between 10 kbit/s and 1 Mbit/s;
• Low- and high-priority signals;
• Max. latency of 134 µs for high-priority signals;
• Coding similar to that of serial (RS-232) lines of PCs, with modifications for differential signaling.
http://www.can.bosch.com
Field Buses
• The Time-Triggered Protocol (TTP) [Kopetz et al.], for fault-tolerant safety systems like airbags in cars.
• FlexRay: TDMA (Time Division Multiple Access) protocol, developed by the FlexRay consortium (BMW, Ford, Bosch, DaimlerChrysler, General Motors, Motorola, Philips).
  - Combination of a variant of the TTP and the byteflight [Byteflight Consortium, 2003] protocol.
  - Designed to meet key automotive requirements.
  - Complements the major in-vehicle networking standards.
  - A high data rate can be achieved: initially targeted at approximately 10 Mbit/s; however, the design of the protocol allows much higher data rates.
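The TDMA principle behind TTP and FlexRay can be sketched as a static schedule: time is divided into a repeating cycle of fixed-length slots, and each slot belongs to exactly one node, so every node knows in advance when it may send. The slot length and schedule here are hypothetical, and the sketch ignores clock synchronization and FlexRay's dynamic segment:

```c
/* Static TDMA: return the node that owns the bus at time t_us, given a
   slot length and a cyclic schedule of n_slots entries (node IDs).
   Assumes globally synchronized time starting at 0. */
int slot_owner(unsigned long t_us, unsigned long slot_us,
               const int schedule[], int n_slots) {
    unsigned long slot = t_us / slot_us;           /* slots since start */
    return schedule[slot % (unsigned long)n_slots];
}
```

With 50 µs slots and the hypothetical schedule {2, 0, 1, 0}, node 2 owns t = 0 µs, node 1 owns t = 120 µs, and the cycle repeats so node 2 owns t = 210 µs again.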
Example 3: Wireless Communication
• IEEE 802.11 a/b/g
• UMTS (Universal Mobile Telecommunications System)
  Bandwidth is becoming a scarce resource.
• DECT (Digital Enhanced Cordless Telecommunications)
  Standard used for wireless phones in Europe
• Bluetooth
  Connect devices, e.g., mobile phone and headset
Global Energy Consumption
[Figure: global energy consumption in quadrillion British thermal units (Btu); OECD = Organization for Economic Co-operation and Development]
Source: U.S. Energy Information Administration, International Energy Outlook 2011.
Global Greenhouse Gas Emissions [figure]
Improved efficiency through information technology can reduce greenhouse gas emissions in the transport, industrial, residential and commercial sectors by 29%.
Source: 2009 U.S. Greenhouse Gas Inventory Report
Power Consumption in Data Centers
"In 2006, datacenters used 1.5% (60 billion kWh/year) of all the electricity produced in the US … if nothing significant is done about the situation, this consumption will rise to 2.9% by 2011."
Report to Congress on Server and Data Center Energy Efficiency, EPA 2007.
The Energy/Flexibility Conflict
[Figure: operations/watt (MOps/mW, log scale 0.01 to 10) vs. technology (1.0µ, 0.5µ, 0.25µ, 0.13µ, 0.07µ) for DSP-ASIPs and µPs; the lowest region is labeled "poor design generation techniques"]
Power and Energy
$$E = \int P\,dt$$
[Figure: power P over time t; the energy E is the area under the curve]
In many cases, faster execution also means less energy, but the opposite may be true if power has to be increased to allow faster execution.
Power and Energy
Power is drawn from a voltage source.
Power:
$$P(t) = i_{DD}(t)\,V_{DD}$$
Energy:
$$E = \int_0^T P(t)\,dt = \int_0^T i_{DD}(t)\,V_{DD}\,dt$$
Average Power:
$$P_{avg} = \frac{E}{T} = \frac{1}{T}\int_0^T i_{DD}(t)\,V_{DD}\,dt$$
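In practice i_DD(t) is often available only as measured samples; the integrals above can then be approximated numerically. A sketch using the trapezoidal rule, assuming at least two samples taken every dt seconds and a constant V_DD:

```c
/* E = integral of i_DD(t) * V_DD dt, approximated by the trapezoidal rule
   over n >= 2 current samples spaced dt seconds apart. */
double energy_joules(const double i_dd[], int n, double dt, double v_dd) {
    double e = 0.0;
    for (int k = 1; k < n; k++) {
        e += 0.5 * (i_dd[k - 1] + i_dd[k]) * v_dd * dt;
    }
    return e;
}

/* P_avg = E / T, with T = (n - 1) * dt the covered interval. */
double average_power_watts(const double i_dd[], int n, double dt, double v_dd) {
    double t_total = (n - 1) * dt;
    return energy_joules(i_dd, n, dt, v_dd) / t_total;
}
```

A constant 0.5 A drawn from a 2 V supply for 1 s (five samples, dt = 0.25 s) gives E ≈ 1 J and P_avg ≈ 1 W.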
Dynamic Power
• Power needed to charge and discharge load capacitances when transistors switch.
• The capacitor needs to charge for the output to be '1'.
• For the output to be '0', the capacitor needs to discharge.
• This repeats T·f_sw times over an interval T.
[Figure: gate driving a load capacitance C from supply V_DD with current i_DD(t)]
$$P_{dynamic} = \frac{1}{T}\int_0^T i_{DD}(t)\,V_{DD}\,dt = \frac{V_{DD}}{T}\int_0^T i_{DD}(t)\,dt = \frac{V_{DD}}{T}\left[T f_{sw} C V_{DD}\right] = C V_{DD}^2 f_{sw}$$
$$P_{dynamic} = \alpha C V_{DD}^2 f$$
Here, α is the activity factor and f is the clock frequency (f_sw = α f).
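The final formula is easy to evaluate directly. The parameter values in the usage note are illustrative assumptions, not data for a specific chip:

```c
/* Dynamic (switching) power P = alpha * C * V_DD^2 * f:
   alpha = activity factor, c = switched capacitance in farads,
   v_dd = supply voltage in volts, f = clock frequency in hertz. */
double dynamic_power_watts(double alpha, double c, double v_dd, double f) {
    return alpha * c * v_dd * v_dd * f;
}
```

For α = 0.1, C = 1 nF, V_DD = 1.2 V and f = 1 GHz this gives 0.144 W. Because V_DD enters squared, halving the supply voltage at the same frequency cuts dynamic power to a quarter.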
Low Power vs. Low Energy
Minimize the power consumption:
• Design of the power supply
• Design of voltage regulators
• Dimensioning of interconnect
• Short-term cooling
Minimize the energy consumption:
• Restricted availability of energy (mobile systems)
  - Limited battery capacities (only slowly improving)
  - Very high costs of energy (solar panels, in space)
• Cooling
  - High costs
  - Limited space
• Dependability
  - Long lifetimes, low temperatures
Information Processing
• ASIC
• Processor
  - Energy efficiency
  - Code-size efficiency
  - Run-time efficiency
  - Special features of DSP processors
  - Multimedia instructions
  - Very Long Instruction Word (VLIW) machines
• Reconfigurable Hardware
• Memory
Application Specific Circuits (ASIC)
Custom-designed circuits are necessary if ultimate speed or energy efficiency is the goal and large numbers can be sold.
The approach suffers from long design times and high costs.
Reducing Energy Consumption
[Figure: thermal images of a Pentium and a Crusoe processor running the same multimedia application. [www.transmeta.com]]
Infrared cameras (FLIR) can be used to detect the thermal distribution.
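Dynamic Power Management (DPM)
• RUN: operational
• IDLE: a SW routine may stop the CPU when not in use, while monitoring interrupts
• SLEEP: shutdown of on-chip activity
[Figure: power states of the StrongARM SA1100: RUN (400 mW), IDLE (50 mW), SLEEP (160 µW); transitions between RUN and IDLE take 10 µs, transitions into SLEEP take 90 µs, and SLEEP → RUN takes 160 ms]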
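Dynamic Voltage Scaling (DVS)
$$E = P \times T, \qquad P \propto V^2$$
E (energy), P (power), T (time), V (voltage)
Example
• A task is given with workload (W) and deadline (D). Assume that idle energy is negligible.
• At voltage V the task finishes after time T and the processor then idles until D:
$$E_1 \propto V_1^2\,T_1 = V^2\,T$$
• At voltage V/2 the task takes 2T (2T ≤ D):
$$E_2 \propto V_2^2\,T_2 = \frac{V^2}{4}\cdot 2T = \frac{E_1}{2}$$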
Code Size Efficiency
• RISC machines are designed for run-time, not for code-size efficiency.
• Compression techniques: key idea [figure]
Code-size Efficiency
Compression techniques (continued): 2nd instruction set, e.g., ARM Thumb instruction set
16-bit Thumb instruction ADD Rd, #constant:
  001 | 10 | Rd | Constant
  (major opcode 001, minor opcode 10; Rd is both source and destination)
is expanded into the 32-bit ARM instruction:
  1110 | 001 | 01001 | 0 Rd | 0 Rd | 0000 | Constant
  (constant zero-extended)
• Reduction to 65-70 % of original code size
• 130% of ARM performance with 8/16 bit memory
• 85% of ARM performance with 32-bit memory
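The expansion shown above is a fixed bit-field rearrangement, which is why it can be done by simple decompression hardware. A sketch that performs it in software for this one instruction form only; other Thumb formats are not handled and the function name is hypothetical:

```c
#include <stdint.h>

/* Expand the 16-bit Thumb "ADD Rd, #constant" (001 | 10 | Rd | constant)
   into the 32-bit ARM instruction
   1110 | 001 | 01001 | 0Rd | 0Rd | 0000 | constant.
   Returns 0 for any other Thumb instruction. */
uint32_t thumb_add_imm_to_arm(uint16_t thumb) {
    if ((thumb >> 11) != 0x06)              /* top 5 bits must be 001 10 */
        return 0;
    uint32_t rd  = (thumb >> 8) & 0x7u;     /* 3-bit register r0-r7 */
    uint32_t imm = thumb & 0xFFu;           /* 8-bit constant, zero-extended */
    return (0xEu  << 28)   /* condition 1110: always */
         | (0x1u  << 25)   /* 001: data processing, immediate operand */
         | (0x09u << 20)   /* 01001: ADD opcode with flag-setting bit */
         | (rd    << 16)   /* source register = Rd */
         | (rd    << 12)   /* destination register = Rd */
         | imm;            /* rotate field 0000, then the constant */
}
```

For example, the Thumb word 0x3105 (ADD r1, #5) expands to the ARM word 0xE2911005.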
Domain-oriented Architectures
Application:
$$y[j] = \sum_{i=0}^{n-1} x[j-i]\,a[i]$$
$$\forall i,\ 0 \le i \le n-1:\quad y_i[j] = y_{i-1}[j] + x[j-i]\,a[i]$$
Architecture: ADSP210x (analog.com)
[Figure: data path with program memory P and data memory D; an address generation unit (AGU) with address registers A0, A1, A2 computes the next addresses (i+1, j-i+1); x[j-i] and a[i] are loaded into the multiplier registers MX and MY, and x[j-i]*a[i] is added to y_{i-1}[j] in MR; the ALU uses AX, AY, AF and AR. Features: parallelism, dedicated registers.]
MR:=0; A1:=1; A2:=n-2;
MX:=x[n-1]; MY:=a[0];
for (j:=1 to n)
{MR:=MR+MX*MY; MY:=a[A1]; MX:=x[A2]; A1++; A2--}
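The DSP loop above can be rendered in C to make the register roles explicit. On the DSP, the multiply-accumulate, both operand fetches and both address updates execute in one instruction; in C they are separate statements. This sketch skips the final prefetch, which would read past the arrays:

```c
/* C rendering of the MAC loop for one output sample:
   y = sum_{i=0}^{n-1} x[n-1-i] * a[i]  (requires n >= 1).
   mr, mx, my mirror the registers MR, MX, MY; a1, a2 mirror A1, A2. */
long fir_mac(const int x[], const int a[], int n) {
    long mr = 0;                  /* MR := 0 */
    int a1 = 1, a2 = n - 2;       /* A1 := 1; A2 := n-2 */
    int mx = x[n - 1];            /* MX := x[n-1] */
    int my = a[0];                /* MY := a[0] */
    for (int j = 1; j <= n; j++) {
        mr += (long)mx * my;      /* MR := MR + MX*MY */
        if (j < n) {              /* prefetch operands for the next step */
            my = a[a1];
            mx = x[a2];
            a1++;
            a2--;
        }
    }
    return mr;
}
```

For x = {1, 2, 3} and a = {4, 5, 6}, the result is 3·4 + 2·5 + 1·6 = 28.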
Digital Signal Processing (DSP)
• Multiply/accumulate (MAC) and zero-overhead loop (ZOL) instructions
• Heterogeneous registers
• Separate address generation units (AGUs)
Digital Signal Processing (DSP)
Modulo addressing:
Am++ ≡ Am := (Am + 1) mod n
(implements a ring or circular buffer in memory)
[Figure: sliding window over the samples x; memory at t = t1 holds .., x[n-2], x[n-1], x[0], x[1], ..; at t2 = t1 + 1 the newest sample x[n] has overwritten the oldest one, x[0]]
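Modulo addressing can be emulated in software to show why no data has to move: the write index advances as A := (A + 1) mod N, so each new sample overwrites the oldest one in place. A sketch with an arbitrarily chosen buffer size of 4; a DSP performs the modulo update in its AGU at no extra cost:

```c
/* Ring buffer with modulo addressing: the N most recent samples stay in
   memory, and each new sample overwrites the oldest. */
#define RING_SIZE 4

typedef struct {
    int data[RING_SIZE];
    unsigned head;                      /* next write position */
} ring_buffer;

void ring_push(ring_buffer *rb, int sample) {
    rb->data[rb->head] = sample;
    rb->head = (rb->head + 1) % RING_SIZE;        /* Am++ with wrap-around */
}

/* k-th most recent sample (k = 0 is the newest); requires k < RING_SIZE. */
int ring_recent(const ring_buffer *rb, unsigned k) {
    return rb->data[(rb->head + RING_SIZE - 1 - k) % RING_SIZE];
}
```

After pushing samples 0, 1, …, 5 into the 4-entry buffer, the newest value is 5 and the oldest retained value is 2.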
Multimedia Instructions
• Many registers, adders, etc. are very wide: 32 or 64 bits.
• Most multimedia data types are narrow, e.g., 8 bits per color, 16 bits per audio sample.
• 2 to 8 values can be stored per register and added.
[Figure: one wide addition split into 4 additions per instruction; carry disabled at word boundaries.]
HP precision architecture (hp PA):
• Half word add instruction HADD, with optional saturating arithmetic.
• Up to 10 instructions can be replaced by HADD.
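The idea of "4 additions per instruction, carry disabled at word boundaries" can be emulated with ordinary 64-bit integer operations (a technique often called SIMD within a register). This is a software emulation of wrap-around lane addition, not the HADD instruction itself:

```c
#include <stdint.h>

/* Four independent 16-bit additions in one 64-bit word. The top bit of each
   lane is masked off so that no carry can cross into the next lane; the top
   bits are then added back with XOR (addition without carry). Each lane
   wraps around on overflow. */
uint64_t add_4x16_wrap(uint64_t a, uint64_t b) {
    const uint64_t H = 0x8000800080008000ull;   /* top bit of each lane */
    uint64_t low_sum = (a & ~H) + (b & ~H);     /* 15-bit sums never carry out */
    return low_sum ^ ((a ^ b) & H);             /* restore top bits */
}
```

Adding lanes (0x0001, 0xFFFF, 0x0005, 0x0003) and (0x0001, 0x0001, 0x0002, 0x0004) gives (0x0002, 0x0000, 0x0007, 0x0007): the overflow in the second lane wraps without disturbing its neighbours.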
Application: scaled interpolation between two images
• Next word = next pixel, same color.
• 4 pixels processed at a time.
Pentium MMX Architecture
• 64-bit vectors represent 8 bytes, 4 words or 2 double words encoded numbers; wrap-around/saturating options.
• Multimedia registers mm0 - mm7, aliased to the floating-point registers (OS unchanged).
Instructions (options in parentheses):
• Padd[b/w/d], PSub[b/w/d] (wrap around, saturating): addition/subtraction of bytes, words, double words
• Pcmpeq[b/w/d], Pcmpgt[b/w/d]: result = "11..11" if true, "00..00" otherwise
• Pmullw: multiplication, 4*16 bits, least significant word
• Pmulhw: multiplication, 4*16 bits, most significant word
• Psra[w/d], Psll[w/d/q], Psrl[w/d/q] (no. of positions in register or instruction): parallel shift of words, double words or 64-bit quad words
• Punpckl[bw/wd/dq], Punpckh[bw/wd/dq]: parallel unpack
• Packss[wb/dw] (saturating): parallel pack
• Pand, Pandn, Por, Pxor: logical operations on 64-bit words
• Mov[d/q]: move instruction
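The difference between the wrap-around and saturating options of byte addition can be shown with a per-lane emulation on 64-bit values. In MMX these options are separate opcodes (e.g. PADDB for wrap-around and PADDUSB for unsigned saturation); the loop below only emulates their results one lane at a time:

```c
#include <stdint.h>

/* Eight unsigned byte additions on packed 64-bit operands.
   saturate == 0: wrap around (keep the low 8 bits of each sum).
   saturate != 0: clamp each sum at 255 (unsigned saturation). */
uint64_t add_8x8(uint64_t a, uint64_t b, int saturate) {
    uint64_t result = 0;
    for (int lane = 0; lane < 8; lane++) {
        unsigned x = (unsigned)(a >> (8 * lane)) & 0xFFu;
        unsigned y = (unsigned)(b >> (8 * lane)) & 0xFFu;
        unsigned s = x + y;
        if (saturate && s > 0xFFu) s = 0xFFu;   /* clamp instead of wrapping */
        result |= (uint64_t)(s & 0xFFu) << (8 * lane);
    }
    return result;
}
```

For a pixel value 0xF0 brightened by 0x20, wrap-around yields 0x10 (a dark pixel), while saturation yields 0xFF, which is usually the desired result for images.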
VLIW Processors
• VLIW: Very Long Instruction Word
• Detection of parallelism is done by the compiler, not by hardware at run-time (inefficient).
• Parallel operations (instructions) are encoded in one long word (instruction packet), each instruction controlling one functional unit.
Partitioned Register Files
• Many memory ports are required to supply enough operands per cycle.
• Memories with many ports are expensive.
• Registers are partitioned into sets, e.g., for the TI C60x:
[Figure: data path A with register file A and functional units L1, S1, M1, D1; data path B with register file B and functional units L2, S2, M2, D2; both connected to the address bus and data bus]
Microcontrollers: MHS 80C51
• 8-bit CPU optimised for control applications
• Extensive Boolean processing capabilities
• 64 KB program memory address space
• 64 KB data memory address space
• 4 KB of on-chip program memory
• 128 bytes of on-chip data RAM
• 32 bidirectional and individually addressable I/O lines
• Two 16-bit timers/counters
• Full-duplex UART
• 6 sources/5-vector interrupts with 2 priority levels
• On-chip clock oscillators
• Very popular CPU with many different variations
Reconfigurable Logic
Full custom chips may be too expensive, software may be too slow.
• Use of configurable hardware, e.g., field programmable gate arrays (FPGAs)
Application areas:
• Fast prototyping
• Configuring a mobile phone according to local standards
• Low-volume applications
Example: Xilinx Virtex II FPGAs
[Figure: floorplan of Virtex II FPGAs; CLB = Configurable Logic Block]
[Figure: Configurable Logic Block (CLB)]
Access time will be a problem
• The speed gap between processor and memory increases:
  - early sixties (Atlas): page fault ~ 2500 instructions
  - 2002 (2 GHz µP): access to DRAM ~ 500 instructions
  - the penalty for a cache miss will soon be the same as for a page fault in Atlas
[Figure: relative speed (1 to 8) over 0 to 5 years; one curve labeled "2x every 2 years"]
[P. Machanick: Approaches to Addressing the Memory Wall, TR Nov. 2002, U. Brisbane]
Access times and energy consumption
Example (CACTI model) [figure]
"Currently, the size of some applications is doubling every 10 months" [STMicroelectronics, Medea+ Workshop, Stuttgart, Nov. 2003]
Energy consumption by Memory
Mobile PC, thermal design (TDP) system power: 600/500 MHz µP 37%, LCD 10" 19%, Other 13%, Memory+Graphics 12%, Power Supply 10%, HDD 9%. CPU dominates thermal design power.
Mobile PC, average system power: LCD 10" 30%, HDD 19%, Memory+Graphics 15%, 600/500 MHz µP 13%, Other 13%, Power Supply 10%. Multiple platform components comprise average power.
Note: based on actual measurements. [Source: V. Tiwari]
“CPU” Power Dissipation
Strong ARM [IEEE Journal of SSC, Nov. 96]: Icache 26%, Ibox 18%, Dcache 16%, Clock 10%, IMMU 9%, EBOX 8%, DMMU 8%, Others 5%.
Power PC [Proceedings of ISSCC 94]: Cache 23%, Clock 19%, TLB 17%, Control L. 16%, Data Flow 11%, I/O 7%, PLA 5%, ROM 2%.
42% / 40% memory-related!
Based on slide by and ©: Osman S. Unsal, Israel Koren, C. Mani Krishna, Csaba Andras Moritz, University of Massachusetts, Amherst, 2001
Real-time Capability
• Timing behavior has to be predictable.
• Features that cause problems:
  - Caches with difficult-to-predict replacement strategies
  - Unified caches (conflicts between instructions and data)
  - Pipelines with difficult-to-predict stall cycles ("bubbles")
  - Interrupts that are possible any time
  - Memory refreshes that are possible any time
  - Instructions that have data-dependent execution times
[Dagstuhl workshop on predictability, Nov. 17-19, 2003]
• No caches, use scratch pad memories
Why not just use a Cache?
1. Predictability? The worst case execution time (WCET) may be large. [P. Marwedel et al., ASPDAC, 2004]
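Scratch pad Memory (SPM)
Hierarchy: processor, scratch pad memory, main memory
[Figure: address space from 0 to FFF..; the scratch pad memory occupies part of the processor's address space and needs no tag memory. Example: ARM7TDMI cores, well-known for low power consumption]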
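Why an SPM is predictable can be seen from its address decoding: the SPM is simply a fixed range of the address space, so whether an access goes to the SPM is decided by the address alone, with no tag comparison and no hit or miss. A sketch with a hypothetical base address of 0 and a hypothetical size of 4 KB:

```c
#include <stdint.h>

/* Address decoding for a memory-mapped scratch pad: addresses inside the
   SPM window go to the SPM (fixed latency, no tags), all others go to main
   memory. Base and size are hypothetical. */
#define SPM_BASE 0x00000000u
#define SPM_SIZE 0x00001000u    /* 4 KB, assumed */

typedef enum { TARGET_SPM, TARGET_MAIN } mem_target;

mem_target decode(uint32_t addr) {
    /* unsigned subtraction also rejects addresses below SPM_BASE */
    return (addr - SPM_BASE < SPM_SIZE) ? TARGET_SPM : TARGET_MAIN;
}
```

Because the mapping is static, the compiler or programmer decides which objects are placed in the SPM, and the access time of every load and store is known when the WCET is computed.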
Conclusions
• Embedded systems consist of a wide variety of hardware (analog/digital) components.
• Sensors and actuators interact with the physical world.
• Communication needs to be efficient and real-time.
• Processor design needs to be aware of energy efficiency, performance, code size, etc.
• Memory design also has to satisfy many constraints; caches are not well suited to real-time systems.
• Reconfigurable systems provide a trade-off between flexibility and efficiency.