Computer Engineering Senior Projects & Research Overview
An informal overview of past & current projects: students' & my own
by Al Davis
School of Computing
The Engineering Discipline
Role
– design and build things
– change the world around us
» hopefully for the better
» hence faced with a continuous ethical dilemma
Ultimate requirement
– what we build must work
Requisite skills
– science: math, physics, chemistry, materials, CS, …
– engineering: state of the art, current practice, technology trends,
manufacturing, testability, maintenance, life cycle costs, …
– art: creative component that is clearly evident in the great
engineers
Computer Engineering
Design and build computer systems
– inherently involves both software and hardware design skills
System software
– compiler, operating system, device drivers, …
– as opposed to application specific software
» applications are the target system “user”
» hence they are used in design evaluation (pre- and post-build)
Hardware: possibly many disciplines and levels
– VLSI chip design: analog and digital circuit aspects
» CS, EE, physics are the key disciplines
» yet cooling is a big issue – enter ME aspects
– board design: CS, EE, and manufacturing issues are dominant
– system design: balance of HW and SW capabilities
CE Senior Projects at Utah
Logistics
– CE program run jointly by SoC and ECE departments
– Senior project is capstone project course
» team based
» students choose their own project
» best mechanism to demonstrate your abilities to future
employers
– CE Senior Project is a year long activity
» at least for the last 2.5 years
» Spring term of junior year: plan and propose
» Summer: get parts and start building (optional)
» Fall term of senior year: build and demonstrate
– Exit interview feedback
» rave reviews for being hard, fun, and instructive
04 Projects
Satellite Tracking station
Weaver – an 802.11 remote control vehicle interface
– camera on car: image and commands to base station via wireless
– car has autonomous anti-collision capability (infrared)
GPS Hummer
– autonomous navigation and anti-collision
– some AI in route finding since Hummer remembers obstacles that it saw
previously
PCI Coprocessor
– efficient acceleration via PCI add-on
Jiggawax
– build your own iPod
RVI – remote vehicle interface
– control via web or cell phone
– control windows, engine, and door locks from RF base station
05 Projects
Carputer
– OBDII car data and 802.11g auto-sync to base station
– monitor your car or your kids
IR tag
– paintball without the mess
Athlete monitor system
– real time tracking of position and heart rate to central coaching
station
– GPS, RF, and HRM on-athlete
Inverted pendulum 2-wheeled robot
Multi-carrier reflectometry
– finding faults in aircraft wires without tearing the plane apart
Glider avionics package
– using accelerometers, GPS, and strain sensors
Current 06 projects (underway now)
PEN
– electronic paper – the only paper you’ll ever buy!
Recipedia
– a cook book that talks and listens to you
GPS tracker
– use campus ubiquitous wireless to keep track of where things are via your
cell phone or computer
OmegaCore
– a DVR that knows how to remove commercials for you
NoCPR
– bathtub drowning prevention
Tracking Visor
– virtual reality on your head
Selected Examples
Some images to illustrate previous projects
Satellite Tracking Station
Final dual band antenna on the roof of MEB during demo day
2 meter (VHF side) antenna specs – students used an antenna design CAD tool
GPS Hummer
Controlling direction and speed with transistors
GPS internals
A build-your-own GPS kit from Motorola
Autonomous anti-collision system
Glider Avionics Package (note this ended up being done by a single student as a thesis)
Designing an electronic compass is non-trivial, especially if you want tilt compensation
Board Schematic
Power Supply
Filters and Registers
Board Artwork
Senior Project Synopsis
This was just a peek
Just remember
– if you can imagine it you can usually build it
» there are some things you just can’t do
» like a perpetual motion machine
which violates the laws of physics
– all it takes is dedication and time
Huge diversity of both opportunities and problems
You might have noticed the world isn’t perfect
– so help fix it!
Personal Research Overview
Past
– dataflow, VLSI, asynchronous circuits, parallel computing, high
performance architectures (50% academia, 50% industry)
Currently there are 4 projects
– Domain specific architectures
» target highly constrained embedded systems
» will highlight the perception processor today
have also worked in signal processing and cell phone domains
– Interconnect driven architecture
» w/ Rajeev Balasubramonian & students
– RPU design
» w/ Erik Brunvand, Pete Shirley, Steve Parker, & students
– VLSI wire scaling theory
» w/ Stephanie Forrest & Melanie Moses @ UNM
Embedded Computing Characteristics
Historically
– narrow application specific focus
– typically cheap, low-power, provide just enough compute power
» niche filled by small microcontroller/DSP devices
» AND often ASIC component(s)
New Pressures
– world goes bonkers on mobility and the web
» expects ubiquitous information access
» expects better and cheaper everything
– sensors, microphones & cameras become free
» so use lots of them
– now we’re talking real computing
New Look for ECS
Sophisticated application suites
– not single algorithms – e.g.
» 3G and 4G cellular handsets
» process what is streaming in from the net
includes real time media & web access
» process the sensor, microphone, and camera streams
multiple channels and multiple encoding models
plus the usual DSP stuff
plus network information from the neighborhood, since things are starting to happen in groups
– wide range of services
» dynamic selection
» no single app will do
Rate of algorithmic change is staggering
ECS Economics
Traditional reliance on the ASIC design cycle
– lengthy IC design: > 1 year typical
– little re-use
» IP import works but there are many pitfalls
HDL code: synthesis brings energy-delay (ed) inefficiency
macroblock: forces process and layout issues
– turning an IC is costly
» even when it works the first time
ECS product cycles
– lifetime similar to a mayfly
– need next improved version “real soon now”
Result
– sell monster volumes in a short time or lose
What is Perception Processing?
Ubiquitous computing needs natural human interfaces
Processor support for perceptual applications
– Gesture recognition
– Object detection, recognition, tracking
– Speech recognition
– Biometrics
Applications
– Multi-modal human friendly interfaces (our focus)
– Intelligent digital assistants
– Robotics, unmanned vehicles
– Perception prosthetics
Perception Processing Problem
consider the always-on aspect!
Current Processors Inadequate
Too slow, too much power for embedded space!
– 2.4 GHz Pentium 4 ~ 60 Watts
– 400 MHz XScale ~ 800 mW
– 10x or more difference in performance but 100x in power
Inadequate memory bandwidth
– Sphinx requires 1.2 GB/s memory bandwidth
– XScale delivers 64 MB/s ~ 1/19th
Our methodology
– Characterize applications to find the problems
– Derive acceleration architecture
» History of FPUs is an analogy
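The power and bandwidth gaps quoted on this slide can be checked with a few lines of arithmetic. The sketch below uses only the figures from the slide (60 W, 800 mW, 1.2 GB/s, 64 MB/s); the variable names are illustrative, and decimal units (1 GB = 1000 MB) are assumed.

```python
# Figures taken from the slide; names are illustrative.
PENTIUM4_POWER_W = 60.0    # 2.4 GHz Pentium 4
XSCALE_POWER_W = 0.8       # 400 MHz XScale (800 mW)
SPHINX_BW_MB_S = 1200.0    # Sphinx needs 1.2 GB/s (assuming 1 GB = 1000 MB)
XSCALE_BW_MB_S = 64.0      # what the XScale delivers

power_ratio = PENTIUM4_POWER_W / XSCALE_POWER_W
bandwidth_shortfall = SPHINX_BW_MB_S / XSCALE_BW_MB_S

print(f"Pentium 4 draws about {power_ratio:.0f}x the power of the XScale")
print(f"Sphinx needs {bandwidth_shortfall:.2f}x the bandwidth the XScale delivers")
```

The two quoted power figures give a ratio of about 75x, the same order as the slide's rounded 100x, and the bandwidth shortfall of 18.75x matches the slide's "~1/19th".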
The Problem w/ GPPs
caches & speculation
– consume significant area and energy
– great when they work – a liability when they don’t
rigid communication model
– data moves from memory to registers
– register → execution unit → register
– inability to support specialized computational pipelines
» ASIC advantage
bottom line
– can process anything
– but not efficiently in many cases
– it’s the von Neumann trap
» lots of overhead for almost no work
The FaceRec Application
FaceRec In Action
Bobby Evans
Application Structure
[Figure: FaceRec pipeline. Image → Flesh tone → Segment Image → Rowley Face Detector (ANN based) or Viola & Jones Face Detector (~200 stage Adaboost) → Neural Net Eye Locator → Eigenfaces Face Recognizer → Identity, Coordinates]
Flesh toning: Soriano et al, Bertran et al
Segmentation: Text book approach
Rowley detector, voter: Henry Rowley, CMU
Viola & Jones’ detector: Published algorithm + Carbonetto, UBC
Eigenfaces: Re-implementation by Colorado State University
Application Profile
Execution time breakdown (using Viola/Jones detector)
– Viola/Jones Detector: 60%
– Eigenfaces: 19%
– Eye Locator: 17%
– Flesh tone: 4%
Execution time breakdown (using Rowley's detector)
– Rowley Detector: 65%
– Eigenfaces: 19%
– Eye Locator: 10%
– Flesh tone: 6%
Face Recognition Analysis
Cache
– a small L1 D$ gets a high hit rate
– L2$ is useless – most L1 misses pass through
IPC
– low even with lots of FP execution units
– Why?
» load store register & memory ports saturate
multiple large matrix traversals are the critical kernel
several indirect accesses per operation
» dominant loop is a SFP inner product
no single cycle accumulate
Implications
– restructure the code: loop fusion, more temporary registers
– need architectures which move data well
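As an illustration of the loop fusion suggested above, the sketch below contrasts two separate inner-product passes over the same vector with a single fused pass that keeps both running sums in temporaries. The function names and data are hypothetical and are not taken from the FaceRec code.

```python
def projections_unfused(x, basis_a, basis_b):
    """Two inner products, each traversing x separately."""
    dot_a = 0.0
    for i in range(len(x)):
        dot_a += x[i] * basis_a[i]
    dot_b = 0.0
    for i in range(len(x)):          # x is loaded a second time
        dot_b += x[i] * basis_b[i]
    return dot_a, dot_b


def projections_fused(x, basis_a, basis_b):
    """Same result with the loops fused: x[i] is loaded once per iteration
    and both accumulators live in temporaries (registers in hardware)."""
    dot_a = 0.0
    dot_b = 0.0
    for i in range(len(x)):
        xi = x[i]                    # single load, reused twice
        dot_a += xi * basis_a[i]
        dot_b += xi * basis_b[i]
    return dot_a, dot_b


x = [1.0, 2.0, 3.0, 4.0]
a = [0.5, 0.5, 0.5, 0.5]
b = [1.0, 0.0, -1.0, 0.0]
print(projections_unfused(x, a, b))
print(projections_fused(x, a, b))
```

The fused version halves the loads of `x` at the cost of one extra live accumulator, which is the trade the slide's "more temporary registers" points at.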
CMU Sphinx 3.2 Profile
Feature Vector = 13 Mel + 1st and 2nd derivative
10 ms of speech is compressed into 39 SP floats
iMic possibility
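The 39-element figure follows from 13 Mel coefficients plus their first and second derivatives (13 x 3). The sketch below assembles such a vector using a plain frame-to-frame difference as the derivative estimate; that estimator, the function names, and the sample data are assumptions for illustration and are not Sphinx's actual front end. It also works out the resulting data rate for 10 ms frames of single-precision floats.

```python
NUM_MEL = 13             # Mel coefficients per frame (from the slide)
FRAME_PERIOD_S = 0.010   # one frame per 10 ms of speech (from the slide)
BYTES_PER_FLOAT = 4      # single-precision float


def difference(curr, prev):
    """Element-wise difference, used here as a crude derivative estimate."""
    return [c - p for c, p in zip(curr, prev)]


def feature_vector(mel_prev2, mel_prev1, mel_curr):
    """Concatenate coefficients with their 1st and 2nd differences."""
    d1_curr = difference(mel_curr, mel_prev1)
    d1_prev = difference(mel_prev1, mel_prev2)
    d2_curr = difference(d1_curr, d1_prev)
    return mel_curr + d1_curr + d2_curr


# Three consecutive frames of made-up Mel coefficients.
frames = [[float(t * k) for k in range(NUM_MEL)] for t in (1, 2, 4)]
vec = feature_vector(*frames)

frames_per_second = round(1.0 / FRAME_PERIOD_S)
bytes_per_second = len(vec) * BYTES_PER_FLOAT * frames_per_second
print(len(vec), "floats per frame,", bytes_per_second, "bytes per second")
```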
Speech Analysis
Results
– similar to FaceRec
» cache
» port saturation
– big difference
» also memory B/W starved
» due to language model
Execution time (opt): GAU 57.57%, HMM 41.45%, FE 0.98%
Simple ASIC Design Example: Matrix Multiply
def matrix_multiply(A, B, C):  # C is the result matrix
    for i in range(0, 16):
        for j in range(0, 16):
            C[i][j] = inner_product(A, B, i, j)

def inner_product(A, B, row, col):
    sum = 0.0
    for i in range(0, 16):
        sum = sum + A[row][i] * B[i][col]
    return sum
ASIC Accelerator Design: Matrix Multiply
Control Pattern
def matrix_multiply(A, B, C):  # C is the result matrix
    for i in range(0, 16):
        for j in range(0, 16):
            C[i][j] = inner_product(A, B, i, j)

def inner_product(A, B, row, col):
    sum = 0.0
    for i in range(0, 16):
        sum = sum + A[row][i] * B[i][col]
    return sum
ASIC Accelerator Design: Matrix Multiply
Access Pattern
def matrix_multiply(A, B, C):  # C is the result matrix
    for i in range(0, 16):
        for j in range(0, 16):
            C[i][j] = inner_product(A, B, i, j)

def inner_product(A, B, row, col):
    sum = 0.0
    for i in range(0, 16):
        sum = sum + A[row][i] * B[i][col]
    return sum
ASIC Accelerator Design: Matrix Multiply
Compute Pattern
def matrix_multiply(A, B, C):  # C is the result matrix
    for i in range(0, 16):
        for j in range(0, 16):
            C[i][j] = inner_product(A, B, i, j)

def inner_product(A, B, row, col):
    sum = 0.0
    for i in range(0, 16):
        sum = sum + A[row][i] * B[i][col]  # highlighted on the slide: the multiply-accumulate
    return sum
ASIC Accelerator Design: Matrix Multiply
def matrix_multiply(A, B, C):  # C is the result matrix
    for i in range(0, 16):
        for j in range(0, 16):
            C[i][j] = inner_product(A, B, i, j)

def inner_product(A, B, row, col):
    sum = 0.0
    for i in range(0, 16):
        sum = sum + A[row][i] * B[i][col]
    return sum
How can we generalize?
Decompose loop into:
– Control pattern
– Access pattern
– Compute pattern
Programmable h/w acceleration for each pattern
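One way to picture the decomposition is to rewrite the 16x16 matrix multiply from the earlier slides so that each pattern is a separate piece of code. This is only a software sketch under that assumption: the generator and function names below are hypothetical, and nothing here models the actual hardware.

```python
N = 16


def control_pattern():
    """Control: the loop nest, reduced to a stream of (i, j, k) indices."""
    for i in range(N):
        for j in range(N):
            for k in range(N):
                yield i, j, k


def access_pattern(A, B, indices):
    """Access: turn each index triple into operand fetches from A and B."""
    for i, j, k in indices:
        yield i, j, A[i][k], B[k][j]


def compute_pattern(C, operands):
    """Compute: multiply-accumulate into C, which must start out zeroed."""
    for i, j, a, b in operands:
        C[i][j] += a * b


def matrix_multiply(A, B, C):
    compute_pattern(C, access_pattern(A, B, control_pattern()))


# Multiplying by the identity should return the original matrix.
A = [[float(i * N + j) for j in range(N)] for i in range(N)]
identity = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]
C = [[0.0] * N for _ in range(N)]
matrix_multiply(A, identity, C)
print(C == A)
```

Because the three stages only talk through streams of values, each could in principle be handed to its own programmable unit, which is the point the slide makes.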
Architecture Family
Experimental Method
Measure processor power on
– 2.4 GHz Pentium 4, 0.13 µm process
– 400 MHz XScale, 0.18 µm process
Perception Processor
– 1 GHz, 0.13 µm process (Berkeley Predictive Tech Model)
– Verilog, MCL HDLs
– Synthesized using Synopsys Design Compiler
– Fanout based heuristic wire loads
– Spice (Nanosim) simulation yields current waveform
– Numerical integration to calculate energy
ASICs in 0.25 µm process
Normalize 0.18 µm, 0.25 µm energy and delay numbers
– model = constant field scaling
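The last two steps of the method can be sketched as follows: energy is the supply voltage times the integral of the simulated supply current, and results from older processes are normalized with constant field scaling. Under that model, with scale factor s = old feature size / new feature size, delay shrinks by 1/s and energy per operation by 1/s^3 (capacitance and voltage each scale by 1/s, and energy goes as C·V^2). The waveform samples, supply voltage, and function names are made up for illustration; the slides do not give the actual numbers.

```python
def energy_joules(times_s, currents_a, vdd_v):
    """Trapezoidal integration of Vdd * i(t) over the sampled waveform."""
    energy = 0.0
    for k in range(1, len(times_s)):
        dt = times_s[k] - times_s[k - 1]
        avg_current = 0.5 * (currents_a[k] + currents_a[k - 1])
        energy += vdd_v * avg_current * dt
    return energy


def normalize_constant_field(energy_j, delay_s, from_um, to_um):
    """Scale energy and delay from one feature size to another."""
    s = from_um / to_um            # s > 1 when moving to a smaller process
    return energy_j / s**3, delay_s / s


# Hypothetical waveform: a triangular current pulse, 0 -> 2 mA -> 0 over 2 ns.
times = [0.0, 1e-9, 2e-9]
currents = [0.0, 2e-3, 0.0]
e = energy_joules(times, currents, vdd_v=1.0)

# Normalize a hypothetical 0.25 um measurement to 0.13 um.
e_norm, d_norm = normalize_constant_field(e, 2e-9, from_um=0.25, to_um=0.13)
print(e, e_norm, d_norm)
```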
Benchmarks
Visual feature recognition
– Erode, Dilate: Image segmentation operators
– Fleshtone: NCC flesh tone detector
– Viola, Rowley: Face detectors
Speech recognition
– HMM: 5 state Hidden Markov Model
– GAU: 39 element, 8 mixture Gaussian
DSP
– FFT: 128 point, complex to complex, floating point
– FIR: 32 tap, integer
Encryption
– Rijndael: 128 bit key, 576 byte packets
Results: IPC
Mean IPC = 3.3x R14K
Results: Throughput
Mean throughput = 1.75x Pentium, 0.41x ASIC
Results: Energy
Mean energy/packet = 7.4% of XScale, 5x of ASIC
Results: Energy Delay Product
Mean EDP = 159x XScale, 1/12 of ASIC
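Energy-delay product (EDP) is the energy consumed for a unit of work multiplied by the time that work takes, so lower is better and a design scores well only if it is both frugal and fast. A minimal sketch with made-up numbers (not the measured data behind the chart):

```python
def energy_delay_product(energy_j, delay_s):
    """EDP in joule-seconds; lower is better."""
    return energy_j * delay_s


# Hypothetical per-packet figures for two designs, for illustration only.
baseline_edp = energy_delay_product(energy_j=4e-3, delay_s=2e-3)
candidate_edp = energy_delay_product(energy_j=1e-3, delay_s=1e-3)

improvement = baseline_edp / candidate_edp
print(f"candidate EDP is {improvement:.1f}x better than the baseline")
```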
Perception Results: Summary
41% of ASIC’s performance
But programmable!
1.75 times the Pentium 4’s throughput
But 7.4% of the energy of an XScale!
advanced perceptive embedded systems are possible
– above results are maximally pessimistic
– and as always there are improvements in the works
Problems
– manually intensive design process
– requires highly skilled programmer, architect, circuit designer
– current effort is to fix this
Automating the design process
[Figure: design automation flow. Recoverable labels: Application Suite (C); Splitter; Host Code (C & ifc); Stream Code (opt. Human Interaction); Host Compiler; Stream Compiler; CoProcessor Description; Host Object Code; CoProcessor Object Code; CoProcessor Simulator; Synthesize; Simulation Analysis & Design Space Explore; Design Track Graph (add point, design choice); dilation]
DSE Results
[Figure: design space exploration plot of Power against Performance. A Power Limit line and a Performance Requirement line divide the plane into four quadrants: No Way, Too “Watty”, Too Dweeby, and Choice.]
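The quadrant labels suggest a simple reading: each candidate design is a (performance, power) point, and only designs that meet the performance requirement while staying under the power limit are worth choosing. The sketch below classifies points that way; the mapping of quadrant names to regions is inferred from the labels, and the thresholds, units, and design points are hypothetical.

```python
def classify(performance, power, perf_required, power_limit):
    """Return the quadrant name for one design point."""
    fast_enough = performance >= perf_required
    cool_enough = power <= power_limit
    if fast_enough and cool_enough:
        return "Choice"
    if fast_enough:
        return "Too Watty"     # meets performance, exceeds the power limit
    if cool_enough:
        return "Too Dweeby"    # within power, misses the performance target
    return "No Way"            # misses both


# Hypothetical design points: (name, performance, power).
designs = [("a", 1.2, 0.8), ("b", 1.5, 2.5), ("c", 0.6, 0.5), ("d", 0.4, 3.0)]
for name, perf, power in designs:
    print(name, classify(perf, power, perf_required=1.0, power_limit=1.0))
```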
Conclusions
Significant benefit
– 3 forms of parallelism: control, address, execution
– program controlled communication patterns
» able to mimic ASIC flows
» more efficient use of execution units and memory structures
Results to date (in terms of energy-delay, ed)
– 2-3 orders of magnitude improvement over GPP
– within 1 order of magnitude of an ASIC
– while maintaining most of the generality of the GPP approach
Thanks!
Questions?