Parallel Processors - University of Michigan

Parallel Processors
Todd Charlton
Eric Uriostique
Current Technology
• Single-core processors are now hard to find
• Cell phones, laptops, etc.
• Large systems can contain 512 or more
processors
The Motivation
• Divide and Conquer – Higher Throughput
• Lower Power Consumption
$P = CV^2 f$ (dynamic power: switched capacitance $C$, supply voltage $V$, clock frequency $f$)
The Motivation
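• We need more performance on the same
power budget. How?
• Remember: $P = CV^2 f$
• Scale voltage and frequency to 80%
• $P = C(0.8V)^2(0.8f) = 0.512\,CV^2 f$
• This cuts power roughly in half (to about 51%)
• Add a second core
• Result: 1.6x speedup for about the same power
(2 × 0.512 ≈ 1.02), if the work splits evenly across both cores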
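The same arithmetic written out, assuming dynamic power dominates and the workload splits perfectly across the two cores:

```latex
\begin{aligned}
P_{\text{1 core}}  &= C\,(0.8V)^2\,(0.8f) = 0.512\,CV^2 f \\
P_{\text{2 cores}} &= 2 \times 0.512\,CV^2 f = 1.024\,CV^2 f \approx CV^2 f \\
\text{Throughput}_{\text{2 cores}} &= 2 \times 0.8 = 1.6 \times \text{one full-speed core}
\end{aligned}
```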
The Motivation
• How about reducing power consumption
but keeping the same performance?
• Remember: $P = CV^2 f$
• Scale voltage and frequency to 50%
• $P = C(0.5V)^2(0.5f) = 0.125\,CV^2 f$
• This drops power to 12.5%
• Add a second core
• Result: 25% of the original power
consumption (2 × 12.5%) with the same performance
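Both scenarios can be checked with a small helper. This is a sketch that assumes the slides' model: $P = CV^2 f$ with voltage and frequency scaled by the same factor, identical cores, and perfectly parallel work (so throughput is just cores × scale). The function names are hypothetical.

```python
def scaled_power(scale: float, cores: int) -> float:
    """Total dynamic power relative to one core at full V and f.

    Assumes P = C * V^2 * f, with V and f both multiplied by `scale`.
    """
    per_core = scale**2 * scale  # (scale*V)^2 * (scale*f), relative to C*V^2*f
    return cores * per_core


def scaled_throughput(scale: float, cores: int) -> float:
    """Throughput relative to one full-speed core, assuming perfectly parallel work."""
    return cores * scale


for scale in (0.8, 0.5):
    print(f"scale={scale}: power={scaled_power(scale, 2):.3f}, "
          f"throughput={scaled_throughput(scale, 2):.2f}")
# scale=0.8: power=1.024, throughput=1.60
# scale=0.5: power=0.250, throughput=1.00
```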
Amdahl’s Law
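• “Speed-up is limited by the amount of work
that can be done in parallel”
• The serial part sets a hard ceiling: if only 90% of the work
is parallel, speedup can never exceed 10x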
Credit: watermint.org
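Amdahl's Law is usually written as S(n) = 1 / ((1 − p) + p/n), where p is the fraction of the work that can run in parallel and n is the number of processors. A minimal sketch of that formula (the function name is hypothetical):

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)


# With 90% parallel work, adding processors helps less and less:
# the speedup approaches 1 / (1 - 0.9) = 10 but never reaches it.
for n in (2, 4, 8, 512):
    print(n, round(amdahl_speedup(0.9, n), 2))
```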
Ways To Parallelize
1. Multi-Threading:
• Multi-thread your application on one chip
• More elegant
2. Multi-Processing:
• Flash serial code to separate chips
• No worrying about thread scheduling!
Let’s Multi-Thread
• One Application: Counting maize pixels
[Figure: the pixel count split across 2 processors and across 4 processors]
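A sketch of the maize-pixel idea: split the image into horizontal strips, count each strip in its own worker, and add the partial counts. The image format (rows of RGB tuples), the reference color (255, 203, 5), and the tolerance are illustrative assumptions, not taken from the slides.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical reference color for Michigan maize, as RGB.
MAIZE = (255, 203, 5)


def is_maize(pixel, tolerance=40):
    """True if every channel is within `tolerance` of the reference maize color."""
    return all(abs(c - m) <= tolerance for c, m in zip(pixel, MAIZE))


def count_maize(rows):
    """Serial kernel: count maize pixels in a list of rows."""
    return sum(is_maize(p) for row in rows for p in row)


def count_maize_parallel(image, workers=4):
    """Split the image into strips, count each strip in a separate thread, sum the results."""
    strip = max(1, -(-len(image) // workers))  # ceiling division: rows per strip
    strips = [image[i:i + strip] for i in range(0, len(image), strip)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_maize, strips))


# 8 rows, each with one maize-like pixel and one blue pixel.
image = [[(255, 200, 10), (0, 39, 76)] for _ in range(8)]
print(count_maize_parallel(image, workers=2))  # 8
```

In CPython the global interpreter lock keeps threads from speeding up this pure-Python loop, so the sketch shows the decomposition rather than a real speedup; `ProcessPoolExecutor` offers the same `map` interface and runs each strip in a separate process (behind the usual `if __name__ == "__main__":` guard).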
Multi-Threading in µProcessors
• Parallax Propeller processor
• Multi-threads on 8 cores
• One application runs across all 8
cores
• Uses its own high-level
language, Spin, and a form of
assembly
• Used in the CMUcam4
Problems with Multi-Threading
• Steep learning curve
• Learning the language
• Parallel slowdown
• Setting up a new thread takes a lot of time.
If that thread does not have much
work, it is not worth the overhead
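Parallel slowdown can be illustrated with a deliberately simplified, hypothetical cost model: the work divides evenly among n threads, but each thread adds a fixed setup cost, so the predicted time is work/n + overhead × n. Real overheads are not this tidy; the model only shows why small jobs should not be split.

```python
def predicted_time(work, threads, overhead):
    """Hypothetical cost model: work splits evenly, each thread costs `overhead` to set up."""
    return work / threads + overhead * threads


def best_thread_count(work, overhead, max_threads):
    """Thread count in 1..max_threads with the lowest predicted time."""
    return min(range(1, max_threads + 1),
               key=lambda n: predicted_time(work, n, overhead))


print(best_thread_count(work=100, overhead=1, max_threads=16))  # 10
print(best_thread_count(work=1, overhead=1, max_threads=16))    # 1: not worth a second thread
```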
Multi-Threading Libraries
• Serially written code cannot take
advantage of parallel processing on its own
• Intel’s Threading Building Blocks (TBB)
• OpenMP
• Boost.Thread and pthreads (POSIX threads)
• All of these are C/C++ libraries or APIs
Multi-Processing: BeagleBone
• Processor
• 720 MHz ARM Cortex-A8
• 3D graphics accelerator
• ARM Cortex-M3 for power
management
• 2x Programmable Real-time Unit (PRU)
RISC CPUs
• PRUs share memory space with the
A8
[Figure: Shared Memory Space]
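This is not BeagleBone or PRU code; it is a desktop analogy for why two processors writing to the same memory need coordination. Several threads increment one shared counter, and a lock makes each read-modify-write atomic. The function name and counts are illustrative.

```python
import threading


def run_shared_counter(num_threads=4, increments=10_000):
    """Have several threads update one shared value, guarded by a lock."""
    shared = {"count": 0}  # stands in for the shared memory region
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            # Without the lock, concurrent read-modify-write updates could be lost.
            with lock:
                shared["count"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"]


print(run_shared_counter())  # 40000
```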
Multi-Processing:
Custom with Message Passing
• Designate a processor for each frequent
task
• Send messages to the "Boss" as necessary
• Since every processor's workload is
minimal, slower, low-power chips can
be used
• Overall: same system performance
[Figure: Message Passing]
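A sketch of the boss/worker pattern, with threads standing in for the separate chips and a `queue.Queue` standing in for whatever link connects them. Each worker owns one task and reports its result as a `(name, value)` message; the task names and values are hypothetical.

```python
import queue
import threading


def worker(name, task, to_boss):
    """Run this worker's single task and report the result to the boss."""
    to_boss.put((name, task()))


def boss(tasks):
    """Start one worker per task, then collect exactly one message from each."""
    to_boss = queue.Queue()
    threads = [threading.Thread(target=worker, args=(name, task, to_boss))
               for name, task in tasks.items()]
    for t in threads:
        t.start()
    results = {}
    for _ in threads:
        name, value = to_boss.get()  # blocks until some worker reports
        results[name] = value
    for t in threads:
        t.join()
    return results


# Messages may arrive in any order, so sort before printing.
print(sorted(boss({"sonar": lambda: 42, "battery": lambda: 7.4}).items()))
# [('battery', 7.4), ('sonar', 42)]
```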
Problems with Multi-Processing
• Shared Memory Space
• Boards like this are hard to find and
configure
• Message Passing
• Can’t assume messages are received
immediately
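Because a message may arrive late (or not at all), a receiver that blocks forever can stall the whole system. A minimal sketch of receiving with a timeout, again using `queue.Queue` as a stand-in for the inter-chip link; the 50 ms default and the function name are arbitrary illustrative choices.

```python
import queue


def receive(inbox, timeout_s=0.05):
    """Wait up to `timeout_s` seconds for a message; return None instead of blocking forever."""
    try:
        return inbox.get(timeout=timeout_s)
    except queue.Empty:
        return None


inbox = queue.Queue()
print(receive(inbox))  # None: nothing has arrived yet
inbox.put("sensor ready")
print(receive(inbox))  # sensor ready
```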
Recap
• Go parallel if you want:
• Higher Throughput
• Lower Power
• Two ways:
• Multi-Threading – Spin
• Speed up one application
• Multi-Processing – BeagleBone
• Do more tasks at the same time
• Don’t forget Amdahl’s Law!
Questions