Transcript Slide 1

Welcome
All About Multi Core and Parallel Computing
About This Day
Pacific Software
Asaf Shelly
Objectives
Audience
http://AsyncOp.com
Intel
Brown Belt
What is Multi-Core
Diagram labels: Pentium Processor, Dual Core, Quad Core
Why Multi-Core
Chart: Performance and Power relative to a single core at 2 GHz
• Single core, 2 GHz: Performance 100%, Power 100%
• Single core, 2.4 GHz: Performance 113%, Power 174%
• Single core, 1.6 GHz: Performance 87%, Power 50%
• Dual core, 1.6 GHz: Performance 174%, Power 100%
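The chart's final comparison can be checked with simple arithmetic. The sketch below reads the 1.6 GHz figures as 87% performance at 50% power per core and assumes two such cores add up ideally, which real workloads rarely achieve. The names Rating, scale_cores and kUnderclocked are illustrative only.

```cpp
#include <cassert>

// Relative performance and power, in percent of a single core at 2 GHz.
struct Rating {
    int performance;
    int power;
};

// Assumes ideal scaling: n identical cores add up both performance and power.
Rating scale_cores(Rating single, int cores) {
    assert(cores > 0);
    return Rating{single.performance * cores, single.power * cores};
}

// One core underclocked to 1.6 GHz, as read from the chart.
const Rating kUnderclocked{87, 50};
```

Under these assumptions, two underclocked cores give 174% performance for 100% power, which is the point of the slide.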
Advantage Of Multi Core
• Low Power Consumption
• Lower Heating
• Smaller Devices
• Light Device and Power Supply
• Software Replaces Custom Hardware!
Challenges With Multi-Core
• Single Process
• Locks
• Memory
• Renaissance
Problems & Misconceptions
• Critical Section
• Spaghetti Flow
• User Interface
• STL
• Design Patterns
• Tools
Serial 1 - Image Processing
Image Processing
< Serial >
Language Extensions
• OpenMP
• .Net Parallel Extensions
• .Net 4.0
• Parallel Loops
• Task Management
C# Parallel.For
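// Serial version
for (int col = startLine; col < stopLine; col++)
{
    for (int row = 0; row < src.Width; row++)
    {
        ...
    }
}

// Parallel version: iterations of the outer loop may run on different cores.
// width is assumed to be a local copy of src.Width taken before the loop.
Parallel.For(startLine, stopLine, col =>
{
    for (int row = 0; row < width; row++)
    {
        ...
    }
});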
C# Parallel Class
• Parallel.For
• Parallel.ForEach
• Parallel.Do
• Nested Loops
• Internal Tasks
PLINQ - Parallel LINQ
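For readers outside .NET, the idea behind Parallel.For can be sketched in C++ with plain threads. This is an analogue under stated assumptions, not how the .NET library is implemented: parallel_for and invert are hypothetical helpers, the range is split into one contiguous chunk per hardware thread, and the image is assumed to be 8-bit grayscale stored line by line.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: runs body(i) for every i in [first, last), splitting
// the range into contiguous chunks, one chunk per worker thread.
void parallel_for(int first, int last, const std::function<void(int)>& body) {
    assert(first <= last);
    unsigned hw = std::thread::hardware_concurrency();
    int workers = static_cast<int>(hw == 0 ? 2 : hw);
    workers = std::max(1, std::min(workers, last - first));
    int chunk = (last - first + workers - 1) / workers;  // ceiling division
    std::vector<std::thread> threads;
    for (int begin = first; begin < last; begin += chunk) {
        int end = std::min(begin + chunk, last);
        threads.emplace_back([begin, end, &body] {
            for (int i = begin; i < end; ++i) body(i);
        });
    }
    for (std::thread& t : threads) t.join();
}

// Inverts an image in place. Each call of the body touches only its own
// line, so the threads never write to the same pixel and no lock is needed.
void invert(std::vector<unsigned char>& pixels, int width, int height) {
    parallel_for(0, height, [&pixels, width](int line) {
        for (int x = 0; x < width; ++x) {
            unsigned char& p = pixels[static_cast<std::size_t>(line) * width + x];
            p = static_cast<unsigned char>(255 - p);
        }
    });
}
```

The body captures width by value, mirroring the slide's use of a local width rather than a property read inside the loop.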
Parallel Ext 1 - Image Processing
Image Processing
< Parallel.For >
Parallel Ext 2 - Image Processing Bug Fixes
Image Processing
< Locked Parallel.For >
Locks
• Critical Section
• MUTEX
• Semaphore
Locks: Can You Find The Bug?
If ( Lock ( MUTEX_Read ) )
{
    Buffer_Read[12] = 153;
    Unlock( MUTEX_Read );
}
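The slide leaves the answer to the audience, and the sketch below does not claim to be that answer. It shows one general hazard of hand-paired Lock/Unlock calls in C++: any early exit between the two calls leaves the mutex held. A scoped guard releases it on every path. The names mutex_read, buffer_read and write_sample are stand-ins, and the buffer size is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <stdexcept>

std::mutex mutex_read;              // stands in for MUTEX_Read
std::array<int, 16> buffer_read{};  // stands in for Buffer_Read (size assumed)

// Writes one value under the lock. The guard unlocks in its destructor, so
// the mutex is released even when the bounds check throws.
void write_sample(std::size_t index, int value) {
    std::lock_guard<std::mutex> guard(mutex_read);
    if (index >= buffer_read.size()) throw std::out_of_range("index");
    buffer_read[index] = value;
}
```

After a failed call the mutex is free again, so the next caller is not blocked forever.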
Lock = Stop
• Lock Resource To Single Core
• Prevent Parallel Work
• Bad User Experience
• Unreliable
• Too Easy To Implement!
• Deadlock
Need Lock-Free Solution
Protecting A Resource
• Using a Thread
• Ownership
• Resource Manager
• Device Drivers
• Input
• Output
Resource Duplication
• Resource per Thread
• Distribute Before Execution
• Wait For Thread / Join
• Merge After Execution
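The four steps on this slide can be sketched in C++ as follows. This is a minimal illustration, assuming the shared resource is a pixel-value histogram; the function name and the even split of the input are choices made for the example.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

using Histogram = std::array<long, 256>;

// Counts pixel values with one private histogram per thread (no locks),
// then merges the private copies after every thread has joined.
Histogram histogram(const std::vector<unsigned char>& pixels, std::size_t workers) {
    assert(workers > 0);
    std::vector<Histogram> partial(workers, Histogram{});  // resource per thread
    std::vector<std::thread> threads;
    std::size_t chunk = (pixels.size() + workers - 1) / workers;

    // Distribute before execution: each thread gets its own slice of the
    // input and its own histogram to write into.
    for (std::size_t w = 0; w < workers; ++w) {
        std::size_t begin = std::min(w * chunk, pixels.size());
        std::size_t end = std::min(begin + chunk, pixels.size());
        threads.emplace_back([&pixels, &partial, w, begin, end] {
            for (std::size_t i = begin; i < end; ++i) ++partial[w][pixels[i]];
        });
    }
    for (std::thread& t : threads) t.join();  // wait for thread / join

    Histogram total{};  // merge after execution, on the calling thread only
    for (const Histogram& h : partial)
        for (std::size_t v = 0; v < h.size(); ++v) total[v] += h[v];
    return total;
}
```

The cost of this design is the extra memory for the duplicates and the serial merge at the end.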
Parallel 1 - Using Two Threads
Image Processing
< Two Threads >
Parallel 2 - Two Threads Two Resources
Image Processing
< Resource Separation >
Break
Computer Software
User
Business Logic
Infrastructure
Computer System
Parallel Support?
Simple Solutions
• Locks – Deadlock and Slow Work
• Lock-Free Solutions – Must Design
• Language Extensions – Locals are Global!
• Multiple Threads – When and How to Synchronize
Design, Design, Design
• Definition of Task
• Task Dependency
• Conjunction Points
• Resource Ownership
• Types Of Events
• Priorities
Conjunction Points
Open
Scan
Scan
Scan
Scan
Modify
Modify
Write
Scan
Scan
Simple Example
Serial Work
Parallel Work
Parallel Design Techniques
• Queues
• Thread-Pool
• Core-Pool
• Event With Priority
• Resource Owners and Managers
• Flow Control Techniques
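One of the techniques listed above, an event with priority, can be sketched in C++ as a priority-ordered queue. The assumptions here are illustrative: a larger number means more urgent, events of equal priority keep arrival order, and the queue is owned by a single dispatcher thread, so it carries no locking of its own.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct Event {
    int priority;       // larger value = more urgent (an assumption)
    std::uint64_t seq;  // arrival order, used to break ties
    std::string name;
};

// Orders the heap so the most urgent, earliest-posted event is on top.
struct LessUrgent {
    bool operator()(const Event& a, const Event& b) const {
        if (a.priority != b.priority) return a.priority < b.priority;
        return a.seq > b.seq;  // earlier arrival wins among equals
    }
};

class EventQueue {
public:
    void post(int priority, std::string name) {
        events_.push(Event{priority, next_seq_++, std::move(name)});
    }
    bool empty() const { return events_.empty(); }
    // Removes and returns the most urgent event; the queue must not be empty.
    Event take() {
        assert(!events_.empty());
        Event e = events_.top();
        events_.pop();
        return e;
    }

private:
    std::priority_queue<Event, std::vector<Event>, LessUrgent> events_;
    std::uint64_t next_seq_ = 0;
};
```

Posting "redraw" at priority 1, "abort" at 5 and "log" at 1 yields abort, redraw, log.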
A Good Parallel Application
Example Of A Parallel Application
API – File System
• File System is an Object Store!
• Locate Resource By Name
• Share Resource With Rules
• Real Lock on Resource
• Map To Memory Only Under Use
API – Queue
• Pass Data Without Using Lock
• Full Asynchronous Operation
• Event With Data
• Event With Priority
• Event With Destination
• Structured Event vs. Stream
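Passing data without a lock can be illustrated with a single-producer, single-consumer ring buffer in C++. This is one possible technique, not necessarily the queue the talk has in mind. It is only correct when exactly one thread pushes and exactly one other thread pops; SpscQueue and transfer are names invented for the sketch.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Fixed-size ring buffer. One slot is always left empty so that a full
// queue can be told apart from an empty one.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2, "one slot is kept empty");

public:
    bool push(const T& item) {  // producer thread only
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        std::size_t next = (tail + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire)) return false;  // full
        slots_[tail] = item;
        tail_.store(next, std::memory_order_release);  // publish the item
        return true;
    }
    bool pop(T& out) {  // consumer thread only
        std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false;  // empty
        out = slots_[head];
        head_.store((head + 1) % Capacity, std::memory_order_release);  // free the slot
        return true;
    }

private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};

// Sends 1..count from a producer thread to the calling thread and returns the sum.
long transfer(int count) {
    SpscQueue<int, 8> queue;
    std::thread producer([&queue, count] {
        for (int i = 1; i <= count; ++i)
            while (!queue.push(i)) std::this_thread::yield();
    });
    long sum = 0;
    int value = 0;
    for (int received = 0; received < count;) {
        if (queue.pop(value)) { sum += value; ++received; }
        else std::this_thread::yield();
    }
    producer.join();
    return sum;
}
```

Neither side ever blocks the other: a full or empty queue is reported to the caller, who decides whether to retry, drop, or do other work.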
Parallel System
Web Server
SOA: Service Oriented Architecture
SOA System
HPC: High Performance Computing
Break
Operating System Support
• Parallel Kernel
• Embedded Concepts
• Old System: 30+ Years
• Parallel User Interface
• Event Driven System
OS Queues
• Thread Queue
• Network Socket
• Pipe
• Mailslot
• Window Message Queue
• System APC / DPC Queue
Parallel 3 - Two Threads Async
Image Processing
< Completion Port >
APC: Asynchronous Procedure Call
• Register Callback
• Asynchronous Operation
• Callback Under Caller’s Context
• System Engine For Posting A Callback
• System Callback Queue
Thread Pool
• Thread Creation Cost
• Thread Destruction Cost
• Reuse Of Thread As Resource
• Pool Inflation / Deflation
• Deadlock
• Core Pool
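The reuse idea behind a thread pool can be sketched in C++ as below. This is a deliberately small, fixed-size pool: it does not inflate or deflate, it guards its task queue with a mutex, and the names ThreadPool and sum_of_squares are invented for the example.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    // Threads are created once, up front, and reused for every task.
    explicit ThreadPool(std::size_t workers) {
        assert(workers > 0);
        for (std::size_t i = 0; i < workers; ++i)
            threads_.emplace_back([this] { run(); });
    }
    // Lets the workers drain the queue, then joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& t : threads_) t.join();
    }
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        wake_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can dequeue
        }
    }
    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> threads_;
};

// Usage: ten small tasks share four threads.
int sum_of_squares(int n) {
    std::atomic<int> total{0};
    {
        ThreadPool pool(4);
        for (int i = 1; i <= n; ++i)
            pool.submit([&total, i] { total += i * i; });
    }  // the pool's destructor waits for all submitted tasks
    return total;
}
```

A task that blocks while waiting for another task queued on the same pool can occupy every worker, which is one way the deadlock mentioned on the slide can arise.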
Parallel 4 - Using A Thread Pool
Image Processing
< Thread Pool >
Thread Pool Tasks
• Thread Pool Queue
• Break Operations Into Mini-Tasks
• Break Resources
• Task Chains
• Tasks That Spawn Tasks
• Thread Priority
Parallel 5 - Multiple Tasks
Image Processing
< Mini-Tasks >
TLS: Thread Local Storage
• stdio, cout, etc.
• MessageBox << “Text” << MessageBox
• Lock-Free Alternative For Singleton Pattern
• No Sharing – No Collision
• Initialize Instance For Every Thread
• Thread Local Storage
Thread Stack
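The "instance for every thread" idea maps onto the C++ thread_local keyword. The sketch below is an illustration only: Scratch, scratch and run_workers are hypothetical names, and the per-thread object is just a counter.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical per-thread object; in a singleton design this would be the
// single shared instance that needs a lock.
struct Scratch {
    int uses = 0;
};

// Returns the calling thread's own instance, created on first use.
Scratch& scratch() {
    thread_local Scratch instance;
    return instance;
}

// Each worker bumps its own instance `calls` times and reports its final count.
std::vector<int> run_workers(int workers, int calls) {
    assert(workers >= 0);
    std::vector<int> seen(workers, 0);
    std::vector<std::thread> threads;
    for (int w = 0; w < workers; ++w)
        threads.emplace_back([&seen, w, calls] {
            for (int i = 0; i < calls; ++i) ++scratch().uses;  // no lock needed
            seen[w] = scratch().uses;
        });
    for (std::thread& t : threads) t.join();
    return seen;
}
```

Every worker sees exactly its own count, and the calling thread's instance stays untouched: no sharing, no collision.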
Parallel 6 - TLS
Image Processing
< TLS >
The CPU Stack
• Thread Local Storage
• Hardware Accelerated Flow Control
• Execution Context
• System Support – Interrupt Vectors
• System Support – Programming Languages
Fork
• Copy Process
• Duplicate Handles
• Parent / Child
• Wait For Child
• System Support Of Execution Flow
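On POSIX systems the points above correspond to fork and waitpid. The sketch below assumes a Unix-like platform (it will not build on Windows); run_in_child, child_work and counter are names made up for the example.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `work` in a forked child process and returns the child's exit code,
// or -1 if the fork or the wait fails.
int run_in_child(int (*work)()) {
    assert(work != nullptr);
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) _exit(work());  // child: operates on a copy of the parent's memory
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;  // parent: wait for child
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

int counter = 1;

// Modifies the global; when run in the child, only the child's copy changes.
int child_work() {
    counter += 41;
    return counter;
}
```

Calling run_in_child(child_work) returns 42 from the child while counter in the parent is still 1, which shows that the process was copied rather than shared.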
OS Task
• Process – Virtual Memory Protection
• Thread – Shared Virtual Memory
• Signals
• Suspend / Resume
Fibers
• Converted Thread
• Under The Thread Context
• Multiple Fibers
• Manual Scheduling
• Fiber Local Storage
Break
Thread Modeling
• Worker Thread
• Waiting Thread
• Worker per Core
• Single Waiting Thread
• Mixed Mode Thread
Operation Modeling
Task: locate items in storage
Task: carry items to build site
Task: use items to build tent
Diagram labels: Locate, Carry, Use; Wires, Fabric, Pole; Time; Output
Force Duplication
Diagram labels: Locate, Carry, Use; Wires, Fabric, Pole; Time; Output
Pipeline
Diagram labels: Locate, Carry, Use; Wires, Fabric, Pole; Time; Output
Force Duplication
• Sharing Resources
• Flow Barriers
• Simple to implement
• Simple Affinity
• Simple Priority
• No Optimization
Pipeline
• Resource Ownership
• Communication Barriers
• Requires Design
• Affinity Planning
• Priority Planning
• Optimization
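The pipeline model can be sketched in C++ using the tent example from the earlier slides: one thread per task (locate, carry, use), each owning its stage and handing items to the next through a channel. Channel and build_tent are hypothetical names, the string tags merely mark which stages an item passed through, and the channel here uses a mutex internally for brevity.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking channel between two stages. close() tells the receiver
// that no more items will arrive.
template <typename T>
class Channel {
public:
    void send(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(item));
        }
        ready_.notify_one();
    }
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }
    // Returns false once the channel is closed and fully drained.
    bool receive(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Three stages run concurrently; each item flows locate -> carry -> use.
std::vector<std::string> build_tent(const std::vector<std::string>& items) {
    Channel<std::string> located, carried;
    std::vector<std::string> used;  // owned by the last stage until it joins
    std::thread locate([&] {
        for (const std::string& item : items) located.send(item + ":located");
        located.close();
    });
    std::thread carry([&] {
        std::string item;
        while (located.receive(item)) carried.send(item + ":carried");
        carried.close();
    });
    std::thread use([&] {
        std::string item;
        while (carried.receive(item)) used.push_back(item + ":used");
    });
    locate.join();
    carry.join();
    use.join();
    return used;
}
```

While the last stage is using the wires, the middle stage can already be carrying the fabric, which is where the pipeline gains over running the three tasks in sequence.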
Aborting Operations
• Complete Current Work With Resource
• Cleanup
• Cancel IO
• Termination and Exceptions
• Ready For Next Operation
• Notification Of Completion
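A cooperative abort along these lines can be sketched in C++ with an atomic flag and a future. The assumptions: the worker checks the flag only between items, so the current item is always completed, and the future serves as the notification of completion. Result, process and start are illustrative names; cancelling in-flight I/O is not covered.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <future>
#include <vector>

struct Result {
    int items_done;
    bool aborted;
};

// Processes items one at a time. The abort flag is checked between items,
// never in the middle of one, so `out` is always left in a consistent state.
Result process(const std::vector<int>& items,
               const std::atomic<bool>& abort_requested,
               std::vector<int>& out) {
    int done = 0;
    for (int item : items) {
        if (abort_requested.load(std::memory_order_acquire)) return Result{done, true};
        out.push_back(item * 2);  // the "work" on the resource
        ++done;
    }
    return Result{done, false};
}

// Starts the work on another thread. The returned future becomes ready when
// the worker has finished or has stopped after an abort request.
std::future<Result> start(const std::vector<int>& items,
                          const std::atomic<bool>& abort_requested,
                          std::vector<int>& out) {
    return std::async(std::launch::async, process, std::cref(items),
                      std::cref(abort_requested), std::ref(out));
}
```

The caller sets the flag to request an abort and then waits on the future; once it is ready, the worker has left the resource and the system is ready for the next operation.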
Parallel 7 - Worker & Waiter
Worker & Waiting
Threads With Abort
Parallel User Interface
• System Always Responsive
• User: Human
• User: Other System
• Abort Irrelevant Operations
Parallel 8 - Parallel User Interface
Serial Application
vs.
Parallel Application
Floating Stack
• Windows NT WDM Model (Stream Drivers)
• User Events, Hardware Events
• Every Event Has a Stack
• Stack is not Hardware Accelerated
• Flow Control is not bound by hardware
Floating Stack - Simple
Floating Stack - Split
Floating Stack - Complex
Object Oriented Design
• What The System is
• Object Relations
• Object Reuse
• Object Based Design and Block Diagram
• Spaghetti Flow
Procedural Design
• Function as a Procedure
• Procedure Relations
• Procedure Internal State
• Poor Block Diagram
• Spaghetti Code
Layers Based Design
Process Definition
Objects Diagram
Phase – State Programming
• Clear Operation Phase
• Clear Object State
• Understand System Behavior
• Reproduce “Random Bug”
System Design Stages
• Identify Operations
• Allocate Priorities
• Identify Resources and Define Ownership
• Define Tasks and Task Relations
• System Block Diagram
• Methods and Functions
Problems With OOD
• Functions of 3 – 4 lines
• Execution Phase extracted from Stack
• Flow Control Management by Stack
• No Language Support For Task Relations
• No Language Support For Queues and APCs
• No Support for Flow Control in System Modeling
Break
Future Of Parallel Computing
• New and Exciting Tools
• Biggest Evolution Since Object Oriented
• New and Evolving Design Patterns
• New Parallel Infrastructures!
Hardware Support
• Multi-Core CPU
• Multi-Memory CPU
• NUMA
• Core Dedication
Multiple CPU Cores
128 Core Silicon
What About 256 Cores?
Quad-Core is soon to be History
ARM: Quad-Core Cellular Phones
Parallel Memory
Visual Studio 2010
Windows Server 2008 HPC
Do Customers even care?
2005
2007
2008
2010
Everything is stopped. Waiting for the photographer
Everyone is working independently
Developers are writing functions
Developers are managing tasks
Doing things the way we always have
Things are going to be different
So What’s Next?
• http://AsyncOp.com
• http://software.intel.com/en-us/blogs/author/asafshelly
• Pacific-Software Training
• http://Pacificsoft.co.il
• Contact Dafna, Eva, or me directly!
Thank You
All About Multi Core
And Parallel Computing