NESCDPAMay31-07 - Community Grids Lab (Transcript)

Linking Programming Models between Grids, Web 2.0 and Multicore
Distributed Programming Abstractions Workshop, NeSC, Edinburgh, UK
May 31 2007
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University Bloomington IN 47401
[email protected]
http://www.infomall.org

Points in Talk I
All parallel programming projects more or less fail
All distributed programming projects report success
• There are several hundred in the Grid workflow area alone
Few constraints on distributed programming
Composition (in distributed computing) v decomposition (in parallel computing)
There is not much difference between distributed programming and a key paradigm of parallel computing (functional parallelism)
Pervasive use of 64-core chips in the future will often require one to build a Grid on a chip, i.e. to execute a traditional distributed application on a chip
XML is a pretty dubious syntax for expressing programs
Web 2.0 is pretty scruffy but there are some large companies and many users behind it
Web 2.0 and Grids will converge and features of both will survive or disappear in the merged environment
Web 2.0 has a more plausible approach to distributed programming than Web Services/Grids
Dominant Distributed Programming models will support Multicore, Web 2.0 and Grids

Some More Points
Services could be the universal abstraction in parallel and distributed computing
• Whereas objects could not be universal, so perhaps we should move away from their use
Gateways/Portals (Portlets, Widgets, Gadgets) are the natural user (application usage) interface to a collection of services
Important Data (SQL, WFS, RSS Feeds) abstractions
Divide Parallel Programming Run-time (matching application structure) into 3 or 4 broad classes
Inter-entity communication time is characteristic of the different programming models
• 1-5 µs for MPI/thread switching, to 1-1000 milliseconds for services on the Grid, and 25 µs for services inside a chip
Multicore Commodity Programming Model
• Marine corps write libraries in "HLA++", MPI or dynamic threads (internally one microsecond latency) expressed as services
• Services composed/mashed up by "millions"
• Many composition (coordination) or mashup approaches
  – Functional (cf. Google MapReduce for data transformations)
  – Dataflow
  – Workflow
  – Visual
  – Script
The difficulties of making effective use of multicore chips will be so great that they will be the main driver of new programming environments
Microsoft CCR DSS is a good example of unification of parallel and distributed computing
Some Details
See http://www.slideshare.net/Foxsden or, more conventionally:
Web 2.0 and Grid Tutorial
• http://grids.ucs.indiana.edu/ptliupages/presentations/CTSpartIMay21-07.ppt
• http://grids.ucs.indiana.edu/ptliupages/presentations/Web20Tutorial_CTS.ppt
Multicore and Parallel Computing Tutorial
• http://grids.ucs.indiana.edu/ptliupages/presentations/PC2007/index.html
"Web 2.0" citation site
• http://www.connotea.org/user/crmc
Web 2.0 and Web Services I
Web Services have clearly defined protocols (SOAP) and a well defined mechanism (WSDL) to define service interfaces
• There is good .NET and Java support
• The so-called WS-* (WS-Nightmare) specifications provide a rich, sophisticated but complicated standard set of capabilities for security, fault tolerance, metadata, discovery, notification etc.
"Narrow Grids" build on Web Services and provide a robust managed environment with growing adoption in Enterprise systems and distributed science (e-Science)
We can use the term Grids strictly for Narrow Grids that are collections of Web Services (or even more strictly OGSA Grids), or call any collection of services a "Broad Grid", which is actually quite often done
Web 2.0 supports a similar architecture to Web Services but has developed in a more chaotic but remarkably successful fashion, with a service architecture using a variety of protocols including those of Web and Grid services
• Over 400 interfaces defined at http://www.programmableweb.com/apis
One can easily combine SOAP (Web Service) based services/systems with HTTP messages, but the "lowest common denominator" suggests the additional structure/complexity of SOAP will not easily survive
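To make the last point concrete, here is a minimal C# sketch of the Web 2.0 style of invocation: the operation is just an HTTP GET against a hypothetical endpoint (the URL and parameters are illustrative, not a real API), with the SOAP alternative indicated only in a comment.

// REST style: operation and parameters live in the URL; the response is plain
// XML or JSON handled directly by the client.
using System;
using System.Net;

class RestVersusSoap
{
    static void Main()
    {
        using (var client = new WebClient())
        {
            string reply = client.DownloadString(
                "http://api.example.org/geocode?q=Edinburgh&format=json");   // hypothetical endpoint
            Console.WriteLine(reply);
        }

        // SOAP style (not shown running): the same request would be wrapped in an
        // <Envelope><Body><Geocode>...</Geocode></Body></Envelope> and POSTed to a
        // WSDL-described endpoint, usually through a generated client stub.
    }
}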
Web 2.0 and Web Services II
Web 2.0 also has many well-known capabilities, with Google Maps and Amazon Compute/Storage services of clear general relevance
There are also Web 2.0 services supporting novel collaboration modes and user interaction with the web, as seen in social networking sites and portals such as MySpace, YouTube, Connotea, Slideshare ….
I once thought Web Services were inevitable but this is no longer clear to me
Web services are complicated, slow and non-functional
• WS-Security is unnecessarily slow and pedantic (canonicalization of XML)
• WS-RM (Reliable Messaging) seems to have poor adoption and doesn't work well in collaboration
• WSDM (distributed management) specifies a lot
There are de facto standards like Google Maps and powerful suppliers like Google which "define the rules"
Attack of the Killer Multicores
Today commodity Intel systems are sold with 8 cores spread over two processors
Specialized chips such as GPUs and the IBM Cell processor have substantially more cores
Moore's Law implies (and will be satisfied by) an exponentially increasing number of cores, doubling every 1.5-3 years
• Modest increase in clock speed
• Intel has already prototyped an 80-core server chip ready in 2011?
Huge activity in parallel computing programming (recycled from the past?)
• Some programming models and application styles similar to Grids
We will have a Grid on a chip …………….
Grids meet Multicore Systems
The expected rapid growth in the number of cores per chip has important implications for Grids
With 16-128 cores on a single commodity system 5 years from now, one will both be able to build a Grid-like application on a chip and indeed must build such an application to get the Moore's Law performance increase
• Otherwise you will "waste" cores …..
One will not want to reprogram as you move your application from a 64 node cluster or transcontinental implementation to a single chip Grid
However multicore chips have a very different architecture from Grids
• Shared not Distributed Memory
• Latencies measured in microseconds not milliseconds
Thus Grid and multicore technologies will need to "converge" and the converged technology model will have different requirements from current Grid assumptions
Grid versus Multicore Applications
It seems likely that future multicore applications will involve a loosely coupled mix of multiple modules that fall into three classes
• Data access/query/store
• Analysis and/or simulation
• User visualization and interaction
This is precisely the mix that Grids support, but Grids of course involve distributed modules
Grids and Web 2.0 use service oriented architectures to describe systems at the module level – is this an appropriate model for multicore programming?
Where do multicore systems get their data from?
RMS: Recognition Mining Synthesis
• Recognition ("What is …?"): build a Model. Today: model-less; Tomorrow: model-based multimodal recognition
• Mining ("Is it …?"): find a model instance. Today: real-time streaming and transactions on static, structured datasets; Tomorrow: real-time analytics on dynamic, unstructured, multimodal datasets
• Synthesis ("What if …?"): create a model instance. Today: very limited realism; Tomorrow: photo-realism and physics-based animation
Intel has probably the most sophisticated analysis of future "killer" multicore applications – they are "just" standard Grid and parallel computing
Pradeep K. Dubey, [email protected]
• Recognition: What is a tumor?
• Mining: Is there a tumor here?
• Synthesis: What if the tumor progresses?
It is all about dealing efficiently with complex multimodal datasets
Images courtesy: http://splweb.bwh.harvard.edu:8000/pages/images_movies.html
Pradeep K. Dubey, [email protected]
Intel’s Application Stack
Role of Data in Grid/Multicore I
One typically is told to place compute (analysis) at the data, but most of the computing power is in multicore clients on the edge
These multicore clients can get data from the Internet, i.e. distributed sources
• This could be the personal interests of the client, used by the client to help the user interact with the world
• It could be cached or copied
• It could be a standalone calculation or part of a distributed coordinated computation (SETI@Home)
Or they could get data from a set of local sensors (videocams and environmental sensors) naturally stored on the client or locally to the client
Role of Data in Grid/Multicore II
Note that as you increase the sophistication of data analysis, you increase the ratio of compute to I/O
• A typical modern datamining approach like Support Vector Machines is sophisticated (dense) matrix algebra and not just text matching
• http://grids.ucs.indiana.edu/ptliupages/presentations/PC2007/PC07BYOPA.ppt
The time complexity of sophisticated data analysis will make it more attractive to fetch data from the Internet and cache/store it on the client
• It will also help with memory bandwidth problems in multicore chips
In this vision, the Grid "just" acts as a source of data and the Grid application runs locally
Multicore Programming Paradigms
• At a very high level, there are three or four broad classes of parallelism
• Coarse grain functional parallelism typified by workflow and often used to build composite "metaproblems" whose parts are also parallel
– "Compute-File", Database/Sensor, Community, Service, Pleasingly Parallel (Master-worker) are sub-classes
• Large scale loosely synchronous data parallelism, where dynamic irregular work has clear synchronization points, as in most large scale scientific and engineering problems
• Fine grain (asynchronous) thread parallelism as used in search algorithms, which are often data parallel (over choices) but don't have universal synchronization points
• Discrete event simulations are either a fourth class or a variant of thread parallelism
Data Parallel Time Dependence
• A simple form of data parallel application is synchronous, with all elements of the application space being evolved with essentially the same instructions
• Such applications are suitable for SIMD computers and run well on vector supercomputers (and GPUs, but these are more general than just synchronous)
• However synchronous applications also run fine on MIMD machines
• SIMD CM-2 evolved to MIMD CM-5 with the same data parallel language CMFortran
• The iterative solutions to Laplace's equation are synchronous, as are many full matrix algorithms
Figure: Application Time (t0 … t4) versus Application Space for a synchronous application; identical evolution algorithms at every point. Synchronization on MIMD machines is accomplished by messaging; it is automatic on SIMD machines!
Local Messaging for Synchronization
• MPI_SENDRECV is the typical primitive
• Processors do a send followed by a receive, or a receive followed by a send
• In two stages (needed to avoid race conditions), one has a complete left shift
• Often followed by an equivalent right shift, to get a complete exchange
• This logic guarantees correctly updated data is sent to processors that have their data at the same simulation time
Figure: Application and Processor Time versus Application Space for 8 processors, alternating Compute Phases and Communication Phases.
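The two-stage shift logic above can be sketched in a few lines. This is not MPI itself: as an illustration (all names hypothetical), each "processor" is simulated by a .NET task and each channel by a blocking queue, so the ordering rule (even ranks send then receive, odd ranks receive then send) can be followed without an MPI installation; with real MPI the same pattern is a send/receive pair or a single MPI_SENDRECV.

// Sketch of the two-stage left shift on a ring of 8 simulated "processors".
// Even ranks send then receive; odd ranks receive then send, so the shift
// completes without deadlock even with strictly blocking channels.
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class TwoStageShift
{
    const int P = 8;                                  // number of simulated processors

    static void Main()
    {
        // channel[i] carries messages destined for rank i
        var channel = new BlockingCollection<int>[P];
        for (int i = 0; i < P; i++) channel[i] = new BlockingCollection<int>();

        Parallel.For(0, P, rank =>
        {
            int left = (rank - 1 + P) % P;            // destination of a left shift
            int myData = rank * 100;                  // stand-in for boundary data
            int received;

            if (rank % 2 == 0)
            {   // stage 1: even ranks send, then receive
                channel[left].Add(myData);
                received = channel[rank].Take();
            }
            else
            {   // stage 2: odd ranks receive, then send
                received = channel[rank].Take();
                channel[left].Add(myData);
            }
            Console.WriteLine("rank {0} received data from rank {1}", rank, received / 100);
        });
    }
}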
Loosely Synchronous Applications
• This is the most common case in large scale science and engineering: one has the traditional data parallelism, but now each data point has in general a different update
– Comes from heterogeneity in problems that would be synchronous if homogeneous
• Time steps are typically uniform, but sometimes one needs to support variable time steps across the application space – however, ensure that small time steps are δt = (t1 − t0)/integer so subspaces with finer time steps do synchronize with the full domain (e.g. if the global step t1 − t0 is 0.1, a subdomain may take 4 substeps of 0.025)
• The time synchronization via messaging is still valid
• However one no longer load balances (ensures each processor does equal work in each time step) by putting an equal number of points in each processor
• Load balancing, although NP-complete, is in practice surprisingly easy
Figure: Application Time (t0 … t4) versus Application Space; distinct evolution algorithms for each data point in each processor.
MPI Futures?
• MPI is likely to become more important as multicore systems become more common
• Should use MPI when MPI is needed, and use other messaging for other cases (such as linking services) where different features/performance are appropriate
• MPI has too many primitives, which will handicap broad implementation/adoption
• Perhaps have only one collective primitive, as in CCR, which allows general collective operations to be built by the user
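The last bullet can be made concrete with a small sketch (all names hypothetical, plain .NET rather than CCR): the only "collective" primitive is a generic receive-N-items operation on a port, and a user-level sum-reduction, or any other collective, is assembled from it. In CCR the MultipleItemReceive arbiter described later plays this role.

// The single primitive: block until n items have arrived on a port.
// Everything else (reduce, gather, barrier ...) is built on top by the user.
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

class UserBuiltCollective
{
    static T[] ReceiveN<T>(BlockingCollection<T> port, int n)
    {
        var items = new T[n];
        for (int i = 0; i < n; i++) items[i] = port.Take();
        return items;
    }

    static void Main()
    {
        const int workers = 4;
        var port = new BlockingCollection<double>();

        // Each worker posts its partial result to the shared port.
        Parallel.For(0, workers, rank => port.Add(rank * 1.5));

        // The user-level "reduce" collective is just ReceiveN plus a fold.
        double total = ReceiveN(port, workers).Sum();
        Console.WriteLine("reduced value = {0}", total);
    }
}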
Fine Grain Dynamic Applications
• Here there is no natural universal "time" as there is in science algorithms, where an iteration number or Mother Nature's time gives global synchronization
• Loose (zero) coupling or special features of the application are needed for successful parallelization
• In computer chess, the minimax scores at parent nodes provide multiple dynamic synchronization points
Figure: Application Time versus Application Space, with increasing search depth.
Computer Chess
• Thread level parallelism, unlike the position evaluation parallelism used in other systems
• Competed with poor reliability and results in the 1987 and 1988 ACM Computer Chess Championships
Discrete Event Simulations
• These are familiar in military and circuit (system) simulations when one uses macroscopic approximations
– Also probably the paradigm of most multiplayer Internet games/worlds
• Note Nature is perhaps synchronous when viewed quantum mechanically in terms of uniform fundamental elements (quarks and gluons etc.)
• It is loosely synchronous when considered in terms of particles and mesh points
• It is asynchronous when viewed in terms of tanks, people, arrows etc. (image: Battle of Hastings)
• Circuit simulations can be done loosely synchronously, but this is inefficient as many elements are inactive
Programming Models
• The three major models are supported by HPCS languages, which are very interesting but too monolithic
• So the fine grain thread parallelism and large scale loosely synchronous data parallelism styles are distinctive to parallel computing, while
• Coarse grain functional parallelism of multicore overlaps with workflows from Grids and Mashups from Web 2.0
• It seems plausible that a more uniform approach will evolve for the coarse grain case, although this is the least constrained of the programming styles as latency issues are typically not critical
– Multicore would have the strongest performance constraints
– Web 2.0 and Multicore the most important usability constraints
• A possible model for broad use of multicores is that the difficult parallel algorithms are coded as libraries (fine grain thread parallelism and large scale loosely synchronous data parallelism styles) while the general user composes with visual interfaces, scripting and systems like Google MapReduce
Google MapReduce
Simplified Data Processing on Large Clusters
• http://labs.google.com/papers/mapreduce.html
• This is a dataflow model between services where services can do useful document oriented data parallel applications including reductions
• The decomposition of services onto cluster engines is automated
• The large I/O requirements of datasets change the efficiency analysis in favor of dataflow
• Services (count words in the example) can obviously be extended to general parallel applications
• There are many alternative languages for expressing either dataflow and/or parallel operations, and indeed one should support multiple languages in the spirit of services
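A minimal functional sketch of the map, group and reduce dataflow the paper describes, applied to its word-count example. This is ordinary single-machine LINQ, not Google's library; MapReduce's contribution is automating the decomposition of exactly this pattern onto a large cluster.

// Word count as map (emit words) -> group by key -> reduce (count per key).
using System;
using System.Linq;

class WordCount
{
    static void Main()
    {
        string[] documents =
        {
            "grids web services and multicore",
            "web 2.0 services and grids"
        };

        var counts = documents
            .SelectMany(doc => doc.Split(' '))                        // map: emit each word
            .GroupBy(word => word)                                    // shuffle: group by key
            .Select(g => new { Word = g.Key, Count = g.Count() });    // reduce: count per key

        foreach (var c in counts)
            Console.WriteLine("{0}: {1}", c.Word, c.Count);
    }
}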
Programming Models
• The services and objects in distributed computing are usually "natural" (they come from the application), whereas the parts connected by MPI (or created by a parallelizing compiler) come from "artificial" decompositions and are not naturally considered services
• Services in multicore (parallel computing) are the original modules before decomposition, and it is these modules that coarse grain functional parallelism addresses
• Most of the "difficult" issues in parallel computing concern the treatment of decomposition
Parallel Software Paradigms: Top Level
• In the conventional two-level Grid/Web Service
programming model, one programs each
individual service and then separately programs
their interaction
– This is Grid-aware Services programming model
– SAGA supports Grid-aware programs?
• This is generalized to multicore with “Marine
Corps” programming services for “difficult”
cases
– Loosely Synchronous
– Fine Grain threading
– Discrete Event Simulation
The Marine Corps Lack of
Programming Paradigm Library Model
• One could assume that parallel computing is “just too hard
for real people” and assume that we use a Marine Corps of
programmers to build as libraries excellent parallel
implementations of “all” core capabilities
– e.g. the primitives identified in the Intel application
analysis
– e.g. the primitives supported in Google MapReduce, HPF,
PeakStream, Microsoft Data Parallel .NET etc.
• These primitives are orchestrated (linked together) by
overall frameworks such as workflow or mashups
• The Marine Corps probably is content with efficient rather
than easy to use programming models
Component Parallel and Program Parallel
• Component parallel paradigm is where one explicitly programs
the different parts of a parallel application with the linkage either
specified externally as in workflow or in components themselves
as in most other component parallel approaches
– In Grids, components are natural
– In Parallel computing, components are produced by decomposition
• In the program parallel paradigm, one writes a single program to
describe the whole application and some combination of compiler
and runtime breaks up the program into the multiple parts that
execute in parallel
• Note that a program parallel approach will often call a built in
runtime library written in component parallel fashion
– A parallelizing compiler could call an MPI library routine
• One could perhaps better call "Program Parallel" "Implicitly Parallel" and "Component Parallel" "Explicitly Parallel"
Component Parallel and Program Parallel
• Program Parallel approaches include
– Data structure parallel as in Google MapReduce, HPF (High
Performance Fortran), HPCS (High-Productivity Computing
Systems) or “SIMD” co-processor languages (PeakStream,
ClearSpeed and Microsoft Data Parallel .NET)
– Parallelizing compilers including OpenMP annotation
– Note OpenMP and HPF have failed in some sense for large scale
parallel computing (writing algorithm in standard sequential
languages throws away information needed for parallelization)
• Component Parallel approaches include
– MPI (and related systems like PVM) parallel message passing
– PGAS (Partitioned Global Address Space CAF, UPC, Titanium,
HPJava )
– C++ futures and active objects
– CSP … Microsoft CCR and DSS
– Workflow and Mashups
– Discrete Event Simulation
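As a small illustration of the component parallel style in the futures/active objects spirit listed above, the sketch below explicitly creates two components and explicitly states how their results combine; a program parallel system would instead derive such a decomposition from a single sequential program. C# tasks are used purely as an illustration and postdate this talk; the function and numbers are invented.

// Component parallel: the programmer names the parts (two futures) and their linkage.
using System;
using System.Threading.Tasks;

class ComponentParallel
{
    // Stand-in for a component's work: crude midpoint integration of x^2 on [lo, hi].
    static double Integrate(double lo, double hi)
    {
        const int steps = 1000000;
        double h = (hi - lo) / steps, sum = 0;
        for (int i = 0; i < steps; i++)
        {
            double x = lo + (i + 0.5) * h;
            sum += x * x * h;
        }
        return sum;
    }

    static void Main()
    {
        // Explicit decomposition into two components, each run as a future ...
        Task<double> left  = Task.Run(() => Integrate(0.0, 0.5));
        Task<double> right = Task.Run(() => Integrate(0.5, 1.0));

        // ... and explicit linkage: the program states how the results combine.
        Console.WriteLine("integral of x^2 on [0,1] = {0}", left.Result + right.Result);
    }
}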
Why people like MPI!
• Jason J. Beech-Brandt and Andrew A. Johnson, at AHPCRC Minneapolis
• BenchC is an unstructured finite element CFD solver
• Looked at OpenMP on a shared memory Altix, with some effort to optimize
• Optimized UPC on several machines
• MPI always good, but the other approaches erratic
• Other studies reach similar conclusions?
Figure: BenchC performance on several clusters after optimization of UPC.
Web 2.0 Systems are Portals, Services, Resources
• Captures the incredible development of interactive Web sites enabling people to create and collaborate
• The world does itself in large numbers!
Mashups v Workflow?
• Mashup Tools are reviewed at http://blogs.zdnet.com/Hinchcliffe/?p=63
• Workflow Tools are reviewed by Gannon and Fox: http://grids.ucs.indiana.edu/ptliupages/publications/Workflow-overview.pdf
• Both include scripting in PHP, Python, sh etc., as both implement distributed programming at the level of services
• Mashups use all types of service interfaces and do not have the potential robustness (security) of the Grid service approach; typically "pure" HTTP (REST)
Web 2.0 APIs
• http://www.programmableweb.com/apis has (May 14 2007) 431 Web 2.0 APIs, with Google Maps the most often used in Mashups
• This site acts as a "UDDI" for Web 2.0

The List of Web 2.0 APIs
• Each site has its API and its features
• Divided into broad categories
• Only a few are used a lot (42 APIs used in more than 10 mashups)
• RSS feed of new APIs
• Amazon S3 growing in popularity
APIs/Mashups per Protocol Distribution
Figure: Number of APIs and number of Mashups per protocol (REST, SOAP, XML-RPC, REST + XML-RPC, REST + XML-RPC + SOAP, REST + SOAP, JS, Other). The most used APIs include Google Maps, del.icio.us, 411sync, Yahoo! Search, Yahoo! Geocoding, Virtual Earth, Technorati, Netvibes, Yahoo! Images, Trynt, Yahoo! Local, Amazon ECS, Google Search, Flickr, eBay, YouTube, Amazon S3 and live.com.
4 more Mashups each day
• For a total of 1906 on April 17 2007 (4.0 a day over the last month)
• Growing number of commercial Mashup Tools
• Note ClearForest runs Semantic Web Services Mashup competitions (not workflow competitions)
• Some Mashup types: aggregators, search aggregators, visualizers, mobile, maps, games
Implication for Grid Technology of Multicore and Web 2.0 I
Web 2.0 and Grids are addressing a similar application class, although Web 2.0 has focused on user interactions
• So the technology has similar requirements
Multicore differs significantly from Grids in component location, and this seems particularly significant for data
• It is therefore not clear how similar the applications will be
• The Intel RMS multicore application class is pretty similar to Grids
Multicore has more stringent software requirements than Grids, as the latter has intrinsic network overhead
Implication for Grid Technology of Multicore and Web 2.0 II
Multicore chips require low overhead protocols to exploit low latency; that suggests simplicity
• We need to simplify MPI AND Grids!
Web 2.0 chooses simplicity (REST rather than SOAP) to lower the barrier to everyone participating
Web 2.0 and Multicore tend to use traditional (possibly visual) scripting languages for the equivalent of workflow, whereas Grids use a visual interface with the backend recorded in BPEL
• Google MapReduce illustrates a popular Web 2.0 and Multicore approach to dataflow
Implication for Grid Technology of Multicore and Web 2.0 III
Web 2.0 and Grids both use SOA (Service Oriented Architectures)
• It seems likely that Multicore will also adopt SOA, although a more conventional object oriented approach is also possible
• Services should help multicore applications integrate modules from different sources
• Multicore will use fine grain objects but coarse grain services
"System of Systems": Grids, Web 2.0 and Multicore are likely to build systems hierarchically out of smaller systems
• We need to support Grids of Grids, Webs of Grids, Grids of Multicores etc., i.e. systems of systems of all sorts
The Ten areas covered by the 60 core WS-* Specifications
WS-* Specification Area – Typical Grid/Web Service Examples
1: Core Service Model – XML, WSDL, SOAP
2: Service Internet – WS-Addressing, WS-MessageDelivery; Reliable Messaging WSRM; Efficient Messaging MOTM
3: Notification – WS-Notification, WS-Eventing (Publish-Subscribe)
4: Workflow and Transactions – BPEL, WS-Choreography, WS-Coordination
5: Security – WS-Security, WS-Trust, WS-Federation, SAML, WS-SecureConversation
6: Service Discovery – UDDI, WS-Discovery
7: System Metadata and State – WSRF, WS-MetadataExchange, WS-Context
8: Management – WSDM, WS-Management, WS-Transfer
9: Policy and Agreements – WS-Policy, WS-Agreement
10: Portals and User Interfaces – WSRP (Remote Portlets)
WS-* Areas and Web 2.0
WS-* Specification Area – Web 2.0 Approach
1: Core Service Model – XML becomes optional but still useful; SOAP becomes JSON, RSS, ATOM; WSDL becomes REST with API as GET, PUT etc.; Axis becomes XmlHttpRequest
2: Service Internet – No special QoS. Use JMS or equivalent?
3: Notification – Hard with HTTP without polling – JMS perhaps?
4: Workflow and Transactions (no Transactions in Web 2.0) – Mashups, Google MapReduce, Scripting with PHP, JavaScript ….
5: Security – SSL, HTTP Authentication/Authorization; OpenID is Web 2.0 Single Sign-on
6: Service Discovery – http://www.programmableweb.com
7: System Metadata and State – Processed by application – no system state – Microformats are a universal metadata approach
8: Management==Interaction – WS-Transfer style protocols GET, PUT etc.
9: Policy and Agreements – Service dependent. Processed by application
10: Portals and User Interfaces – Start Pages, AJAX and Widgets (Netvibes), Gadgets
WS-* Areas and Multicore
WS-* Specification Area – Multicore Approach
1: Core Service Model – Fine grain Java, C#, C++ objects and coarse grain services as in DSS. Information passed explicitly or by handles. MPI needs to be updated to handle non-scientific applications as in CCR
2: Service Internet – Not so important intra-chip
3: Notification – Publish-Subscribe for events and interrupts
4: Workflow and Transactions – Many approaches; scripting languages popular
5: Security – Not so important intra-chip
6: Service Discovery – Use libraries
7: System Metadata and State – Environment variables
8: Management == Interaction – Interaction between objects is a key issue in parallel programming, trading off efficiency versus performance
9: Policy and Agreements – Handled by application
10: Portals and User Interfaces – Web 2.0 technology popular
CCR as an example of a Cross Paradigm Run Time
• Naturally supports fine grain thread switching with message passing, with around 4 microsecond latency for 4 threads switching to 4 others on an AMD PC with C#. Threads are spawned – no rendezvous
• Has around 50 microsecond latency for coarse grain service interactions with the DSS extension, which supports Web 2.0 style messaging
• MPI collectives – Shift and Exchange vary from 10 to 20 microsecond latency in rendezvous mode
• Not as good as the best MPIs, but it is managed code and supports Grids, Web 2.0 and parallel computing
Microsoft CCR
• Supports exchange of messages between threads using named ports
• FromHandler: Spawn threads without reading ports
• Receive: Each handler reads one item from a single port
• MultipleItemReceive: Each handler reads a prescribed number of items of a given type from a given port. Note items in a port can be general structures, but all must have the same type.
• MultiplePortReceive: Each handler reads one item of a given type from multiple ports.
• JoinedReceive: Each handler reads one item from each of two ports. The items can be of different types.
• Choice: Execute a choice of two or more port-handler pairings
• Interleave: Consists of a set of arbiters (port – handler pairs) of 3 types that are Concurrent, Exclusive or Teardown (called at the end for clean up). Concurrent arbiters are run concurrently, but exclusive handlers are run one at a time.
• http://msdn.microsoft.com/robotics/
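A minimal sketch of the port/handler pattern listed above, using the Receive and MultipleItemReceive arbiters. It assumes the Microsoft.Ccr.Core assembly shipped with the Robotics Studio download at the URL above; the exact overloads may differ slightly between releases, so treat the signatures as indicative rather than definitive.

// Post items to CCR ports; handlers are activated on a dispatcher queue.
using System;
using Microsoft.Ccr.Core;

class CcrSketch
{
    static void Main()
    {
        using (var dispatcher = new Dispatcher(0, "workers"))       // 0 picks a default thread count
        using (var queue = new DispatcherQueue("q", dispatcher))
        {
            // Receive: a persistent handler reads one item at a time from the port.
            var port = new Port<int>();
            Arbiter.Activate(queue,
                Arbiter.Receive<int>(true, port,
                    delegate(int item) { Console.WriteLine("got {0}", item); }));

            // MultipleItemReceive: fire once when four items have arrived,
            // i.e. the user-built "collective" of the earlier MPI Futures slide.
            var gather = new Port<double>();
            Arbiter.Activate(queue,
                Arbiter.MultipleItemReceive<double>(false, gather, 4,
                    delegate(double[] items)
                    { Console.WriteLine("sum = {0}", items[0] + items[1] + items[2] + items[3]); }));

            for (int i = 0; i < 4; i++) { port.Post(i); gather.Post(i * 0.5); }
            Console.ReadLine();                                     // keep the process alive while handlers run
        }
    }
}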
Latency/Overhead
Figure: Overhead (latency) in microseconds of an AMD 4-core PC with 4 execution threads on MPI-style rendezvous messaging for Shift and Exchange, implemented either as two shifts or as a custom CCR pattern (curves: rendezvous exchange as two shifts, rendezvous exchange customized for MPI, rendezvous shift), plotted against the number of stages up to 10 million. Compute time is 10 seconds divided by the number of stages.
Latency/Overhead up to a million stages
Figure: Overhead (latency) in microseconds of an Intel 8-core PC with 8 execution threads on MPI-style rendezvous messaging for Shift and Exchange, implemented either as two shifts or as a custom CCR pattern (curves: rendezvous exchange as two shifts, rendezvous exchange customized for MPI, rendezvous shift), plotted against the number of stages up to 1 million. Compute time is 15 seconds divided by the number of stages.
DSS Service Measurements
Figure: Average run time (microseconds) on an HP Opteron multicore as a function of the number of simultaneous two-way (round-trip) service messages processed, from 1 to 10,000 (November 2006 DSS release).
• CGL measurements of Axis 2 show about 500 microseconds – DSS is 10 times better