Advanced Processor Technologies
New Architectures Need
New Languages
A triumph of optimism over
experience!
Ian Watson
3rd July 2009
1
‘Obvious Truths’
• Single processors will not get faster,
we need to go to multi-core
• There will be a need for processors
with many (> 32?) cores
• These will need to support general
purpose applications
• Application performance will need to
scale with number of cores
2
‘Obvious Truths’ (2)
• General purpose parallel computing
needs shared memory
• Current shared memory requires
cache coherence
• Cache coherence doesn’t scale
beyond 32 cores
• Updateable state makes general
purpose parallel programming
difficult
3
‘Obvious Untruths’
• HPC already has all the answers to
parallel programming
• Message passing is the answer
(hardware or software or both)
• Conventional languages already have
adequate threading and locking
facilities
• We can program without state
4
So what next?
• Simplifying the programming model
must be the answer – removing
facilities is desirable e.g.
– Random control transfer
– Pointer arithmetic
– Explicit memory reclamation
• Arbitrary state manipulation is the
enemy of parallelism – we must
restrict it!
5
Half Truths?
• Functional languages are the answer
to parallelism, all we need is to add
state (in a controlled way)
• Transactional memory can replace
locking to simplify the handling of
parallel state
• Transactional memory can remove
the need for cache coherence
6
Functions+Transactions
• The Cambridge Microsoft Haskell
work has shown how transactions can
be included in a functional language
via monads
• Is this a style of programming which
can be sold to the world as the way
ahead for future multi-core (many-core) systems?
7
Selling a New Language
• It must be capable of expressing
everything that people want
• It isn’t just a case of producing
something which is a good technical
solution
• It mustn’t be too complex
• It probably needs to look familiar
• It needs to be efficient to implement
8
The Problems
• FP is unfamiliar to many existing
programmers
• Many people find it hard to
understand
• Even more find monads difficult
• In spite of excellent FP compiler
technology, imperative programming
will probably always be more efficient
9
Can We Compromise?
• Pure functional programs can be
executed easily in parallel because
they don’t update global state
• But if we only exploit parallelism at
the function level, local manipulation
of state within a function causes no
problems
• Can we work with such a model?
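
A minimal sketch of this compromise, written in Python purely for illustration (the experiments described later use Scala). The function `histogram` and the helper `parallel_histograms` are hypothetical names, and the thread pool is an assumption rather than anything the talk prescribes. The point is that `histogram` freely uses iteration, assignment and an updateable dictionary, yet all of that state is local to one call, so separate calls can run in parallel without locks.

```python
from concurrent.futures import ThreadPoolExecutor

def histogram(words):
    """Count word lengths using ordinary local, mutable state."""
    counts = {}                      # local to this call, never shared
    for w in words:                  # plain iteration and assignment
        counts[len(w)] = counts.get(len(w), 0) + 1
    return counts                    # only the result leaves the function

def parallel_histograms(chunks):
    """Run one histogram call per chunk. No locks are needed because the
    calls share no updateable state and do not modify their inputs."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(histogram, chunks))

chunks = [["a", "bb", "cc"], ["ddd", "e"]]
results = parallel_histograms(chunks)
```

Seen from outside, `histogram` behaves like a pure function, which is what makes function-level parallelism safe here. In standard CPython builds the threads will not speed up CPU-bound work; the sketch shows the programming model only.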
10
What Would We Gain?
• ‘Easy’ parallelism at function level
– This could either be explicit or implicit
• Familiarity of low level code
– Can use iteration, assignment,
updateable arrays etc.
• Potential increase in efficiency
– Direct mapping to machine code
– Explicit memory re-use
11
What Would We Lose?
• Clearly we lose referential
transparency within any imperative
code
• But this is inevitable if we want to
manipulate state – even with monads
• Clearly, as described so far, we
haven’t got the ability to manipulate
global state – we need more
12
Adding Transactions
• We should only use shared state
when it is really necessary
• It should be clear in the language
when this is happening
• It should be detectable statically
• Ideally, it should be possible to check
automatically the need for atomic
sections
13
Memory Architecture
• With the right underlying
programming model we should be
able to determine memory regions
– Read only
– Thread local
– Global write once
– Global shared (transactional)
• Can lead to simplified scalable
memory architecture
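
One way to picture the four regions is as a tag on each piece of data, with a rule saying when a write is legal. The sketch below is in Python for illustration only. The rules in `write_allowed` are one plausible reading of the region names, not definitions taken from the talk, and the function name and parameters are hypothetical.

```python
from enum import Enum

class Region(Enum):
    READ_ONLY = "read only"
    THREAD_LOCAL = "thread local"
    GLOBAL_WRITE_ONCE = "global write once"
    GLOBAL_SHARED = "global shared (transactional)"

def write_allowed(region, in_atomic, already_written=False):
    """Assumed rule: may a write to data in `region` proceed?"""
    if region is Region.READ_ONLY:
        return False                 # never written after creation
    if region is Region.THREAD_LOCAL:
        return True                  # private to one thread, always safe
    if region is Region.GLOBAL_WRITE_ONCE:
        return not already_written   # a single initialising write only
    return in_atomic                 # GLOBAL_SHARED: only inside an atomic section
```

Under these assumed rules only the last region ever needs conflict detection, which is the sense in which region information could simplify the memory system.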
14
Experiments
• Using Scala to investigate
programming styles
– Is open source
– Has both imperative & functional features

– Not currently transactional
• Using Simics based hardware
simulator to experiment with memory
architectures
15
Outstanding Questions
• Data Parallelism
– How to express
– How to handle in-place update of parallel
data (array) structures
• Streaming applications
– Purely functional?
– Need message passing constructs?
– Need additions to the memory model?
16
Conclusions
• None really so far!
• But am convinced, from a technical
viewpoint, we need new
programming approaches
• Am fairly convinced that we need to
be pragmatic in order to sell a new
approach, even if this requires
compromises from ideals
17
Questions?
18
Transactional Memory
• Programming model to simplify
manipulation of shared state
– Speculative model
– Sections of program declared ‘atomic’
– They must complete without conflict or
die and restart
– Must not alter global state until complete
– Needs system support – software or
hardware
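
The speculative model above can be sketched in software for a single shared variable. This is a deliberately tiny illustration in Python, not the hardware scheme discussed in this talk and not a full transactional memory (which would track read and write sets over many locations). `TVar` and `atomically` are hypothetical names chosen for the example.

```python
import threading

class TVar:
    """A shared cell with a version number (hypothetical helper)."""
    def __init__(self, value):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()    # held only for snapshot and commit

    def snapshot(self):
        with self._lock:
            return self.value, self.version

    def try_commit(self, seen_version, new_value):
        with self._lock:
            if self.version != seen_version:
                return False             # conflict: someone committed first
            self.value = new_value       # global state changes only here
            self.version += 1
            return True

def atomically(tvar, update):
    """Run `update` speculatively; restart until it commits without conflict."""
    while True:
        value, version = tvar.snapshot()
        new_value = update(value)        # computed privately, not yet visible
        if tvar.try_commit(version, new_value):
            return new_value

counter = TVar(0)

def worker():
    for _ in range(1000):
        atomically(counter, lambda n: n + 1)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because a conflicting section is simply rerun, `update` must have no side effects of its own. That requirement is the software face of "must not alter global state until complete".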
19
Object Based Transactional
Memory Hardware
• Based on ‘object-aware’ caches
• Exploits object structure to simplify
transactional memory operations
• Advantages over other hardware TM
proposals
– Handles cache overflow elegantly
– Enables multiple memory banks with
distributed commit
20
TM & Cache Coherence
• Fine grain cache coherence is the major
impediment to extensible multi-cores
• Updates to shared memory only occur
when a transaction commits
• Caches only need to be updated at commit
points (which tend to be coarser grain)
• If all shared memory is made
transactional, the requirement for fine
grain coherence is removed
21
TM Programming
• TM constructs can be added to
conventional programming languages
• But, they require careful use to
ensure correctness
• If transactional & non-transactional
operations are allowed on the same
data, the result can become complex
to understand.
22
New Programming Models?
• Problems can often be simplified by
restricting (unnecessary)
programming facilities e.g.
– Arbitrary control transfer
– Pointer arithmetic
– Explicit memory reclamation
• A new approach is needed to simplify
parallel programming & hardware
23
We Need Useable &
Efficient Models
• Shared memory is essential for
general purpose programming
• Message passing (alone) (e.g. MPI,
Occam etc.) is not sufficient
• We need shared updateable state –
e.g. pure functional programming is
not the answer
• The languages need to be simple and
easily implementable
24
A Synthesis?
• Functional Programming has
something to offer – don’t use state
unnecessarily
• But don’t be too ‘religious’ – local,
single threaded state is simple &
efficient
• Can all global shared state be
handled transactionally?
25
Experiments
• Using the language Scala – has both
functional and imperative features
• Experimenting with applications
• Studying how techniques similar to
‘escape analysis’ can identify shared
mutable state
• Looking at hardware implications,
particularly memory architecture
26