Fundamental Concepts I - University of Virginia


COMPUTER ARCHITECTURE
CS 6354
Fundamental Concepts:
Computing Models
Samira Khan
University of Virginia
Jan 26, 2016
The content and concept of this course are adapted from CMU ECE 740
AGENDA
• Review from last lecture
• Why study computer architecture?
• Fundamental concepts
– Computing models
2
LAST LECTURE RECAP
• What it means/takes to be a good (computer) architect
– Roles of a computer architect (look everywhere!)
• Levels of transformation
• Abstraction layers, their benefits, and the benefits of
comfortably crossing them
• Two example problems and solution ideas
– Solving DRAM Scaling with system-level detection and
mitigation
– Merging memory and storage with non-volatile memories
• Course Logistics
• Assignments: HW (today), Review Set 1 (Saturday)
3
REVIEW: KEY TAKEAWAY
• Breaking the abstraction layers (between
components and transformation hierarchy
levels) and knowing what is underneath enables
you to solve problems and design better future
systems
• Cooperation between multiple components and
layers can enable more effective solutions and
systems
4
HOW TO DO THE PAPER REVIEWS
• 1: Brief summary
– What is the problem the paper is trying to solve?
– What are the key ideas of the paper? Key insights?
– What is the key contribution to the literature at the time it was written?
– What are the most important things you take out from it?
• 2: Strengths (most important ones)
– Does the paper solve the problem well?
• 3: Weaknesses (most important ones)
– This is where you should think critically. Every paper/idea has a weakness. This
does not mean the paper is necessarily bad. It means there is room for
improvement and future research can accomplish this.
• 4: Can you do (much) better? Present your thoughts/ideas.
• 5: What have you learned/enjoyed/disliked in the paper? Why?
• Review should be short and concise (~half a page to a page)
5
AGENDA
• Review from last lecture
• Why study computer architecture?
• Fundamental concepts
– Computing models
6
AN ENABLER: MOORE’S LAW
Moore, “Cramming more components onto integrated circuits,”
Electronics Magazine, 1965.
Component counts double every other year
Image source: Intel
7
Number of transistors on an integrated circuit doubles ~ every two years
Image source: Wikipedia
8
RECOMMENDED READING
• Moore, “Cramming more components onto integrated
circuits,” Electronics Magazine, 1965.
• Only 3 pages
• A quote:
“With unit cost falling as the number of components per
circuit rises, by 1975 economics may dictate squeezing as
many as 65,000 components on a single silicon chip.”
• Another quote:
“Will it be possible to remove the heat generated by tens
of thousands of components in a single silicon chip?”
9
WHAT DO WE USE THESE
TRANSISTORS FOR?
• Your readings for this week should give you an
idea…
• Patt, “Requirements, Bottlenecks, and Good
Fortune: Agents for Microprocessor Evolution,”
Proceedings of the IEEE 2001.
• Mutlu and Subramanian, “Research Problems
and Opportunities in Memory Systems,”
SUPERFRI 2015.
10
WHY STUDY COMPUTER ARCHITECTURE?
• Enable better systems: make computers faster,
cheaper, smaller, more reliable, …
– By exploiting advances and changes in underlying technology/circuits
• Enable new applications
– Life-like 3D visualization 20 years ago?
– Virtual reality?
– Personalized genomics? Personalized medicine?
• Enable better solutions to problems
– Software innovation is built into trends and changes in computer architecture
• > 50% performance improvement per year has enabled this innovation
• Understand why computers work the way they do
11
COMPUTER ARCHITECTURE TODAY (I)
• Today is a very exciting time to study computer
architecture
• Industry is in a large paradigm shift (to multi-core and
beyond) – many different potential system designs possible
• Many difficult problems motivating and caused by the shift
– Power/energy constraints → multi-core?
– Complexity of design → multi-core?
– Difficulties in technology scaling → new technologies?
– Memory wall/gap
– Reliability wall/issues
– Programmability wall/problem
– Huge hunger for data and new data-intensive applications
• No clear, definitive answers to these problems
12
COMPUTER ARCHITECTURE TODAY (II)
• These problems affect all parts of the computing stack – if
we do not change the way we design systems
[Figure: the transformation hierarchy — Problem, Algorithm, Program/Language, Runtime System (VM, OS, MM), ISA, Microarchitecture, Logic, Circuits, Electrons — with the User above the stack. Annotations: many new demands from the top (Look Up); fast-changing demands and personalities of users (Look Up); many new issues at the bottom (Look Down)]
• No clear, definitive answers to these problems
13
COMPUTER ARCHITECTURE TODAY (III)
• Computing landscape is very different from 10-20 years ago
• Both UP (software and humanity trends) and DOWN (technologies
and their issues), FORWARD and BACKWARD, and the resulting
requirements and constraints
• Every component and its interfaces, as well as entire system designs, are being re-examined
[Figure: examples of this re-examination — hybrid main memory, persistent memory/storage, heterogeneous processors and accelerators, general-purpose GPUs]
14
COMPUTER ARCHITECTURE TODAY (IV)
• You can revolutionize the way computers are built, if you
understand both the hardware and the software (and
change each accordingly)
• You can invent new paradigms for computation,
communication, and storage
• Recommended book: Thomas Kuhn, “The Structure of
Scientific Revolutions” (1962)
– Pre-paradigm science: no clear consensus in the field
– Normal science: dominant theory used to explain/improve
things (business as usual); exceptions considered anomalies
– Revolutionary science: underlying assumptions re-examined
15
… BUT, FIRST …
• Let’s understand the fundamentals…
• You can change the world only if you understand
it well enough…
– Especially the past and present dominant paradigms
– And, their advantages and shortcomings – tradeoffs
– And, what remains fundamental across generations
– And, what techniques you can use and develop to
solve problems
17
AGENDA
• Review from last lecture
• Why study computer architecture?
• Fundamental concepts
– Computing models
18
WHAT IS A COMPUTER?
• Three key components
• Computation
• Communication
• Storage (memory)
19
WHAT IS A COMPUTER?
[Figure: a computer’s three components — processing (control/sequencing and datapath), memory (program and data), and I/O]
20
THE VON NEUMANN
MODEL/ARCHITECTURE
• Also called stored program computer (instructions in
memory). Two key properties:
• Stored program
– Instructions stored in a linear memory array
– Memory is unified between instructions and data
• The interpretation of a stored value depends on the control signals. When is a value interpreted as an instruction?
• Sequential instruction processing
– One instruction processed (fetched, executed, and completed) at a time
– Program counter (instruction pointer) identifies the current instr.
– Program counter is advanced sequentially except for control transfer
instructions
21
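To make these two properties concrete, here is a minimal fetch-decode-execute sketch in C. The toy ISA, its instruction encoding, and the register file are hypothetical, invented only for illustration; the point it shows is that instructions and data share one linear memory array, and a stored value acts as an instruction exactly when the program counter reaches it.

/* Minimal stored-program sketch; the ISA and encoding are hypothetical. */
#include <stdint.h>
#include <stdio.h>

enum { HALT, LOADI, ADD, JMP };              /* toy opcodes */

int main(void) {
    uint32_t mem[256] = {                    /* unified memory: instructions AND data */
        (LOADI << 24) | (0 << 16) | 5,       /* r0 = 5        */
        (LOADI << 24) | (1 << 16) | 7,       /* r1 = 7        */
        (ADD   << 24) | (0 << 16) | 1,       /* r0 = r0 + r1  */
        (HALT  << 24),                       /* stop          */
    };
    uint32_t reg[4] = {0};
    uint32_t pc = 0;                         /* program counter (instruction pointer) */

    for (;;) {
        uint32_t inst = mem[pc++];           /* fetch; PC advances sequentially */
        uint32_t op   = inst >> 24;
        uint32_t dst  = (inst >> 16) & 0xff;
        uint32_t imm  = inst & 0xffff;
        if      (op == HALT)  break;         /* one instruction completes at a time */
        else if (op == LOADI) reg[dst] = imm;
        else if (op == ADD)   reg[dst] += reg[imm];
        else if (op == JMP)   pc = imm;      /* control transfer overrides the PC */
    }
    printf("r0 = %u\n", (unsigned)reg[0]);   /* prints r0 = 12 */
    return 0;
}

Nothing in mem marks a word as code or data; whether mem[pc] is “an instruction” is decided purely by the fetch reaching it — exactly the unified-memory property above.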
THE VON NEUMANN
MODEL/ARCHITECTURE
• Recommended reading
– Burks, Goldstine, and von Neumann, “Preliminary discussion of the logical
design of an electronic computing instrument,” 1946.
• Stored program
• Sequential instruction processing
22
THE VON NEUMANN MODEL
(OF A COMPUTER)
[Figure: the von Neumann model — MEMORY with memory address register (Mem Addr Reg) and memory data register (Mem Data Reg); PROCESSING UNIT with ALU and TEMP registers; CONTROL UNIT with instruction pointer (IP) and instruction register (Inst Register); INPUT; OUTPUT]
23
THE VON NEUMANN MODEL
(OF A COMPUTER)
• Q: Is this the only way that a computer can
operate?
• A: No.
• Qualified Answer: But, it has been the dominant
way
– i.e., the dominant paradigm for computing
– for roughly seven decades
24
THE DATA FLOW MODEL
(OF A COMPUTER)
• Von Neumann model: An instruction is fetched and
executed in control flow order
– As specified by the instruction pointer
– Sequential unless explicit control flow instruction
• Dataflow model: An instruction is fetched and executed in
data flow order
– i.e., when its operands are ready
– i.e., there is no instruction pointer
– Instruction ordering specified by data flow dependence
• Each instruction specifies “who” should receive the result
• An instruction can “fire” whenever all operands are received
– Potentially many instructions can execute at the same time
• Inherently more parallel
25
VON NEUMANN VS DATAFLOW
• Consider a Von Neumann program
– What is the significance of the program order?
– What is the significance of the storage locations?
Sequential (program order):
v <= a + b;
w <= b * 2;
x <= v - w;
y <= v + w;
z <= x * y;

[Figure: the same program as a dataflow graph — inputs a and b feed a “+” node (producing v) and a “*2” node (producing w); v and w feed a “-” node (x) and a “+” node (y); x and y feed a final “*” node producing z]
• Which model is more natural to you as a programmer?
26
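As a worked version of the example above (a sketch, not from the slides), the C program below computes the same five values; the comments mark which operations the dataflow model lets fire concurrently, while a von Neumann machine steps through them strictly in program order.

#include <stdio.h>

int main(void) {
    int a = 3, b = 4;                /* example inputs (chosen arbitrarily) */

    /* dataflow level 1: v and w depend only on a and b -> can fire in parallel */
    int v = a + b;                   /* v <= a + b */
    int w = b * 2;                   /* w <= b * 2 */

    /* dataflow level 2: x and y each need v and w -> can fire in parallel */
    int x = v - w;                   /* x <= v - w */
    int y = v + w;                   /* y <= v + w */

    /* dataflow level 3: z needs x and y -> fires last */
    int z = x * y;                   /* z <= x * y */

    printf("z = %d\n", z);           /* (7 - 8) * (7 + 8) = -15 */
    return 0;
}

Three dataflow steps instead of five sequential ones; the storage locations v, w, x, y matter only as arcs carrying tokens between nodes.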
MORE ON DATA FLOW
• In a data flow machine, a program consists of
data flow nodes
– A data flow node fires (is fetched and executed) when all its inputs are ready
• i.e., when all inputs have tokens
• Data flow node and its ISA representation
27
DATA FLOW NODES
28
AN EXAMPLE DATA FLOW PROGRAM
[Figure: an example data flow program; its final node produces the output token OUT]
29
CONTROL FLOW VS. DATA FLOW
30
Dataflow Machine: Instruction Templates
[Figure: instruction templates for the example dataflow graph with inputs a and b and intermediate values x and y. Each numbered template holds an opcode (1: +, 2: *7, 3: −, 4: +, 5: *), one operand slot per input arc guarded by a presence bit, and destination fields naming the slot that receives the result: template 1 → 3L and 4L, template 2 → 3R and 4R, template 3 → 5L, template 4 → 5R, template 5 → out]
• Each arc in the graph has an operand slot in the program
Dennis and Misunas, “A Preliminary Architecture for a Basic Data Flow Processor,” ISCA 1974.
31
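The C sketch below illustrates the template mechanism just described: each node is a template with one operand slot per input arc, a presence bit per slot, and a destination field, and the token delivery that fills a node’s last empty slot makes it fire. The encoding is a hypothetical simplification (one destination per node, three node types), not the actual format of the 1974 machine.

#include <stdbool.h>
#include <stdio.h>

typedef struct {
    char op;            /* '+', '-', or '*'                                    */
    int  operand[2];    /* one operand slot per input arc                      */
    bool present[2];    /* presence bits                                       */
    int  dest;          /* index of destination template (-1 = program output) */
    int  dest_slot;     /* which slot of the destination to fill               */
} Template;

/* Deliver a token into a slot; fire the node once both presence bits are set. */
static void send_token(Template *prog, int t, int slot, int value) {
    prog[t].operand[slot] = value;
    prog[t].present[slot] = true;
    if (prog[t].present[0] && prog[t].present[1]) {
        int l = prog[t].operand[0], r = prog[t].operand[1];
        int result = (prog[t].op == '+') ? l + r
                   : (prog[t].op == '-') ? l - r
                   :                       l * r;
        if (prog[t].dest < 0)
            printf("out = %d\n", result);
        else
            send_token(prog, prog[t].dest, prog[t].dest_slot, result);
    }
}

int main(void) {
    /* templates for the graph out = (a + b) * (a * b), a made-up example */
    Template prog[3] = {
        { '+', {0, 0}, {false, false},  2, 0 },  /* template 0: a+b -> slot 0 of template 2 */
        { '*', {0, 0}, {false, false},  2, 1 },  /* template 1: a*b -> slot 1 of template 2 */
        { '*', {0, 0}, {false, false}, -1, 0 },  /* template 2: product -> output           */
    };
    /* input tokens for a = 3, b = 4 arrive on the input arcs */
    send_token(prog, 0, 0, 3); send_token(prog, 0, 1, 4);
    send_token(prog, 1, 0, 3); send_token(prog, 1, 1, 4);   /* prints out = 84 */
    return 0;
}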
DATA FLOW SUMMARY
• Availability of data determines order of execution
• A data flow node fires when its sources are ready
• Programs represented as data flow graphs (of nodes)
• Data Flow at the ISA level has not been (as) successful
• Data Flow implementations under the hood (while
preserving sequential ISA semantics) have been successful
– Out of order execution
– Hwu and Patt, “HPSm, a high performance restricted data
flow architecture having minimal functionality,” ISCA 1986.
32
DATA FLOW CHARACTERISTICS
• Data-driven execution of instruction-level
graphical code
– Nodes are operators
– Arcs are data (I/O)
– As opposed to control-driven execution
• Only real dependencies constrain processing
• No sequential I-stream
– No program counter
• Operations execute asynchronously
• Execution triggered by the presence of data
33
DATA FLOW
ADVANTAGES/DISADVANTAGES
• Advantages
– Very good at exploiting irregular parallelism
– Only real dependencies constrain processing
• Disadvantages
– Debugging difficult (no precise state)
• Interrupt/exception handling is difficult (what are the precise state semantics?)
– Implementing dynamic data structures difficult in pure data
flow models
– Too much parallelism? (Parallelism control needed)
– High bookkeeping overhead (tag matching, data storage)
– Instruction cycle is inefficient (delay between dependent
instructions), memory locality is not exploited
34
ANOTHER WAY OF
EXPLOITING PARALLELISM
• SIMD:
– Concurrency arises from performing the same
operations on different pieces of data
• MIMD:
– Concurrency arises from performing different
operations on different pieces of data
• Control/thread parallelism: execute different threads of
control in parallel → multithreading, multiprocessing
– Idea: Use multiple processors to solve a problem
35
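A minimal C sketch (POSIX threads assumed available; the example computations are invented) contrasting the two styles from the slide above: the SIMD-style loop performs one operation on many data elements, while the MIMD-style code runs two different computations as two concurrent threads of control.

#include <pthread.h>
#include <stdio.h>

#define N 8
static int a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
static int c[N];

/* SIMD style: the same operation applied to different pieces of data */
static void simd_style(void) {
    for (int i = 0; i < N; i++)
        c[i] = a[i] * 2;
}

/* MIMD style: different operations running in different threads of control */
static void *sum_thread(void *arg) {
    (void)arg;
    int s = 0;
    for (int i = 0; i < N; i++) s += a[i];
    printf("sum = %d\n", s);
    return NULL;
}

static void *max_thread(void *arg) {
    (void)arg;
    int m = a[0];
    for (int i = 1; i < N; i++) if (a[i] > m) m = a[i];
    printf("max = %d\n", m);
    return NULL;
}

int main(void) {
    simd_style();                                /* data parallelism */
    pthread_t t1, t2;                            /* control/thread parallelism */
    pthread_create(&t1, NULL, sum_thread, NULL);
    pthread_create(&t2, NULL, max_thread, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}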
FLYNN’S TAXONOMY OF COMPUTERS
• Mike Flynn, “Very High-Speed Computing Systems,” Proc. of
IEEE, 1966
• SISD: Single instruction operates on single data element
• SIMD: Single instruction operates on multiple data elements
– Array processor
– Vector processor
• MISD: Multiple instructions operate on single data element
– Closest form: systolic array processor, streaming processor
• MIMD: Multiple instructions operate on multiple data
elements (multiple instruction streams)
– Multiprocessor
– Multithreaded processor
36
SIMD PROCESSING
• Concurrency arises from performing the same
operations on different pieces of data
– Single instruction multiple data (SIMD)
– E.g., dot product of two vectors
• Contrast with thread (“control”) parallelism
– Concurrency arises from executing different threads of control in parallel
• Contrast with data flow
– Concurrency arises from executing different operations in parallel (in a
data driven manner)
• SIMD exploits instruction-level parallelism
– Multiple instructions concurrent: instructions happen to be the same
37
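As a concrete instance of the dot-product example, the sketch below uses x86 SSE intrinsics (an assumed target; the slide names no ISA). A single _mm_mul_ps multiplies four pairs of floats at once — the same operation on different pieces of data.

#include <immintrin.h>   /* SSE intrinsics */
#include <stdio.h>

int main(void) {
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    __m128 acc = _mm_setzero_ps();                  /* four partial sums */

    for (int i = 0; i < 8; i += 4) {                /* four lanes per step */
        __m128 va = _mm_loadu_ps(&a[i]);            /* load 4 elements of a */
        __m128 vb = _mm_loadu_ps(&b[i]);            /* load 4 elements of b */
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));  /* one op, 4 data pairs */
    }

    float lane[4];
    _mm_storeu_ps(lane, acc);                       /* reduce the partial sums */
    printf("dot = %f\n", lane[0] + lane[1] + lane[2] + lane[3]);  /* 120.0 */
    return 0;
}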
SIMD PROCESSING
• Single instruction operates on multiple data
elements
– In time or in space
• Multiple processing elements
• Time-space duality
– Array processor: Instruction operates on multiple
data elements at the same time
– Vector processor: Instruction operates on multiple
data elements in consecutive time steps
38
ARRAY VS. VECTOR PROCESSORS
Instruction Stream (VR: a vector register; A[3:0]: four elements of array A)
LD VR ← A[3:0]
ADD VR ← VR, 1
MUL VR ← VR, 2
ST A[3:0] ← VR

ARRAY PROCESSOR: same op @ same time, different ops @ same space
(four processing elements across, time running down)
t0: LD0 LD1 LD2 LD3
t1: AD0 AD1 AD2 AD3
t2: MU0 MU1 MU2 MU3
t3: ST0 ST1 ST2 ST3

VECTOR PROCESSOR: different ops @ same time, same op @ same space
(operations pipelined, one element entering each unit per cycle)
t0: LD0
t1: LD1 AD0
t2: LD2 AD1 MU0
t3: LD3 AD2 MU1 ST0
t4:     AD3 MU2 ST1
t5:         MU3 ST2
t6:             ST3
39
SCALAR PROCESSING
• Conventional form of processing (von Neumann
model)
add r1, r2, r3
40
SIMD ARRAY PROCESSING
• Array processor
41
VLIW PROCESSING
• Very Long Instruction Word
– We will get back to this later
42
VECTOR PROCESSORS
• A vector is a one-dimensional array of numbers
• Many scientific/commercial programs use vectors
for (i = 0; i <= 49; i++)
    C[i] = (A[i] + B[i]) / 2;
• A vector processor is one whose instructions
operate on vectors rather than scalar (single data)
values
• Basic requirements
– Need to load/store vectors → vector registers (contain vectors)
– Need to operate on vectors of different lengths → vector length register
(VLEN)
– Elements of a vector might be stored apart from each other in memory →
vector stride register (VSTR)
• Stride: distance between two elements of a vector
43
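The C sketch below is a hypothetical software model (not a real vector ISA) of how a vector machine executes the loop above: elements are processed in chunks held in “vector registers”, VLEN holds the length of the current chunk, and VSTR plays the role of the vector stride register (unit stride here, so elements are consecutive). Each inner loop stands in for a single vector instruction.

#include <stdio.h>

#define MAXVLEN 16                    /* assumed vector register width */

int main(void) {
    float A[50], B[50], C[50];
    for (int i = 0; i < 50; i++) { A[i] = (float)i; B[i] = (float)(2 * i); }

    int VSTR = 1;                                 /* unit stride */
    for (int i = 0; i < 50; i += MAXVLEN) {
        int VLEN = (50 - i < MAXVLEN) ? 50 - i : MAXVLEN;   /* last chunk is shorter */
        float V1[MAXVLEN], V2[MAXVLEN], V3[MAXVLEN];        /* "vector registers"    */

        for (int j = 0; j < VLEN; j++) V1[j] = A[(i + j) * VSTR];    /* vector load A  */
        for (int j = 0; j < VLEN; j++) V2[j] = B[(i + j) * VSTR];    /* vector load B  */
        for (int j = 0; j < VLEN; j++) V3[j] = (V1[j] + V2[j]) / 2;  /* one vector op  */
        for (int j = 0; j < VLEN; j++) C[(i + j) * VSTR] = V3[j];    /* vector store C */
    }
    printf("C[49] = %f\n", C[49]);    /* (49 + 98) / 2 = 73.5 */
    return 0;
}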
VECTOR PROCESSOR ADVANTAGES
+ No dependencies within a vector
– Pipelining, parallelization work well
– Can have very deep pipelines, no dependencies!
+ Each instruction generates a lot of work
– Reduces instruction fetch bandwidth
+ Highly regular memory access pattern
– Interleaving multiple banks for higher memory bandwidth
– Prefetching
+ No need to explicitly code loops
– Fewer branches in the instruction sequence
44
COMPUTER ARCHITECTURE
CS 6354
Fundamental Concepts:
Computing Models
Samira Khan
University of Virginia
Jan 26, 2016
The content and concept of this course are adapted from CMU ECE 740