Computer Organization: Basic Concepts
ENG3380
Computer Organization and Architecture
“Introduction: History & Basic Concepts”
Winter 2017
S. Areibi
School of Engineering
University of Guelph
Topics
What is Computer Architecture?
Computer Revolution
History of Computers
1st Generation Computers
2nd Generation Computers
3rd Generation Computers
Von Neumann Architecture
Processors
New Technologies
Summary
With thanks to W. Stallings, Hamacher, J. Hennessy, M. J. Irwin for lecture slide contents
Many slides adapted from the PPT slides accompanying the textbook and CSE331 Course
References
I. “Computer Organization and Design: The Hardware/Software Interface”, 5th edition, by D. Patterson and J. Hennessy, Morgan Kaufmann.
II. “Computer Organization and Architecture: Designing for Performance”, 10th edition, by William Stallings, Pearson.
III. “Computer Organization and Embedded Systems”, 6th edition, by C. Hamacher, Z. Vranesic, S. Zaky, N. Manjikian.
Introduction
What is Computer Architecture?
o Computer Architecture is the design of the computer at the hardware/software interface.
o Computer Architecture = Instruction Set Architecture + Machine Organization
   ● Instruction Set Design: the computer interface (compiler/system view)
   ● Machine Organization: the hardware components (logic designer’s view)
What is “Computer Architecture”?
Computer Architecture =
Instruction Set Architecture +
Computer Organization
Instruction Set Architecture (ISA)
WHAT the computer does (logical view)
Computer Organization
HOW the ISA is implemented (physical view)
Computer Architecture
• Attributes of a system visible to the programmer
• Have a direct impact on the logical execution of a program
• Architectural attributes include: instruction set, number of bits used to represent various data types, I/O mechanisms, techniques for addressing memory
Computer Organization
• The operational units and their interconnections that realize the architectural specifications
• Organizational attributes include: hardware details transparent to the programmer, control signals, interfaces between the computer and peripherals, memory technology used
Instruction Set Architecture: Definition
o ISA is a subset of Computer Architecture
o Definition by Amdahl, Blaauw, and Brooks (1964):
“… the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logical design, and the physical implementation.”
o ISA, or simply architecture – the abstract interface between the
hardware and the lowest level software that encompasses all the
information necessary to write a machine language program,
including instructions, registers, memory access, I/O, …
Instruction Set Architecture (ISA)
An ISA encompasses …
o Instructions and instruction formats
   ● Structure of the control word
   ● Instruction set (or opcodes) (simple vs. complex)
   ● Encoding and representing instructions
o Data types, encodings, and representations
   ● Fixed point, floating point, …
o Programmable storage: registers and memory
o Addressing modes: accessing instructions and data
o Interrupt and exception handling
Instruction Set Architecture: Examples
Critical interface between hardware and software
Standardizes instructions, machine language bit patterns, etc.
Advantage: different implementations of the same architecture
Disadvantage: sometimes prevents using new innovations
Examples (versions), year introduced:
   Intel (8086, 80386, Pentium, ...): 1978
   IBM Power (Power 2, 3, 4, 5): 1985
   HP PA-RISC (v1.1, v2.0): 1986
   MIPS (MIPS I, II, III, IV, V): 1986
   Sun Sparc (v8, v9): 1987
   Digital Alpha (v1, v3): 1992
   PowerPC (601, 604, …): 1993
Case Study: MIPS ISA
o All instructions are 32 bits wide
o Instruction categories: Load/Store, Integer Arithmetic, Jump and Branch, Floating Point, Memory Management
o Registers: R0 - R31, PC, HI, LO
o Three instruction formats (field widths in bits):
   R-type: Op6 | Rs5 | Rt5 | Rd5 | sa5 | funct6
   I-type: Op6 | Rs5 | Rt5 | immediate16
   J-type: Op6 | immediate26
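To make the R-type layout concrete, here is a minimal Python sketch that packs and unpacks the six R-type fields. It assumes the standard MIPS register numbering ($t0 = 8, $s1 = 17, $s2 = 18) and funct code 0x20 for `add`; the helper names `encode_r_type` and `decode_r_type` are illustrative, not part of any toolchain.

```python
# Field widths of the MIPS R-type format, from most to least significant bit
R_FIELDS = [("op", 6), ("rs", 5), ("rt", 5), ("rd", 5), ("sa", 5), ("funct", 6)]

def encode_r_type(op, rs, rt, rd, sa, funct):
    """Pack the six R-type fields into one 32-bit instruction word."""
    word = 0
    for (name, width), value in zip(R_FIELDS, (op, rs, rt, rd, sa, funct)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value   # shift earlier fields left, append this one
    return word

def decode_r_type(word):
    """Split a 32-bit word back into its named R-type fields."""
    fields = {}
    shift = 32
    for name, width in R_FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields

# add $t0, $s1, $s2  ->  rd = $t0 (8), rs = $s1 (17), rt = $s2 (18), funct = 0x20
word = encode_r_type(op=0, rs=17, rt=18, rd=8, sa=0, funct=0x20)
print(f"0x{word:08X}")     # 0x02324020
print(decode_r_type(word))
```

Because every field has a fixed width and position, decoding is just shifting and masking, which is part of what makes a fixed 32-bit format easy for hardware to decode.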
Computer Organization
Realization of the Instruction Set Architecture
Characteristics of principal components
Registers, ALUs, FPUs, Caches, ...
Ways in which these components are interconnected
Information flow between components
Means by which such information flow is controlled
Register Transfer Level (RTL) description
Computer Architecture: Organization
[Figure: layered view of a computer system. Software: Application, Operating System, Compiler, Firmware. Instruction Set Architecture at the boundary. Implementation & Organization: Instr. Set Proc., I/O system, Logic Design, Circuit Design, Layout. Labels: Computer Architecture, Software.]
Abstraction Layers
Software
   Application
   Compiler, Assembler, Linker
   Operating System: Loader, Scheduler, Device Drivers
Instruction Set Architecture (interface between SW and HW)
Hardware
   Processor, Memory, I/O System
   Datapath & Control Design
   Digital Logic Design
   Circuit Design
   Physical (IC Layout) Design
ISA is at the interface between software and hardware
Abstraction hides implementation details between levels
Helps us cope with enormous complexity
ISA: A Critical Interface
o It is critical because the organization and implementation of the computer system depend on the chosen instruction set.
o It is also critical because how efficiently different kinds of software run depends on the architecture (hardware) they target.
[Figure: software above, hardware below, with the instruction set as the interface between them]
Key Consideration in Comp Arch
[Figure: layered view from Application, Operating System, Compiler, and Firmware (software) through the Instruction Set Architecture down to Instr. Set Proc., I/O system, Datapath & Control, Digital Design, Circuit Design, and Layout (hardware)]
o Coordination of many levels of abstraction
o Under a rapidly changing set of forces
o Design, measurement, and evaluation
Forces on Computer Architecture
[Figure: forces acting on computer architecture: technology, programming languages, applications, operating systems, and history]
Understanding Performance
o Algorithm
   Determines number of operations executed
   Bubble Sort O(n²) vs. Quick Sort O(n log n) (average case)
o Programming language, compiler, architecture
   Determine number of machine instructions executed per operation
   C/C++ (compiled languages) vs. Python, Basic (interpreted languages)
o The design of the processor and memory system
   Determines how fast instructions are executed
   16-bit vs. 32-bit, 1 GHz vs. 3 GHz, cache memory, …
o I/O system (including OS)
   Determines how fast I/O operations are executed
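The algorithm line can be checked by counting work directly. This Python sketch counts element comparisons in a plain bubble sort and in a simple list-based quicksort (first element as pivot, a teaching version rather than an in-place production sort) on the same shuffled input. Bubble sort always makes n(n-1)/2 comparisons; quicksort's count depends on the data but grows roughly as n log n on random input. The input size and random seed are arbitrary choices.

```python
import random

def bubble_sort(data):
    """Plain bubble sort (no early exit); returns (sorted list, comparison count)."""
    a = list(data)
    comparisons = 0
    n = len(a)
    for i in range(n - 1):
        for j in range(n - 1 - i):          # the last i elements are already in place
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a, comparisons

def quick_sort(data):
    """List-based quicksort with the first element as pivot; returns (sorted list, comparison count)."""
    comparisons = 0

    def sort(a):
        nonlocal comparisons
        if len(a) <= 1:
            return a
        pivot, left, right = a[0], [], []
        for x in a[1:]:
            comparisons += 1                # one comparison against the pivot per element
            (left if x < pivot else right).append(x)
        return sort(left) + [pivot] + sort(right)

    return sort(list(data)), comparisons

rng = random.Random(42)                     # fixed seed so the run is repeatable
values = list(range(1000))
rng.shuffle(values)

bubble_sorted, bubble_cmp = bubble_sort(values)
quick_sorted, quick_cmp = quick_sort(values)
print("bubble sort comparisons:", bubble_cmp)   # 499500 = 1000 * 999 / 2
print("quick sort comparisons: ", quick_cmp)
```

Both functions run on the same processor and language, so the gap in comparison counts comes entirely from the algorithm, which is the first factor in the list above.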
Classes of Computers
o Personal computers
   General purpose, variety of software
   Subject to cost/performance tradeoff
o Server computers
   Network based
   High capacity, performance, reliability
   Range from small servers to building sized
o Supercomputers
   High-end scientific and engineering calculations
   Highest capability but represent a small fraction of the overall computer market
o Embedded computers
   Hidden as components of systems
   Stringent power/performance/cost constraints
Why Study Computer Architecture?
You want to be called “Computer Engineer or Scientist”
You want to become an “expert” on computer hardware
You want to become a “computer system designer”
You want to become a “software designer” and need to
understand how to improve code performance
Technology is improving rapidly, creating new opportunities
Has never been more exciting!
Impacts Electrical Engineering and Computer Science
Without advances in computer architecture, machine learning and AI would not be possible!
What You Will Learn
Instruction set design
How programs are translated into machine language, and how the hardware executes them
The organization and implementation of the ISA
The hardware/software interface (compilers, assemblers, …)
What determines program performance, and how it can be improved
How hardware designers improve performance:
   Pipelining
   Memory hierarchy (cache design)
   Superscalar implementations
What parallel processing is
Moore’s Law
In 1965, Gordon Moore (who later co-founded Intel) predicted that the number of transistors that can be integrated on a single chip would double at a regular rate, later put at about every two years.
[Figure: transistor count, feature size, and die size trends, highlighting the Dual Core Itanium with 1.7B transistors. Courtesy, Intel®]
[Figure: Moore’s Law for CPUs and DRAMs]
VLSI Trends: Moore’s Law
In 1965, Gordon Moore predicted that transistors would continue to shrink, allowing:
   Doubled transistor density every 18-24 months
   Doubled performance every 18-24 months
For several decades, history proved Moore right
But is the end in sight?
   Physical limitations
   Economic limitations
[Photo: Gordon Moore, Intel Co-Founder and Chairman Emeritus. Image source: Intel Corporation www.intel.com]
History
Technology Trends
o Electronics technology continues to evolve
   ● Increased capacity and performance
   ● Reduced cost
[Figure: DRAM capacity over time]
Year   Technology                    Relative performance/cost
1951   Vacuum tube                   1
1965   Transistor                    35
1975   Integrated circuit (IC)       900
1995   Very large scale IC (VLSI)    2,400,000
2013   Ultra large scale IC          250,000,000,000
Computer Generations
Generation   Approximate Dates   Technology                           Typical Speed (operations per second)
1            1946–1957           Vacuum tube                          40,000
2            1957–1964           Transistor                           200,000
3            1965–1971           Small and medium scale integration   1,000,000
4            1972–1977           Large scale integration              10,000,000
5            1978–1991           Very large scale integration         100,000,000
6            1991-               Ultra large scale integration        >1,000,000,000
History of Computers
First Generation: Vacuum Tubes
o Vacuum tubes were used for digital logic
elements and memory
o IAS computer
● Fundamental design approach was the stored program
concept
♦ Attributed to the mathematician John von Neumann
♦ First publication of the idea was in 1945 for the EDVAC
● Design began at the Institute for Advanced Study in Princeton
● Completed in 1952
● Prototype of all subsequent general-purpose computers
The IAS Instruction Set
(Opcode, symbolic representation, description; X is a memory address)
Data transfer
   00001010   LOAD MQ            Transfer contents of register MQ to the accumulator AC
   00001001   LOAD MQ,M(X)       Transfer contents of memory location X to MQ
   00100001   STOR M(X)          Transfer contents of accumulator to memory location X
   00000001   LOAD M(X)          Transfer M(X) to the accumulator
   00000010   LOAD –M(X)         Transfer –M(X) to the accumulator
   00000011   LOAD |M(X)|        Transfer absolute value of M(X) to the accumulator
   00000100   LOAD –|M(X)|       Transfer –|M(X)| to the accumulator
Unconditional branch
   00001101   JUMP M(X,0:19)     Take next instruction from left half of M(X)
   00001110   JUMP M(X,20:39)    Take next instruction from right half of M(X)
Conditional branch
   00001111   JUMP+ M(X,0:19)    If number in the accumulator is nonnegative, take next instruction from left half of M(X)
   00010000   JUMP+ M(X,20:39)   If number in the accumulator is nonnegative, take next instruction from right half of M(X)
Arithmetic
   00000101   ADD M(X)           Add M(X) to AC; put the result in AC
   00000111   ADD |M(X)|         Add |M(X)| to AC; put the result in AC
   00000110   SUB M(X)           Subtract M(X) from AC; put the result in AC
   00001000   SUB |M(X)|         Subtract |M(X)| from AC; put the remainder in AC
   00001011   MUL M(X)           Multiply M(X) by MQ; put most significant bits of result in AC, put least significant bits in MQ
   00001100   DIV M(X)           Divide AC by M(X); put the quotient in MQ and the remainder in AC
   00010100   LSH                Multiply accumulator by 2; i.e., shift left one bit position
   00010101   RSH                Divide accumulator by 2; i.e., shift right one position
Address modify
   00010010   STOR M(X,8:19)     Replace left address field at M(X) by 12 rightmost bits of AC
   00010011   STOR M(X,28:39)    Replace right address field at M(X) by 12 rightmost bits of AC
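The IAS packs two 20-bit instructions into each 40-bit memory word, and each instruction is an 8-bit opcode followed by a 12-bit address X. The Python sketch below encodes and disassembles such words using the opcodes from the table above. It assumes the left instruction (bits 0:19 in IAS numbering, where bit 0 is the leftmost) occupies the high-order 20 bits; the function names are illustrative only.

```python
# Opcode -> symbolic form, from the IAS instruction table; {x} is the 12-bit address
IAS_OPCODES = {
    0b00001010: "LOAD MQ",           0b00001001: "LOAD MQ,M({x})",
    0b00100001: "STOR M({x})",       0b00000001: "LOAD M({x})",
    0b00000010: "LOAD -M({x})",      0b00000011: "LOAD |M({x})|",
    0b00000100: "LOAD -|M({x})|",    0b00001101: "JUMP M({x},0:19)",
    0b00001110: "JUMP M({x},20:39)", 0b00001111: "JUMP+ M({x},0:19)",
    0b00010000: "JUMP+ M({x},20:39)", 0b00000101: "ADD M({x})",
    0b00000111: "ADD |M({x})|",      0b00000110: "SUB M({x})",
    0b00001000: "SUB |M({x})|",      0b00001011: "MUL M({x})",
    0b00001100: "DIV M({x})",        0b00010100: "LSH",
    0b00010101: "RSH",               0b00010010: "STOR M({x},8:19)",
    0b00010011: "STOR M({x},28:39)",
}

def encode_instruction(opcode, address):
    """One 20-bit IAS instruction: 8-bit opcode followed by a 12-bit address."""
    if not (0 <= opcode < 2**8 and 0 <= address < 2**12):
        raise ValueError("opcode must fit in 8 bits and address in 12 bits")
    return (opcode << 12) | address

def encode_word(left, right):
    """Pack two 20-bit instructions into a 40-bit word; the left one takes the high-order bits."""
    return (left << 20) | right

def disassemble_word(word):
    """Return the symbolic forms of the left and right instructions of a 40-bit word."""
    halves = ((word >> 20) & 0xFFFFF, word & 0xFFFFF)   # left (bits 0:19), right (bits 20:39)
    result = []
    for half in halves:
        opcode, address = half >> 12, half & 0xFFF
        template = IAS_OPCODES.get(opcode)
        result.append(template.format(x=address) if template else f"<unknown opcode {opcode:08b}>")
    return result

word = encode_word(encode_instruction(0b00000001, 500), encode_instruction(0b00000101, 501))
print(f"{word:040b}")
print(disassemble_word(word))   # ['LOAD M(500)', 'ADD M(501)']
```

A 12-bit address field is also why main memory holds 4,096 words, M(0) through M(4095).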
First Generation: Vacuum Tubes
[Figure: the central processing unit (CPU) contains the arithmetic-logic unit (CA), with AC, MQ, the arithmetic-logic circuits, and MBR, and the program control unit (CC), with PC, IBR, MAR, IR, and the control circuits. The CPU exchanges instructions and data with main memory (M), locations M(0) through M(4095), and with input-output equipment (I, O); the control unit supplies addresses and control signals.]
AC: accumulator register
MQ: multiply-quotient register
MBR: memory buffer register
IBR: instruction buffer register
PC: program counter
MAR: memory address register
IR: instruction register
Figure 1.6 IAS Structure
Registers
Memory buffer register (MBR)
   • Contains a word to be stored in memory or sent to the I/O unit
   • Or is used to receive a word from memory or from the I/O unit
Memory address register (MAR)
   • Specifies the address in memory of the word to be written from or read into the MBR
Instruction register (IR)
   • Contains the 8-bit opcode of the instruction being executed
Instruction buffer register (IBR)
   • Employed to temporarily hold the right-hand instruction from a word in memory
Program counter (PC)
   • Contains the address of the next instruction pair to be fetched from memory
Accumulator (AC) and multiplier quotient (MQ)
   • Employed to temporarily hold operands and results of ALU operations
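These registers cooperate in the IAS fetch cycle, following the usual textbook description: if the next instruction is already waiting in the IBR, it moves into IR/MAR without a memory access; otherwise PC is copied to MAR, the whole word is read into MBR, the left instruction goes to IR/MAR and the right one to IBR. This Python sketch models that flow for four opcodes from the instruction table (LOAD, ADD, SUB, STOR). It is a simplification under stated assumptions: data words are plain Python integers rather than 40-bit sign-magnitude numbers, the class and `pack` helper are hypothetical, and the program runs for a fixed number of steps because the table has no halt instruction.

```python
# Opcodes taken from the IAS instruction table (subset modelled here)
LOAD, ADD, SUB, STOR = 0b00000001, 0b00000101, 0b00000110, 0b00100001

def pack(left_op, left_x, right_op, right_x):
    """Hypothetical helper: build a 40-bit word from two (opcode, address) instructions."""
    return (((left_op << 12) | left_x) << 20) | ((right_op << 12) | right_x)

class IASSketch:
    """Simplified register-level model of the IAS fetch-execute cycle."""

    def __init__(self, memory):
        self.M = memory              # main memory: 4096 words
        self.PC = 0                  # address of the next instruction pair
        self.AC = self.MQ = 0        # accumulator and multiplier-quotient
        self.MAR = self.MBR = self.IR = 0
        self.IBR = None              # buffered right-hand instruction, if any

    def fetch(self):
        if self.IBR is not None:                 # next instruction already in IBR
            instruction, self.IBR = self.IBR, None
            self.PC += 1                         # both halves of the word are now used
        else:
            self.MAR = self.PC
            self.MBR = self.M[self.MAR]          # read the whole 40-bit word
            instruction = self.MBR >> 20         # left instruction executes first
            self.IBR = self.MBR & 0xFFFFF        # right instruction waits in IBR
        self.IR = instruction >> 12              # 8-bit opcode
        self.MAR = instruction & 0xFFF           # 12-bit address X

    def execute(self):
        if self.IR == STOR:
            self.MBR = self.AC
            self.M[self.MAR] = self.MBR
            return
        self.MBR = self.M[self.MAR]              # the other modelled opcodes read M(X)
        if self.IR == LOAD:
            self.AC = self.MBR
        elif self.IR == ADD:
            self.AC += self.MBR
        elif self.IR == SUB:
            self.AC -= self.MBR
        else:
            raise NotImplementedError(f"opcode {self.IR:08b} not modelled in this sketch")

    def step(self):
        self.fetch()
        self.execute()

# Program in words 0-1, data in words 100-103: M(103) = M(100) + M(101) - M(102)
memory = [0] * 4096
memory[0] = pack(LOAD, 100, ADD, 101)
memory[1] = pack(SUB, 102, STOR, 103)
memory[100], memory[101], memory[102] = 7, 5, 2

cpu = IASSketch(memory)
for _ in range(4):          # four instructions, two per word
    cpu.step()
print(cpu.M[103], cpu.PC)   # 10 2
```

Only two memory reads fetch four instructions: the IBR saves one memory access per word, which is the reason the register exists.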
History of Computers
Second Generation: Transistors
o The transistor:
   ● Is smaller and cheaper than a vacuum tube
   ● Dissipates less heat than a vacuum tube
   ● Is a solid state device made from silicon
   ● Was invented at Bell Labs in 1947
o It was not until the late 1950s that fully transistorized computers were commercially available
Second Generation Computers
o Introduced:
● More complex arithmetic and logic units and control units
● The use of high-level programming languages
● Provision of system software which provided the ability to:
♦ Load programs
♦ Move data to peripherals
♦ Provide libraries to perform common computations
Second Generation Computers: IBM 7094 computer
[Figure: the CPU and memory connect through a multiplexor to several data channels, which serve the peripheral devices: mag tape units, card punch, line printer, card reader, drum, disks, hypertapes, and teleprocessing equipment]
Figure 1.9 An IBM 7094 Configuration
History of Computers
Third Generation: Integrated Circuits
o 1958 – the invention of the integrated circuit
o Before the IC, computers were built from discrete components:
   ● Each a single, self-contained transistor
   ● Manufactured separately, packaged in their own containers, and soldered or wired together onto masonite-like circuit boards
   ● The manufacturing process was expensive and cumbersome
o The two most important members of the third generation were
the IBM System/360 and the DEC PDP-8
IBM System/360
o Announced in 1964
o Product line was incompatible with older IBM machines
o Was the success of the decade and cemented IBM as the
overwhelmingly dominant computer vendor
o The architecture remains to this day the architecture of IBM’s
mainframe computers
o Was the industry’s first planned family of computers
● Models were compatible in the sense that a program
written for one model should be capable of being executed
by another model in the series
Later Generations
o LSI: Large Scale Integration
o VLSI: Very Large Scale Integration
o ULSI: Ultra Large Scale Integration
o Key developments: semiconductor memory and microprocessors
AMD’s Barcelona Multicore Chip
[Figure: die with four out-of-order cores (Core 1 to Core 4) on one chip, each with a 512KB L2 cache, a 2MB shared L3 cache, and an integrated Northbridge]
o 1.9 GHz clock rate
o 65nm technology
o Three levels of caches (L1, L2, L3) on chip
http://www.techwarelabs.com/reviews/processors/barcelona/
Processors: Evolution
[Figure: CPU Transistor Count (1971 – 2008)]
The 10-core Xeon Westmere-EX, introduced in 2011, has 2.6 billion transistors and uses a 32 nm process on a 512 mm² die.
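As a rough check of Moore’s Law against this figure, the Python sketch below projects transistor counts forward from the Intel 4004 (2,300 transistors in 1971), assuming a clean doubling every two years. The doubling period is the assumption being tested, and `projected_transistors` is an illustrative name.

```python
def projected_transistors(year, base_year=1971, base_count=2300, doubling_years=2.0):
    """Project a transistor count assuming one doubling every `doubling_years` years."""
    return base_count * 2 ** ((year - base_year) / doubling_years)

projection_2011 = projected_transistors(2011)
print(f"{projection_2011:,.0f}")            # 2,411,724,800
print(f"{2.6e9 / projection_2011:.2f}")     # ratio of the Westmere-EX count to the projection
```

Twenty doublings over 40 years predict about 2.41 billion transistors, within about 10% of the 2.6 billion quoted for the 2011 Westmere-EX.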
Microprocessors
o The density of elements on processor chips continued to rise
● More and more elements were placed on each chip so that fewer and
fewer chips were needed to construct a single computer processor
o 1971 Intel developed 4004
● First chip to contain all of the components of a CPU on a single chip
● Birth of microprocessor
o 1972 Intel developed 8008
● First 8-bit microprocessor
o 1974 Intel developed 8080
● First general purpose microprocessor
● Faster, has a richer instruction set, has a large addressing capability
Intel Microprocessors: 1970’s
                        4004        8008        8080        8086                   8088
Introduced              1971        1972        1974        1978                   1979
Clock speeds            108 kHz     108 kHz     2 MHz       5 MHz, 8 MHz, 10 MHz   5 MHz, 8 MHz
Bus width               4 bits      8 bits      8 bits      16 bits                8 bits
Number of transistors   2,300       3,500       6,000       29,000                 29,000
Feature size (µm)       10          8           6           3                      6
Addressable memory      640 Bytes   16 KB       64 KB       1 MB                   1 MB
Intel Microprocessors: 1980’s
                        80286              386™ DX            386™ SX            486™ DX CPU
Introduced              1982               1985               1988               1989
Clock speeds            6 MHz - 12.5 MHz   16 MHz - 33 MHz    16 MHz - 33 MHz    25 MHz - 50 MHz
Bus width               16 bits            32 bits            16 bits            32 bits
Number of transistors   134,000            275,000            275,000            1.2 million
Feature size (µm)       1.5                1                  1                  0.8 - 1
Addressable memory      16 MB              4 GB               16 MB              4 GB
Virtual memory          1 GB               64 TB              64 TB              64 TB
Cache                   -                  -                  -                  8 kB
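The addressable-memory rows follow from the width of the physical address: n address bits reach 2^n bytes. (The bus-width row above is the data bus, which does not set this limit.) The Python sketch below assumes the commonly cited physical address widths of 20 bits for the 8086, 24 for the 80286 and 386 SX, and 32 for the 386 DX and 486 DX; `format_size` is an illustrative helper using binary units (1 KB = 1024 bytes).

```python
def format_size(n_bytes):
    """Express a byte count in binary units (1 KB = 1024 bytes) while it divides evenly."""
    units = ["bytes", "KB", "MB", "GB", "TB"]
    value, unit = n_bytes, 0
    while value >= 1024 and value % 1024 == 0 and unit < len(units) - 1:
        value //= 1024
        unit += 1
    return f"{value} {units[unit]}"

# Assumed physical address widths, in bits
ADDRESS_BITS = {"8086": 20, "80286": 24, "386 DX": 32, "386 SX": 24, "486 DX": 32}

for cpu, bits in ADDRESS_BITS.items():
    print(f"{cpu:7} {bits:2} address bits -> {format_size(2 ** bits)}")

print(format_size(2 ** 46))   # 64 TB: the virtual-memory entries correspond to 2^46 bytes
```

The results match the table: 1 MB, 16 MB, 4 GB, 16 MB, and 4 GB.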
Highlights of the Evolution of the Intel Product Line:
8080
• World’s first general-purpose microprocessor
• 8-bit machine, 8-bit data path to memory
• Was used in the first personal computer (Altair)
8086
• A more powerful 16-bit machine
• Has an instruction cache, or queue, that prefetches a few instructions before they are executed
• The first appearance of the x86 architecture
• The 8088 was a variant of this processor and was used in IBM’s first personal computer (securing the success of Intel)
80286
• Extension of the 8086 enabling addressing a 16 MB memory instead of just 1 MB
80386
• Intel’s first 32-bit machine
• First Intel processor to support multitasking
80486
• Introduced the use of much more sophisticated and powerful cache technology and sophisticated instruction pipelining
• Also offered a built-in math coprocessor
Intel Microprocessors: 1990’s
                        486™ SX           Pentium           Pentium Pro             Pentium II
Introduced              1991              1993              1995                    1997
Clock speeds            16 MHz - 33 MHz   60 MHz - 166 MHz  150 MHz - 200 MHz       200 MHz - 300 MHz
Bus width               32 bits           32 bits           64 bits                 64 bits
Number of transistors   1.185 million     3.1 million       5.5 million             7.5 million
Feature size (µm)       1                 0.8               0.6                     0.35
Addressable memory      4 GB              4 GB              64 GB                   64 GB
Virtual memory          64 TB             64 TB             64 TB                   64 TB
Cache                   8 kB              8 kB              512 kB L1 and 1 MB L2   512 kB L2
Intel Microprocessors: 2000’s
                        Pentium III     Pentium 4       Core 2 Duo      Core i7 EE 4960X
Introduced              1999            2000            2006            2013
Clock speeds            450 - 660 MHz   1.3 - 1.8 GHz   1.06 - 1.2 GHz  4 GHz
Bus width               64 bits         64 bits         64 bits         64 bits
Number of transistors   9.5 million     42 million      167 million     1.86 billion
Feature size (nm)       250             180             65              22
Addressable memory      64 GB           64 GB           64 GB           64 GB
Virtual memory          64 TB           64 TB           64 TB           64 TB
Cache                   512 kB L2       256 kB L2       2 MB L2         1.5 MB L2/15 MB L3
Number of cores         1               1               2               6
Highlights of the Evolution of the Intel Product Line:
Pentium
• Intel introduced the use of superscalar techniques, which allow multiple instructions to execute in parallel
Pentium Pro
• Continued the move into superscalar organization with aggressive use of register renaming, branch prediction, data flow analysis, and speculative execution
Pentium II
• Incorporated Intel MMX technology, which is designed specifically to process video, audio, and graphics data efficiently
Pentium III
• Incorporated additional floating-point instructions
• Streaming SIMD Extensions (SSE)
Pentium 4
• Includes additional floating-point and other enhancements for multimedia
Core
• First Intel x86 microprocessor with a dual core
Core 2
• Extends the Core architecture to 64 bits
• Core 2 Quad provides four cores on a single chip
• More recent Core offerings have up to 10 cores per chip
• An important addition to the architecture was the Advanced Vector Extensions instruction set
Inside a Multicore Processor Chip
AMD Barcelona: 4 Processor Cores
3 Levels of Caches
The Evolution of the Intel x86 Architecture
o Two widely used processor families are the Intel x86 and the ARM architectures
o Current x86 offerings represent the results of decades of design
effort on Complex Instruction Set Computers (CISCs)
o An alternative approach to processor design is the Reduced
Instruction Set Computer (RISC)
o ARM architecture is used in a wide variety of embedded systems
and is one of the most powerful and best-designed RISC-based
systems on the market
Hardware & Software
Five Classic Components
Since the 1940s, computers have had 5 classic components: input, output, memory, datapath, and control
   Datapath and control together are called the processor
[Figure: Computer = Processor (Control, Datapath) + Memory + Input + Output]
Devices
   Input devices: keyboard, mouse, …
   Output devices: display, printer, …
   Storage devices:
      Volatile memory devices: DRAM, SRAM, …
      Permanent storage devices: magnetic, optical, and flash disks, …
Newly added 6th component: Network
   Essential component for communication in any computer system
[Figure: the COMPUTER comprises the CPU, main memory, and I/O, linked by the system bus; the CPU comprises registers, the ALU, and the control unit, linked by an internal bus; the control unit comprises sequencing logic, control unit registers and decoders, and control memory]
Figure 1.1 A Top-Down View of a Computer
Inside the Processor (CPU)
Datapath: the Arithmetic Logic Unit (ALU), which performs operations on data
Control: sequences the datapath, sends signals to memory, ...
Cache memory: small, fast SRAM memory for immediate access to data
[Figure: Apple A5 chip]
Cache Memory
o Multiple layers of memory between the processor and
main memory
o Is smaller and faster than main memory
o Used to speed up memory access by placing in the
cache data from main memory that is likely to be used
in the near future
o A greater performance improvement may be obtained
by using multiple levels of cache, with level 1 (L1)
closest to the core and additional levels (L2, L3, etc.)
progressively farther from the core
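The benefit of keeping likely-to-be-used data near the core can be illustrated with a toy model. This Python sketch simulates a single-level direct-mapped cache, an organization chosen here only for simplicity (the slide does not specify one); the line count and block size are arbitrary illustrative parameters. Sequential word accesses hit after the first miss in each block (spatial locality), and re-reading a small array hits every time (temporal locality).

```python
def simulate_direct_mapped(addresses, num_lines=8, block_size=16):
    """Return (hits, misses) for a direct-mapped cache of num_lines lines of block_size bytes."""
    tags = [None] * num_lines                # one stored tag per cache line (None = empty)
    hits = misses = 0
    for address in addresses:
        block = address // block_size        # memory block holding this byte
        index = block % num_lines            # the only line this block may occupy
        tag = block // num_lines             # distinguishes blocks that share a line
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag                # bring the block in, evicting the old one
    return hits, misses

# One pass over 64 four-byte words (256 bytes): one miss per 16-byte block
print(simulate_direct_mapped(range(0, 256, 4)))            # (48, 16)
# Two passes over a 64-byte array that fits in the cache: the second pass always hits
print(simulate_direct_mapped(list(range(0, 64, 4)) * 2))   # (28, 4)
```

Real caches add associativity, write policies, and multiple levels, but the hit/miss bookkeeping follows the same idea.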
Below Your Program
Application software
   Written in high-level language
System software
   Compiler: translates HLL code to machine code
   Operating System: service code
      Handling input/output
      Managing memory and storage
      Scheduling tasks & sharing resources
Hardware
   Processor, memory, I/O controllers
Levels of Program Code
High-level language
   Level of abstraction closer to problem domain
   Provides for productivity and portability
Assembly language
   Textual representation of instructions
Hardware representation
   Binary digits (bits)
   Encoded instructions and data
Levels of Representation
High Level Language Program
   temp = v[k];
   v[k] = v[k+1];
   v[k+1] = temp;
      ↓ Compiler
Assembly Language Program
   lw   $15, 0($2)
   lw   $16, 4($2)
   sw   $16, 0($2)
   sw   $15, 4($2)
      ↓ Assembler
Machine Language Program
   0000 1001 1100 0110 1010 1111 0101 1000
   1010 1111 0101 1000 0000 1001 1100 0110
   1100 0110 1010 1111 0101 1000 0000 1001
   0101 1000 0000 1001 1100 0110 1010 1111
      ↓ Machine Interpretation
Control Signal Specification
   …
   ALUOP[0:3] <= InstReg[9:11] & MASK
   [i.e. high/low on control lines]
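The machine-language rows on the slide are illustrative bit patterns, not the actual encodings of these four instructions. For comparison, this Python sketch computes real MIPS I-type encodings (Op6 | Rs5 | Rt5 | immediate16) for the swap sequence, assuming the standard opcodes 0x23 for `lw` and 0x2B for `sw`, with the base register in rs and the loaded or stored register in rt. `encode_i_type` is an illustrative helper, not part of any assembler.

```python
OPCODES = {"lw": 0x23, "sw": 0x2B}   # standard MIPS I-type opcodes for load/store word

def encode_i_type(op, rs, rt, imm):
    """Pack an I-type instruction: 6-bit op, 5-bit rs, 5-bit rt, 16-bit immediate."""
    # Masking to 16 bits stores negative offsets in two's complement form
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

# (mnemonic, rt, offset, base) for the swap sequence, e.g. lw $15, 0($2)
swap = [("lw", 15, 0, 2), ("lw", 16, 4, 2), ("sw", 16, 0, 2), ("sw", 15, 4, 2)]

for name, rt, offset, base in swap:
    word = encode_i_type(OPCODES[name], base, rt, offset)
    bits = f"{word:032b}"
    nibbles = " ".join(bits[i:i + 4] for i in range(0, 32, 4))
    print(f"{name} ${rt}, {offset}(${base})   0x{word:08X}   {nibbles}")
```

The four words come out as 0x8C4F0000, 0x8C500004, 0xAC500000, and 0xAC4F0004; the offsets 0 and 4 reflect that v[k] and v[k+1] are adjacent 4-byte words.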
Advantages of HLLs
• Higher-level languages (HLLs)
Allow the programmer to think in a more natural language suited to the intended use (Fortran for scientific computation, Cobol for business programming, Lisp for symbol manipulation, Java for web programming, …)
Improve programmer productivity – more understandable
code that is easier to debug and validate
Improve program maintainability
Allow programs to be independent of the computer on which
they are developed (compilers and assemblers can translate
high-level language programs to the binary instructions of any
machine) … Portability
Emergence of optimizing compilers that produce very efficient
assembly code optimized for the target machine
• Compilers convert source code to object code
• Libraries simplify common tasks
Software Tools
MIPS Simulators
  MARS: MIPS Assembler and Runtime Simulator
  Runs MIPS-32 assembly language programs
  Website: http://courses.missouristate.edu/KenVollmar/MARS/
CPU Design and Simulation Tool
  Logisim
  Educational tool for designing and simulating digital logic circuits, including simple CPUs
  Website: http://www.cburch.com/logisim/
Von Neumann Architecture
The Von Neumann Computer
Principle
In 1945, the mathematician John von Neumann (VN) showed in his study of computation that a computer with a simple, fixed structure could execute any kind of program, given a properly programmed control unit, without any need for hardware modification.
ENIAC: the first general-purpose electronic computer (1946)
The Von Neumann Computer
Structure
A memory for storing both program and data; the memory consists of words of the same length
A control unit (control path) featuring a program counter (PC) for controlling program execution
An arithmetic and logic unit (ALU), part of the datapath, for executing program operations
[Figure: the processor (central processing unit) contains the datapath (registers) and the control path (instruction register, PC, address register); it sends addresses to memory and exchanges data and instructions with it]
The Von Neumann Computer
Coding
A program is coded as a set of instructions to be executed sequentially
Program execution
Instruction Fetch (IF): the next instruction to be executed is fetched from memory
Decode (D): the instruction is decoded to determine the operation
Read operand (R): operands are read from memory
Execute (EX): the operation is executed on the ALU
Write result (W): the result is written back to memory
Each instruction is executed in the cycle IF, D, R, EX, W
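The IF, D, R, EX, W cycle can be sketched as a small Python interpreter. The machine is entirely hypothetical. A 16-word memory holds both the program and the data. Each 16-bit instruction word is split into a 4-bit opcode and three 4-bit memory addresses (two operands and a destination). Only ADD, SUB and HALT are defined. Real hardware repeats the cycle indefinitely; the step limit here only keeps the sketch from looping forever.

```python
# Hypothetical 16-bit instruction format: [opcode:4][src1:4][src2:4][dst:4]
HALT, ADD, SUB = 0, 1, 2
WORD_MASK = 0xFFFF  # all memory words have the same length (16 bits)

def encode(op, src1, src2, dst):
    """Pack an instruction into one memory word."""
    return (op << 12) | (src1 << 8) | (src2 << 4) | dst

def run(mem, max_steps=100):
    pc = 0                                   # program counter
    for _ in range(max_steps):
        ir = mem[pc]                         # IF: fetch the word addressed by PC
        pc += 1                              #     and point to the next instruction
        op = ir >> 12                        # D: decode opcode and address fields
        src1, src2, dst = (ir >> 8) & 0xF, (ir >> 4) & 0xF, ir & 0xF
        if op == HALT:
            return mem
        x, y = mem[src1], mem[src2]          # R: read operands from memory
        if op == ADD:                        # EX: perform the ALU operation
            result = (x + y) & WORD_MASK
        elif op == SUB:
            result = (x - y) & WORD_MASK
        else:
            raise ValueError(f"unknown opcode {op}")
        mem[dst] = result                    # W: write the result back to memory
    raise RuntimeError("step limit reached without HALT")

# Program (words 0-2) and data (words 10-14) share the same memory.
mem = [0] * 16
mem[0] = encode(ADD, 10, 11, 12)   # mem[12] = mem[10] + mem[11]
mem[1] = encode(SUB, 12, 13, 14)   # mem[14] = mem[12] - mem[13]
mem[2] = encode(HALT, 0, 0, 0)
mem[10], mem[11], mem[13] = 7, 5, 2
run(mem)
print(mem[12], mem[14])            # 12 10
```

Because instructions are ordinary words in the same memory as data, a destination address of 0 to 2 would let the program overwrite its own instructions. This is the stored-program idea in miniature.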
Fetch-Execute Cycle
Infinite cycle implemented in hardware
Instruction Fetch
  Fetch instruction
  Compute address of next instruction
Instruction Decode
  Generate control signals for instruction
  Read operands from registers
Execute
  Compute result value
Memory Access
  Read or write memory (load/store)
Writeback Result
  Write back result in a register
Fetch/Execute Operation of CPU
Program:
  E000: 4F       CLRA         Clear accumulator A
  E001: 86 5C    LDAA #$5C    Load accumulator A with $5C
CPU operations: fetching CLRA
AR: address register
CPU operation: executing the first instruction, CLRA
CPU operation: fetching the opcode of the second instruction, LDAA #
CPU operation: fetching the operand of the second instruction ($5C) and executing the instruction
Note that the PC increments after each program byte (opcode or operand) is fetched
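The byte-by-byte fetch sequence of this example can be traced with a short Python sketch. It models only the two opcodes used here (4F for CLRA and 86 for LDAA immediate). It ignores clock cycles and condition-code flags. The function name trace_hc11 and the initial accumulator value are hypothetical.

```python
def trace_hc11(memory, pc, n_instructions, a=0xFF):
    """Execute n instructions from pc; a is an arbitrary initial value of accumulator A."""
    trace = []
    for _ in range(n_instructions):
        opcode = memory[pc]          # fetch opcode
        pc += 1                      # PC now points to the next byte
        if opcode == 0x4F:           # CLRA: A <- 0
            a = 0x00
        elif opcode == 0x86:         # LDAA #imm: the operand byte follows the opcode
            a = memory[pc]           # fetch operand
            pc += 1                  # PC moves past the operand byte
        else:
            raise ValueError(f"opcode {opcode:02X} is not modeled")
        trace.append((pc, a))        # PC and A after each instruction
    return a, pc, trace

program = {0xE000: 0x4F, 0xE001: 0x86, 0xE002: 0x5C}
a, pc, trace = trace_hc11(program, 0xE000, 2)
print(hex(a), hex(pc))               # 0x5c 0xe003
```

The trace shows CLRA advancing the PC by one byte, while LDAA #$5C advances it by two (opcode plus operand).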
Bottlenecks in VN Architecture
The Von Neumann Computer
Advantages:
  Simplicity
  Flexibility: any well-coded program can be executed
Drawbacks:
  Speed efficiency: execution is slow because instructions run strictly one after another (temporal resource sharing)
  Resource efficiency: at any moment only one part of the hardware is busy executing an instruction; the rest remains idle
  Memory access: memories are about 5 times slower than the processor, so every instruction and operand fetch can stall execution
How can these deficiencies be compensated for?
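One standard way to quantify how a memory hierarchy (cache) compensates for slow memory is the average memory access time, AMAT = hit time + miss rate × miss penalty. The Python sketch below reads the slide's "about 5 times slower" as 5 processor cycles per memory access. It also assumes the miss penalty equals that memory access time. The 1-cycle hit time and 10% miss rate are hypothetical values chosen only for illustration.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

MEMORY_TIME = 5   # assumed cycles per main-memory access ("about 5 times slower")
no_cache = MEMORY_TIME                                           # every access goes to memory
with_cache = amat(hit_time=1, miss_rate=0.10, miss_penalty=MEMORY_TIME)
print(no_cache, with_cache)
```

With these assumed numbers, the average access drops from 5 cycles to about 1.5 cycles. The benefit depends heavily on the miss rate.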
Eight Great Ideas
Design for Moore’s Law
Use abstraction to simplify design
Make the common case fast
Performance via parallelism
Performance via pipelining
Performance via prediction
Hierarchy of memories
Dependability via redundancy
Improving Performance of VN Machines (General-Purpose Processors, GPPs)
1. Technology Scaling
   Improve performance (increase clock frequency!)
2. Improving the Instruction Set of the Processor
3. Application-Specific Processors (DSP)
4. Use of a Hierarchical Memory System
   A cache can enhance speed
5. Multiplicity of Functional Units (H/W)
   Adders/Multipliers/Dividers (CDC-6600)
6. Pipelining within the CPU (H/W)
   A four-stage pipeline (IF/ID/OF/EX)
7. Overlap of CPU and I/O Operations (H/W)
   DMA (Direct Memory Access) can be used to enhance performance
8. Time Sharing (S/W)
   Multitasking assigns fixed or variable time slices to multiple programs
9. Parallelism and Multithreading (S/W, H/W)
   Compilers/Multi-core systems
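The benefit of pipelining (item 6) can be estimated with a simple idealized model. Assume k stages of one cycle each and no stalls. Then n instructions take n·k cycles without pipelining but only k + (n - 1) cycles with it. Once the first instruction has filled the pipeline, one instruction completes every cycle. The Python sketch below applies this model to the four-stage IF/ID/OF/EX pipeline. Hazards, stalls and unequal stage delays are ignored.

```python
def unpipelined_cycles(n, k):
    """Each of n instructions passes through all k stages before the next starts."""
    return n * k

def pipelined_cycles(n, k):
    """k cycles to fill the pipeline, then one instruction completes per cycle."""
    return k + (n - 1) if n > 0 else 0

STAGES = 4  # IF/ID/OF/EX
n = 100
seq = unpipelined_cycles(n, STAGES)
pipe = pipelined_cycles(n, STAGES)
print(seq, pipe, round(seq / pipe, 2))  # 400 103 3.88
```

As n grows, the ideal speedup approaches the number of stages (4 here). Real pipelines fall short of this because of hazards and stalls.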
Summary
• Computer Architecture includes the design of the Instruction Set Architecture (programmer’s view) and the Machine Organization (logic designer’s view).
• Levels of abstraction, each consisting of an interface and an implementation, are useful for managing designs.
• Computer systems are composed of datapath, memory, input devices, output devices, and control.
• Processor performance increases rapidly, but the speeds of memory and I/O have not kept pace.
• Several techniques are employed to enhance the performance of the computer system, including pipelining, memory hierarchy, multiprocessor systems, …