Transcript Lecture 1

CS 325: CS Hardware and Software
Organization and Architecture
Introduction
Why Should We Study Computer Architecture?
• It’s required 
• Understand computer performance and cost factors.
• Basis for understanding of OS and programming concepts.
• Understand how to write programs that are:
• Faster
• Smaller
• Less prone to error
• To appreciate the relative cost of operations and the effect
of programming choices.
• Helps you to debug.
The Bad News…
•Digital Hardware
• Is complex
• Cannot be fully understood in one course
• Requires background in electrical engineering, physics,
chemistry
•The CPU is the most complex device created by
humans.
• Over 10 billion transistors (2016)
• Transistor switching speed of over 4 billion/sec (4 GHz)
• 14nm fabrication process, and getting smaller
• 10nm scale projected for 2017-2018
• ~43 Si atoms!
The Good News!
•It is possible to understand the architectural
components without knowing all of the low-level
details.
•Programmers only need to know the essentials
• Characteristics of major components
• Role in overall system
• Consequences for programmers
Programming efficiently
•Relies on the software developer to understand the
underlying architecture.
• Software optimization for completing the same task using fewer
resources
• Discovering and reducing (if possible) the impact of bottlenecks
• I/O latency
• Network bandwidth
• Data locality
• Reducing fetches to main and secondary memory
• Pre-emptive fetch
• Taking advantage, when possible, of multi-core CPU architecture
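To make the data locality point concrete, here is a minimal sketch of a direct-mapped cache model that counts misses for two traversal orders of the same array. The cache geometry (64 lines of 8 words each) and the names `count_misses`, `row_major`, and `column_major` are assumptions chosen for illustration; this is not a model of any particular CPU.

```python
def count_misses(addresses, num_lines=64, words_per_line=8):
    """Count misses in a toy direct-mapped cache for a sequence of word addresses."""
    lines = [None] * num_lines          # each slot remembers which memory block it holds
    misses = 0
    for addr in addresses:
        block = addr // words_per_line  # memory block containing this word
        slot = block % num_lines        # direct-mapped: each block has exactly one slot
        if lines[slot] != block:        # miss: the block must be fetched from main memory
            lines[slot] = block
            misses += 1
    return misses


def row_major(n):
    """Addresses touched when an n x n row-major array is walked row by row."""
    return [r * n + c for r in range(n) for c in range(n)]


def column_major(n):
    """Addresses touched when the same array is walked column by column."""
    return [r * n + c for c in range(n) for r in range(n)]


N = 256
row_misses = count_misses(row_major(N))
col_misses = count_misses(column_major(N))
print(row_misses, col_misses)
```

Under these assumed parameters the row-by-row walk misses once per 8-word line (8192 misses), while the column-by-column walk misses on every one of its 65536 accesses, even though both loops touch exactly the same data.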
Why Study Computer Architecture?
• Understand where computers are going
• Future capabilities drive the (computing) world
• Real-world impact: no computer architecture → no computers!
• Understand high-level design concepts
• The best architects understand all the levels
• Devices, circuits, architecture, compiler, applications
• Write better software
• The best software designers also understand hardware
• Need to understand hardware to write fast software
• Design hardware
• Intel, AMD, IBM, ARM, Qualcomm, NVIDIA, Samsung
Course Goals
•See the big ideas in computer architecture
• Number systems, digital logic, pipelining, parallelism, caching,
abstraction, …
• Exposure to examples of good (and some bad) engineering
•Get exposure to research and cutting-edge
ideas
• Read some research papers
• Supplemental reading
Computer Architecture Design Goals &
Constraints
•Functional
• Needs to be correct
• And unlike software, difficult to update once deployed
• What functions should it support?
• Adder module
• Multiply module?
•Reliable
• Does it continue to perform correctly?
• Persistent fault vs transient fault
• Satellites vs desktop vs server
•High performance
• Not just “Gigahertz” – 2.6 GHz ARM vs. 2.6 GHz Intel Xeon
• Impossible goal: fastest possible design for all programs
Design Goals & Constraints
•Low cost
• Per unit manufacturing cost (wafer cost)
• Cost of making first chip after design (mask cost)
• Design cost
•Low power/energy
• Energy in (battery life, cost of electricity)
• Energy out (cooling and related costs)
•High performance
• Increased efficiency and/or throughput
•Challenge: balancing the relative importance of
these goals
• And the balance is constantly changing
• Our focus: performance, only touch on cost, power, reliability
Shaping Force: Applications/Domains
•Scientific: weather prediction, genome sequencing
• First computing application domain: naval ballistics firing
tables
• Need: large memory, heavy-duty floating point
• Examples: CRAY T3E, IBM BlueGene, Intel Xeon Phi, GPUs
•Commercial: database/web serving, e-commerce,
Google, Amazon
• Need: data movement, high memory + I/O bandwidth
• Examples: Sun Enterprise Server, AMD Opteron, Intel
Xeon
More Recent Applications/Domains
• Desktop: home office, multimedia, games
• Need: Increasing memory bandwidth, computational performance,
integrated graphics/network?
• Examples: Intel Core i*, AMD Athlon
• Mobile: laptops, tablets, phones
• Need: low power, computational performance, integrated wireless
• Laptops: Intel Core i*, Atom, AMD APUs
• Smaller devices: ARM chips by Samsung, Qualcomm, Apple
• Embedded: microcontrollers in automobiles, door knobs,
robotics
• Need: low power, low cost
• Examples: ARM chips, dedicated digital signal processors (DSPs)
• 15 billion ARM chips sold in 2016
• Deeply Embedded: disposable “smart dust” sensors
• Need: extremely low power, extremely low cost
Application Specific Designs
• This class is about general-purpose CPUs (specifically x86)
• Processor that can do anything, run a full OS, etc.
• E.g., Intel Core i5, AMD Athlon, IBM PowerPC, ARM, Intel Xeon
• Cyrix
• In contrast to application-specific chips
• Examples: Video encoding, 3D graphics
• General rules
- Hardware is less flexible than software
+ Hardware is more effective (speed, power, cost) than software
+ Domain-specific designs are more “parallel” than general-purpose ones
• But general mainstream processors becoming more parallel
Technology Trends
• Moore’s Law
• Continued (so far) transistor miniaturization
• Number of transistors in an integrated circuit has doubled approximately every 18-24
months
• Some technology-based ramifications
• Annual improvements in density, speed, power, costs
• SRAM/logic: density: ~30%, speed: ~20%
• DRAM: density: ~60%, speed: ~4%
• Disk: density: ~60%, speed: ~10% (non-transistor)
• Big improvements in flash memory and network bandwidth as well
• Changing quickly and with respect to each other!!
• Example: density increases faster than speed
• Re-evaluate/re-design for each technology generation
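The annual rates above compound, which is why components drift apart so quickly. The following sketch works through the arithmetic for the DRAM figures quoted on this slide and converts the Moore's Law doubling period into an annual rate. The 10-year horizon and the function names are assumptions for illustration.

```python
def compound(rate_per_year, years):
    """Total improvement factor after compounding an annual rate."""
    return (1 + rate_per_year) ** years


def annual_rate_from_doubling(months_to_double):
    """Annual growth rate implied by doubling every `months_to_double` months."""
    return 2 ** (12 / months_to_double) - 1


years = 10                              # assumed horizon, for illustration only
dram_density = compound(0.60, years)    # ~60% per year, from the slide
dram_speed = compound(0.04, years)      # ~4% per year, from the slide
print(f"DRAM density after {years} years: {dram_density:.0f}x")
print(f"DRAM speed after {years} years:   {dram_speed:.2f}x")
print(f"Doubling every 18 months = {annual_rate_from_doubling(18):.0%} per year")
print(f"Doubling every 24 months = {annual_rate_from_doubling(24):.0%} per year")
```

At these rates, a decade yields roughly a 110x gain in DRAM density but only about a 1.5x gain in DRAM speed, which illustrates why designs are re-evaluated for each technology generation.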
Revolution I: The Microprocessor
• Microprocessor revolution
• One significant technology threshold was crossed in the 1970s
• Enough transistors (~25K) to put a 16-bit processor on one chip
• Huge performance advantages: fewer slow chip-crossings
• Microprocessors have allowed new market segments
• Desktops, CD/DVD players, laptops, game consoles, set-top boxes,
mobile phones, digital camera, mp3 players, GPS, automotive
• And replaced incumbents in existing segments
• Microprocessor-based systems replaced supercomputers,
“mainframes”, “minicomputers”, etc.
First Microprocessor
• Intel 4004 (1971)
• Application: calculators
• Technology: 10000 nm
• 2300 transistors
• 13 mm²
• 108 kHz
• 12 Volts
• 4-bit data
Pinnacle of Single-Core Microprocessors
• Intel Pentium4 (2003)
• Application: desktop/server
• Technology: 90nm (1/100th of 4004)
• 55M transistors (20,000x)
• 101 mm² (10x)
• 3.4 GHz (10,000x)
• 1.2 Volts (1/10th)
• 32/64-bit data (16x)
• 22-stage pipelined datapath
• Two levels of on-chip cache
• hyperthreading
Modern Multicore Processor
• Intel Core i* (2017)
• Application: desktop/server
• Technology: 14nm (15% of P4)
• 1.4B transistors
• 177 mm²
• 3.4 GHz to 4.5 GHz
• 1.8 Volts
• up to 256-bit data processing
• 16-stage pipelined datapath
• 4 instructions per cycle
• Three levels of on-chip cache
• hyperthreading
• multicore (2-10)
Tracing the Microprocessor Revolution
•How were growing transistor counts used?
•Initially to widen the datapath
• 4004: 4 bits → Pentium4: 64 bits
•… and also to add more powerful instructions
• To reduce overhead of fetch and decode
• To simplify assembly programming (which was done by
hand then)
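add si, strlen1   ; advance SI by the string length (SI is assumed to point at the source)
add si, -2        ; back up two bytes, presumably to the last character
L1:
mov al, [si]      ; load one byte from the source
mov [di], al      ; store it at the destination
dec si            ; step backward through the source
inc di            ; step forward through the destination
loop L1           ; decrement CX and repeat while CX is not zero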
IBM System/370 Architecture
• IBM System/370 architecture
• Was introduced in 1970
• Included a number of models
• Could upgrade to a more expensive, faster model
without having to abandon original software
• New models are introduced with improved
technology, but retain the same architecture so
that the customer’s software investment is
protected
• Architecture has survived to this day as the
architecture of IBM’s mainframe product line
• 370 Architecture still in use by IBM today!
• IBM zSeries
• IBM zEnterprise Series
Intel x86
Backward compatible instruction set
architectures
• 1978 – Introduced 8086, 16-bit
• 80186, 80286
• 1985 – 80386, 32-bit
• 80486, Pentium, Pentium MMX
• 1995 – Pentium Pro
• Pentium II, Pentium III
• 2000 – Pentium 4
• 2006 – Core 2, 64-bit, multi-core
• 2008 – Core i3/i5/i7
• Atom
Computer Functions
• A computer can perform four basic functions:
● Data processing
● Data storage
● Data movement
● Control
Functional units of a computer
Input unit accepts information from:
• Human operators
• Electromechanical devices
• Other computers
Memory stores information:
• Instructions
• Data
Arithmetic and logic unit (ALU):
• Performs the desired operations on the input information as determined
by instructions in the memory
Control unit coordinates various actions:
• Input
• Output
• Processing
Output unit sends results of processing:
• To a monitor display
• To a printer
[Figure: block diagram of a computer. Input and Output form the I/O; the Processor contains Arithmetic & Logic and Control; Memory holds instructions (Instr1, Instr2, Instr3) and data (Data1, Data2).]
Information in a computer -- Instructions
• Instructions specify commands to:
• Transfer information within a computer (e.g., from memory to ALU)
• Transfer information between the computer and I/O devices (e.g., from
keyboard to computer, or computer to printer)
• Perform arithmetic and logic operations (e.g., Add two numbers, Perform a
logical AND).
• A sequence of instructions to perform a task is called a
program, which is stored in the memory.
• Processor fetches instructions that make up a program
from the memory and performs the operations stated in
those instructions.
• What do the instructions operate upon?
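The fetch-and-perform cycle described above can be sketched as a toy interpreter. The instruction names (`load`, `store`, `add`, `and`, `print`), the two registers, and the tuple encoding are all hypothetical, invented for this illustration; real processors encode instructions as bits, not tuples.

```python
def run(program, data):
    """Execute a toy program; `program` and `data` stand in for memory contents."""
    registers = {"r0": 0, "r1": 0}
    pc = 0                                   # program counter: index of next instruction
    while pc < len(program):
        op, *args = program[pc]              # fetch and decode
        pc += 1
        if op == "load":                     # transfer: memory -> register
            reg, addr = args
            registers[reg] = data[addr]
        elif op == "store":                  # transfer: register -> memory
            reg, addr = args
            data[addr] = registers[reg]
        elif op == "add":                    # arithmetic: dst = dst + src
            dst, src = args
            registers[dst] += registers[src]
        elif op == "and":                    # logic: dst = dst AND src
            dst, src = args
            registers[dst] &= registers[src]
        elif op == "print":                  # I/O: register -> output device
            print(registers[args[0]])
        else:
            raise ValueError(f"unknown instruction: {op}")
    return data


program = [
    ("load", "r0", 0),
    ("load", "r1", 1),
    ("add", "r0", "r1"),
    ("store", "r0", 2),
    ("print", "r0"),
]
memory = run(program, [7, 35, 0])
```

The three kinds of instruction from the list (internal transfer, I/O transfer, arithmetic/logic) each appear as a branch of the loop, and the program itself is just a sequence held in memory.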
Information in a computer -- Data
• Data are the “operands” upon which instructions operate.
• Data could be:
• Numbers, encoded characters, instructions
• Data, in a broad sense, means any digital information.
• Computers use data that is encoded as a string of binary digits called
bits.
Data Example 1:
• 10100001000001100101111000010101
• 161.6.94.21
• www.wku.edu
Data Example 2:
• 0000000111000001
• x86 encoding for adding the values in two CPU registers (add ecx, eax)
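Both examples can be checked by hand or with a few lines of code. The sketch below reads the first bit string as four 8-bit fields and the second as an x86 opcode byte followed by a ModR/M byte. The variable names are illustrative; the register numbering is the standard 32-bit x86 encoding, and opcode 0x01 is the register/memory form of ADD.

```python
bits = "10100001000001100101111000010101"

# Interpretation 1: four 8-bit fields read as unsigned integers (a dotted-quad address).
octets = [int(bits[i:i + 8], 2) for i in range(0, 32, 8)]
dotted = ".".join(str(o) for o in octets)
print(dotted)

# Interpretation 2: the 16 bits of Data Example 2 read as an x86 instruction.
encoded = int("0000000111000001", 2).to_bytes(2, "big")
opcode, modrm = encoded[0], encoded[1]
mod = modrm >> 6            # top two bits: 0b11 means both operands are registers
reg = (modrm >> 3) & 0b111  # middle three bits: source register number
rm = modrm & 0b111          # low three bits: destination register number
REGISTERS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
print(f"opcode={opcode:#04x} mod={mod:#04b} src={REGISTERS[reg]} dst={REGISTERS[rm]}")
```

The same bits mean different things depending on who reads them: the first string is only an address because something treats it as one, and the second is only an instruction because the processor fetches it as one.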
Input unit
Binary information must be presented to a computer in a specific format.
This task is performed by the input unit:
• Interfaces with input devices.
• Accepts binary information from the input devices.
• Presents this binary information in a format expected by the computer.
• Transfers this information to the memory or processor.
[Figure: real-world sources (keyboard, audio, network, …) feed the Input Unit, which passes information to the Memory and Processor inside the computer.]
Memory unit
• Memory unit(s) stores instructions and data.
• Recall, data is represented as a series of bits.
• To store data, memory unit thus stores bits.
• Processor reads instructions and reads/writes data from/to
the memory during the execution of a program.
• In theory, instructions and data could be fetched one bit at a time.
• In practice, a group of bits is fetched at a time.
• The group of bits stored or retrieved at a time is termed a “word”.
• The number of bits in a word is termed the “word length” of a
computer.
• In order to read/write to and from memory, a processor
should know where to look:
• “Address” is associated with each word location.
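A minimal sketch of the word and address idea follows. The class name `WordMemory`, the 16-word size, and the 32-bit word length are assumptions for illustration; real memories are usually byte-addressed and far larger.

```python
class WordMemory:
    """Toy memory: a fixed number of word locations, each identified by an address."""

    def __init__(self, num_words, word_length=32):
        self.word_length = word_length
        self.mask = (1 << word_length) - 1   # largest value that fits in one word
        self.words = [0] * num_words         # address i refers to self.words[i]

    def write(self, address, value):
        self.words[address] = value & self.mask   # keep only word_length bits

    def read(self, address):
        return self.words[address]


mem = WordMemory(num_words=16, word_length=32)
mem.write(3, 0xA1065E15)       # a full 32-bit word is stored in one access
mem.write(4, 0x1_0000_0001)    # 33-bit value: the top bit does not fit in the word
print(hex(mem.read(3)), hex(mem.read(4)))
```

The address selects which word to access, and the word length caps how many bits a single access can carry.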
Memory unit (contd.)
• Processor reads/writes to/from memory based on the
memory address:
• Access any word location in a short and fixed amount of time based
on the address.
• Random Access Memory (RAM) provides fixed access time
independent of the location of the word.
• Access time is known as “Memory Access Time”.
• Memory and processor have to “communicate” with each
other in order to read/write information.
• In order to reduce “communication time”, a small amount of RAM
(known as Cache) is tightly coupled with the processor.
• Modern computers have three to five levels of RAM units
with different speeds and sizes:
• Fastest, smallest known as Cache
• Slowest, largest known as Main memory
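One common way to reason about a two-level arrangement like this is the average access time: every access pays the cache time, and the fraction that misses also pays the main-memory time. The sketch below works through that arithmetic. The timings and miss rates are assumed, illustrative values, not measurements of any real machine, and the function name is hypothetical.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average time per access when misses fall through to a slower level."""
    return hit_time + miss_rate * miss_penalty


# Assumed, illustrative timings in nanoseconds (not measurements of any real machine).
cache_hit_ns = 1.0
main_memory_ns = 100.0

for miss_rate in (0.01, 0.05, 0.20):
    t = average_access_time(cache_hit_ns, miss_rate, main_memory_ns)
    print(f"miss rate {miss_rate:.0%}: about {t:.1f} ns per access")
```

Even a small miss rate dominates the average when main memory is much slower than the cache, which is why keeping frequently used data in the fast levels matters.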
Memory Hierarchy
Memory unit (contd.)
• Primary storage of the computer consists of RAM units.
• Fastest, smallest unit is Cache.
• Slowest, largest unit is Main Memory.
• Primary storage is insufficient to store large amounts of
data and programs.
• Primary storage can be added, but it is expensive.
• Store large amounts of data on secondary storage devices:
• Magnetic disks and tapes, solid state disks
• Optical disks (CD-ROMS).
• Access to data stored in secondary storage is slower, but this is acceptable
because some information is accessed infrequently.
• Cost of a memory unit depends on its access time; a shorter access time
implies a higher cost.
Arithmetic logic unit (ALU)
• Operations are executed in the Arithmetic Logic Unit (ALU).
• Arithmetic operations such as addition, subtraction.
• Logic operations such as comparison of numbers.
• In order to execute an instruction, operands need to be
brought into the ALU from the memory.
• Operands are held in general purpose registers inside the processor, next to the
ALU
• General purpose registers have shorter access times than the cache.
• Results of the operations are stored back in the memory or
retained in the processor for immediate use.
• Context switch
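As a rough sketch of what an ALU does with its operands, the function below performs a few arithmetic and logic operations on fixed-width values and reports two status flags. The 8-bit default width, the operation names, and the flag names are assumptions for illustration; real ALUs expose more operations and flags.

```python
def alu(op, a, b, width=8):
    """Toy ALU: returns (result, flags) for `width`-bit unsigned operands."""
    mask = (1 << width) - 1
    if op == "add":
        raw = a + b
    elif op == "sub":
        raw = a - b
    elif op == "and":
        raw = a & b
    elif op == "or":
        raw = a | b
    else:
        raise ValueError(f"unsupported operation: {op}")
    result = raw & mask                     # keep only the bits that fit in a register
    flags = {
        "zero": result == 0,                # result is all zero bits
        "carry": raw < 0 or raw > mask,     # add overflowed, or sub needed a borrow
    }
    return result, flags


print(alu("add", 200, 100))   # wraps around in 8 bits
print(alu("sub", 5, 5))       # comparison via subtraction: zero flag means equal
```

Comparing two numbers, as mentioned above, can be done by subtracting them and inspecting the flags rather than the result itself.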
Multi-Core CPU
• Each “core” in a multi-core CPU has the ability to
independently read and execute program instructions
• Instructions such as arithmetic, addressing, or control functions
• Independent functionality allows for parallel processing of
instructions
• Increased performance
• Cores may have shared and non-shared cache memory
• x86:
• L1 cache: not shared
• L2 cache: not shared
• L3 cache: shared among all cores
• L4 cache: Current implementation used for integrated video memory
Output unit
•Computers represent information in a specific binary form. Output units:
• Interface with output devices.
• Accept processed results provided by the computer in specific binary form.
• Convert the information in binary form to a form understood by an output device.
[Figure: inside the computer, the Memory and Processor pass results to the Output Unit, which drives real-world devices (printer, graphics display, speakers, …).]
Control unit
• Operation of a computer can be summarized as:
• Accepts information from the input units (Input unit).
• Stores the information (Memory).
• Processes the information (ALU).
• Provides processed results through the output units (Output unit).
• Operations of Input unit, Memory, ALU and Output unit are
coordinated by Control unit.
• Instructions control “what” operations take place (e.g. data
transfer, processing).
• Control unit generates timing signals which determine
“when” a particular operation takes place.
How are the functional units connected?
• For a computer to achieve its operation, the functional units need to
communicate with each other.
• In order to communicate, they need to be connected.
[Figure: Input, Output, Memory, and Processor units all connected to a shared Bus.]
• Functional units may be connected by a group of parallel wires.
• The group of parallel wires is called a bus.
• Each wire in a bus can transfer one bit of information.
• The number of parallel wires in a bus is equal to the word length of
a computer
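Since each wire carries one bit, the bus width determines how many transfers a word needs. The short sketch below works through that arithmetic for a 32-bit word. The narrower bus widths are hypothetical cases added for comparison, and the function name is illustrative.

```python
import math


def bus_transfers(total_bits, bus_width):
    """Number of bus transfers needed to move `total_bits` over `bus_width` wires."""
    return math.ceil(total_bits / bus_width)


word_bits = 32
for width in (8, 16, 32):
    print(f"{width}-wire bus: {bus_transfers(word_bits, width)} transfer(s) per 32-bit word")
```

When the bus width matches the word length, as described above, a whole word moves in a single transfer.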
The main structural components of a computer:
CPU – controls the operation of the computer and performs
its data processing functions
Main Memory – stores data/instructions (volatile)
Secondary Memory – stores data/instructions (non-volatile)
System Interconnection – busses that provide
communication among CPU, main memory, and I/O
I/O – moves data between the computer and its external
environment
CPU: Major Structural Components
• Control Unit
• Controls the operation of the CPU and hence the
computer
• Arithmetic and Logic Unit (ALU)
• Performs the computer’s data processing function
• Registers
• Provide storage internal to the CPU
• CPU Interconnection
• Mechanisms that provide communication among the
control unit, ALU, and registers