Modern Computer Architecture - Tianjin University Graduate Course


Modern Computer Architecture
Instructor: Prof. Zhang Gang
School of Computer Science, Tianjin University
Contact email: [email protected]
Homework submission email: [email protected]
2013
The Main Contents of the Course
• Chapter 1. Fundamentals of Quantitative Design and Analysis
• Chapter 2. Memory Hierarchy Design
• Chapter 3. Instruction-Level Parallelism and Its Exploitation
• Chapter 4. Data-Level Parallelism in Vector, SIMD, and GPU Architectures
• Chapter 5. Thread-Level Parallelism
• Chapter 6. Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism
• Appendix A. Pipelining: Basic and Intermediate Concepts
Main Memory
• Some definitions:
– Bandwidth (BW): bytes read or written per unit time.
– Latency: described by
• Access time: delay between access initiation and completion.
– For reads: from presenting the address until the result is ready.
• Cycle time: minimum interval between separate requests to memory.
– Address lines: a separate bus between CPU and memory to carry addresses (not usually counted in BW figures).
– RAS (Row Access Strobe)
• First half of the address, sent first.
– CAS (Column Access Strobe)
• Second half of the address, sent second.
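To make the access-time vs. cycle-time distinction concrete, here is a minimal C sketch; the 60 ns access time, 110 ns cycle time, and 64-bit bus are assumed example values, not figures from the course.

#include <stdio.h>

/* Illustrative only: hypothetical timing figures. Peak bandwidth is limited
   by cycle time (minimum spacing between requests), not by access time
   (delay from presenting the address to getting the data). */
int main(void) {
    double access_time_ns  = 60.0;   /* address presented -> data ready (one read) */
    double cycle_time_ns   = 110.0;  /* minimum interval between separate requests */
    double bus_width_bytes = 8.0;    /* 64-bit data bus                            */

    double latency_ns   = access_time_ns;
    double peak_bw_MBps = bus_width_bytes / (cycle_time_ns * 1e-9) / 1e6;

    printf("read latency  : %.0f ns\n", latency_ns);
    printf("peak bandwidth: %.1f MB/s\n", peak_bw_MBps);
    return 0;
}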
RAS vs. CAS (saving address pins)
(figure: DRAM bit-cell array)
Multiplexing the row half and the column half of the address over the same pins halves the number of address pins needed.
1. RAS selects a row.
2. All the row's data is read out in parallel.
3. CAS selects a column to read.
4. The selected bit is driven onto the memory bus.
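The four steps above can be mimicked with a toy C model that keeps an open-row buffer; the 15 ns RAS and CAS delays are made-up values for illustration. It also shows why a second read that hits the already-open row pays only the CAS delay.

#include <stdio.h>

#define ROWS 4
#define COLS 4

/* Toy model of the RAS/CAS sequence; timings are invented for illustration. */
static int cell[ROWS][COLS];   /* the bit-cell array (one int per "bit" here) */
static int row_buffer[COLS];   /* holds the row read out in parallel (step 2) */
static int open_row = -1;      /* which row is currently held                 */

static int dram_read(int row, int col, int *cost_ns) {
    *cost_ns = 0;
    if (open_row != row) {              /* 1. RAS selects a row               */
        for (int c = 0; c < COLS; c++)  /* 2. parallel readout of the row     */
            row_buffer[c] = cell[row][c];
        open_row = row;
        *cost_ns += 15;                 /* assumed RAS delay                  */
    }
    *cost_ns += 15;                     /* 3./4. CAS selects the column,      */
    return row_buffer[col];             /*       bit goes to the memory bus   */
}

int main(void) {
    cell[2][1] = 1; cell[2][3] = 1;
    int t1, t2;
    int b1 = dram_read(2, 1, &t1);      /* row not open: pays RAS + CAS       */
    int b2 = dram_read(2, 3, &t2);      /* same row already open: CAS only    */
    printf("bit %d in %d ns, bit %d in %d ns\n", b1, t1, b2, t2);
    return 0;
}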
Types of Memory
• SRAM (Static Random Access Memory)
– Cell voltages are statically (unchangingly) tied to power-supply references. No drift, no refresh.
– But needs 4-6 transistors per bit.
• DRAM (Dynamic Random Access Memory)
– Cell design needs only 1 transistor per bit stored.
– Cell charges leak away and may dynamically (over time) drift from their initial levels.
– Requires periodic refreshing to correct the drift (e.g. every 8 ms; see the sketch after this list).
– Time spent refreshing is kept to <5% of BW.
• DRAM: 4-8x larger, 8-16x slower, 8-16x cheaper per bit.
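A rough refresh-overhead estimate: only the 8 ms refresh interval comes from the slide; the 4096 rows and 60 ns per-row refresh time below are assumptions for illustration. With these numbers the overhead stays under the 5% target.

#include <stdio.h>

/* Rough estimate of how much bandwidth refresh consumes (assumed parameters). */
int main(void) {
    double refresh_window_ms = 8.0;    /* every row must be refreshed this often */
    double rows              = 4096.0; /* assumed number of rows in the array    */
    double row_refresh_ns    = 60.0;   /* assumed time to refresh one row        */

    double busy_ms  = rows * row_refresh_ns * 1e-6;
    double overhead = busy_ms / refresh_window_ms;
    printf("refresh busy %.3f ms out of every %.0f ms -> %.1f%% of bandwidth\n",
           busy_ms, refresh_window_ms, overhead * 100.0);
    return 0;
}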
Typical DRAM Organization (256 Mbit)
(figure: the 28-bit address is split into a high 14 bits and a low 14 bits, one half selecting the row and the other the column; 2^14 × 2^14 = 2^28 = 256 Mbit)
Amdahl/Case Rule
• Memory size (and I/O bandwidth) should grow linearly with CPU speed.
– Typical: 1 MB of main memory and 1 Mbps of I/O bandwidth per 1 MIPS of CPU performance.
• It takes a fairly constant ~8 seconds to scan the entire memory (if memory bandwidth = I/O bandwidth, 4 bytes/load, 1 load per 4 instructions, and latency is not a problem); see the sketch below.
• Moore's Law:
– DRAM size doubles every 18 months (up 60%/yr).
– Tracks processor speed improvements.
• Unfortunately, DRAM latency has only decreased 7%/yr! Latency is a big deal.
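One way to arrive at the ~8 second figure is to stream the whole 1 MB of memory over the 1 Mbit/s I/O channel of the balanced 1 MIPS machine; a tiny check of that reading of the slide's assumptions:

#include <stdio.h>

/* Sanity check of the ~8 s scan time using the balanced-system numbers above. */
int main(void) {
    double mem_bytes   = 1e6;        /* 1 MB of main memory per 1 MIPS          */
    double io_bw_bytes = 1e6 / 8.0;  /* 1 Mbit/s of I/O bandwidth = 0.125 MB/s  */
    printf("scan time = %.0f s\n", mem_bytes / io_bw_bytes);   /* prints ~8 s   */
    return 0;
}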
Memory Optimizations
• SDRAM
– Synchronous DRAM
– Adds a clock signal to the DRAM interface
– So repeated (burst) transfers do not bear the overhead of synchronizing with the memory controller on every transfer
• DDR
– Double Data Rate
– Transfers data on both the rising edge and the falling edge of the clock
– Doubling the peak data rate (see the sketch below)
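A small sketch of how the peak rate comes out, using DDR3-1600 as an assumed example (800 MHz bus clock, two transfers per clock edge pair, 64-bit DIMM data path):

#include <stdio.h>

/* Peak DIMM transfer rate = transfers per second x bytes per transfer.
   DDR moves data on both clock edges, so transfers/s = 2 x bus clock. */
int main(void) {
    double clock_mhz           = 800.0; /* e.g. a DDR3-1600 bus clock           */
    double transfers_per_clock = 2.0;   /* rising + falling edge                */
    double bus_bytes           = 8.0;   /* 64-bit DIMM data path                */

    double mt_per_s  = clock_mhz * transfers_per_clock;  /* 1600 MT/s           */
    double peak_MBps = mt_per_s * bus_bytes;             /* MT/s x bytes = MB/s */
    printf("%.0f MT/s -> %.0f MB/s peak (the PC3-12800 module rating)\n",
           mt_per_s, peak_MBps);
    return 0;
}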
Memory Technology
Memory Optimizations
• DDR:
– DDR2
• Lower power (2.5 V -> 1.8 V)
• Higher clock rates (266 MHz, 333 MHz, 400 MHz)
– DDR3
• 1.5 V
• 800 MHz
– DDR4
• 1-1.2 V
• 1600 MHz
Memory Technology
Memory Optimizations
• GDDR5 is graphics memory based on DDR3
– Achieves 2-5x the bandwidth per DRAM of DDR3 (see the sketch after this list)
• Wider interface (32 vs. 16 bits)
• Higher clock rate
– Possible because the chips are attached by soldering instead of in socketed DIMM modules
• Graphics Data RAMs
– GDRAM/GSDRAM
– Graphics or Graphics Synchronous DRAM
• Reducing power in SDRAMs:
– Lower voltage
– Low-power mode (ignores the clock, continues to refresh)
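A back-of-the-envelope per-chip comparison, assuming roughly 4 Gbit/s per pin for GDDR5 and 1.6 Gbit/s per pin for DDR3-1600 (representative values, not slide figures); it lands at the upper end of the 2-5x range quoted above.

#include <stdio.h>

/* Per-chip peak bandwidth = interface width x per-pin data rate.
   The per-pin data rates below are assumed, representative values. */
int main(void) {
    double gddr5_bw = 32 * 4.0e9 / 8;   /* 32-bit interface, ~4 Gbit/s per pin   */
    double ddr3_bw  = 16 * 1.6e9 / 8;   /* 16-bit interface, ~1.6 Gbit/s per pin */
    printf("GDDR5 ~%.1f GB/s per chip, DDR3 ~%.1f GB/s per chip (~%.1fx)\n",
           gddr5_bw / 1e9, ddr3_bw / 1e9, gddr5_bw / ddr3_bw);
    return 0;
}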
Memory Technology
Memory Power Consumption
Memory Technology
Flash Memory
• Type of EEPROM
• Must be erased (in blocks) before being overwritten
• Non-volatile
• Limited number of write cycles
• Cheaper than SDRAM, more expensive than disk
• Slower than SRAM, faster than disk
Memory Technology
Memory Dependability
• Memory is susceptible to cosmic rays
• Soft errors: dynamic errors
– Detected and fixed by error-correcting codes (ECC); see the sketch after this list
• Hard errors: permanent errors
– Use spare rows to replace defective rows
• Chipkill: a RAID-like error recovery technique
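ECC DIMMs normally use a wider SECDED code over 64-bit words; the tiny Hamming(7,4) sketch below only illustrates the single-error-correction idea behind fixing soft errors.

#include <stdio.h>

/* Hamming(7,4): 4 data bits -> 7-bit codeword; any single flipped bit is
   located by the 3-bit syndrome and corrected. Real ECC memory uses a wider
   SECDED code, but the principle is the same. */
static unsigned encode(unsigned d) {           /* d = 4 data bits              */
    unsigned d0 = d & 1, d1 = (d >> 1) & 1, d2 = (d >> 2) & 1, d3 = (d >> 3) & 1;
    unsigned p1 = d0 ^ d1 ^ d3;                /* covers positions 1,3,5,7     */
    unsigned p2 = d0 ^ d2 ^ d3;                /* covers positions 2,3,6,7     */
    unsigned p4 = d1 ^ d2 ^ d3;                /* covers positions 4,5,6,7     */
    /* bit i of the result sits at codeword position i+1: p1 p2 d0 p4 d1 d2 d3 */
    return p1 | (p2 << 1) | (d0 << 2) | (p4 << 3) | (d1 << 4) | (d2 << 5) | (d3 << 6);
}

static unsigned correct(unsigned c) {          /* returns corrected codeword   */
    unsigned s = 0;
    for (int i = 0; i < 7; i++)                /* syndrome = XOR of the        */
        if ((c >> i) & 1) s ^= (unsigned)(i + 1);  /* positions of all 1 bits  */
    if (s) c ^= 1u << (s - 1);                 /* flip the bit the syndrome names */
    return c;
}

int main(void) {
    unsigned c   = encode(0xB);                /* protect data value 1011      */
    unsigned bad = c ^ (1u << 4);              /* a cosmic ray flips one bit   */
    printf("stored %02X, read %02X, corrected %02X\n", c, bad, correct(bad));
    return 0;
}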
Review Virtual Memory
• Physical memory?
• Virtual memory?
• Physical address?
• Logical address?
• Paging virtual memory?
• Segmentation virtual memory?
• How to translate a logical address to a physical address?
• What is the write strategy in virtual memory?
Review Virtual Memory
• The mapping of a virtual address to a physical address via a page table (figure; see the sketch below)
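A minimal single-level page-table walk in C, with toy sizes (4 KB pages, a 16-entry table); real systems use multi-level tables and cache translations in the TLB.

#include <stdio.h>
#include <stdint.h>

/* Minimal single-level translation: the virtual page number indexes the page
   table, the page offset is carried over unchanged. Sizes are toy values. */
#define PAGE_SHIFT 12                          /* 4 KB pages                   */
#define NUM_PAGES  16                          /* toy 16-entry page table      */

typedef struct { int valid; uint32_t frame; } pte_t;
static pte_t page_table[NUM_PAGES];

static int translate(uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn    = vaddr >> PAGE_SHIFT;     /* virtual page number          */
    uint32_t offset = vaddr & ((1u << PAGE_SHIFT) - 1);
    if (vpn >= NUM_PAGES || !page_table[vpn].valid)
        return 0;                              /* page fault: OS must load it  */
    *paddr = (page_table[vpn].frame << PAGE_SHIFT) | offset;
    return 1;
}

int main(void) {
    page_table[3].valid = 1;                   /* map virtual page 3 ...       */
    page_table[3].frame = 7;                   /* ... to physical frame 7      */
    uint32_t pa;
    if (translate(0x3ABC, &pa))                /* VPN 3, offset 0xABC          */
        printf("virtual 0x3ABC -> physical 0x%X\n", pa);   /* prints 0x7ABC    */
    return 0;
}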
Virtual Memory and Virtual Machines
Protection via Virtual Memory
• Protection via virtual memory
– Keeps processes in their own memory space
• Role of architecture:
– Provide user mode and supervisor mode
– Protect certain aspects of CPU state
– Provide mechanisms for switching between user mode and supervisor mode
– Provide mechanisms to limit memory accesses
– Provide a TLB to translate addresses
Virtual Memory and Virtual Machines
Protection via Virtual Machines
• Supports isolation and security
• Sharing a computer among many unrelated users
• Enabled by the raw speed of processors, which makes the overhead more acceptable
• Allows different ISAs and operating systems to be presented to user programs
– "System Virtual Machines"
– SVM software is called a "VMM" (virtual machine monitor) or "hypervisor"
– Individual virtual machines running under the monitor are called "guest VMs"
Virtual Memory and Virtual Machines
Impact of VMs on Virtual Memory
• Each guest OS maintains its own set of page tables
– The VMM adds a level of memory between physical and virtual memory, called "real memory"
– The VMM maintains a shadow page table that maps guest virtual addresses to physical addresses (see the sketch below)
• Requires the VMM to detect the guest's changes to its own page table
• Occurs naturally if accessing the page table pointer is a privileged operation
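A toy sketch of the composition the shadow page table caches (guest virtual to guest "real" via the guest's table, then real to host physical via the VMM's map); all structures and values here are hypothetical.

#include <stdio.h>

/* Toy composition of the two mappings the VMM keeps consistent:
   guest page table: guest virtual page -> guest "real" page
   VMM map:          guest real page    -> host physical page
   The shadow table caches the combined guest virtual -> host physical mapping. */
#define PAGES 8
static int guest_pt[PAGES] = {0, 2, 1, 3, -1, -1, -1, -1};  /* -1 = not mapped */
static int vmm_map[PAGES]  = {5, 6, 7, 4, -1, -1, -1, -1};
static int shadow_pt[PAGES];

int main(void) {
    /* Rebuilt whenever the guest edits its page table, which the VMM notices
       because such accesses trap as privileged operations. */
    for (int vpn = 0; vpn < PAGES; vpn++) {
        int real = guest_pt[vpn];
        shadow_pt[vpn] = (real < 0) ? -1 : vmm_map[real];
    }
    printf("guest virtual page 1 -> host physical page %d\n", shadow_pt[1]);  /* 7 */
    return 0;
}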
Reading Assignment (5th Edition)
• 2.6 Putting It All Together: Memory Hierarchies in the ARM Cortex-A8 and Intel Core i7
• pp. 113-135
• http://www.doc88.com/p-112663203506.html