Jan 26 - Zhang Penghui - Jan 25, 2015 622 PM

Lecture 2-Berkeley RISC
Penghui Zhang
Guanming Wang
Hang Zhang
1. What Is RISC?
1.1 RISC idea
• Developed from the realization that the vast majority of programs did not use the vast majority of a processor's instructions.
• Include only those instructions that were really used.
• Use the space freed by the removed circuitry for other circuits that speed the system up instead.
1.2 How RISC achieves its goal
• Add many more registers.
• Registers are small pieces of memory that hold temporary values and can be accessed at negligible cost.
• With fewer memory accesses, the speed of the processor is more closely determined by its clock speed.
1.3 Comparison Between RISC and CISC
2. RISC I
2.1 RISC I Design Goals
• Support high-level language programming
• Cost-effective system in both hardware and software
• Simple instructions, each one word (32 bits) long
• Consider the “cost” of each statement type
2.2 RISC I Architecture
• 31 instructions in a few similar formats, all 32 bits long
• Execution time
• Load and store instructions move data between registers and memory
2.3 Micro-architecture of RISC I
• Instruction execution pattern
1. Read two registers
2. Perform an operation on them
3. Store the result
EX. Data-Path of RISC I Chip
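The read, operate, store pattern above can be sketched in a few lines of Python. This is only an illustration under assumptions: the operation names, the 32-entry register list and the `execute` helper are hypothetical and are not taken from the RISC I instruction set.

```python
import operator

# Hypothetical operation table; the names are illustrative, not RISC I mnemonics.
OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "and": operator.and_,
    "or": operator.or_,
    "xor": operator.xor,
}
MASK32 = 0xFFFFFFFF  # keep results within one 32-bit word


def execute(regs, op, dest, src1, src2):
    """Run one register-to-register instruction on the register list `regs`."""
    a = regs[src1]                    # 1. read two registers
    b = regs[src2]
    result = OPS[op](a, b) & MASK32   # 2. perform an operation on them
    regs[dest] = result               # 3. store the result
    return result


regs = [0] * 32
regs[1] = 7
regs[2] = 5
execute(regs, "add", 3, 1, 2)  # r3 <- r1 + r2
```

Every instruction touches only registers here, which is why the pattern is so regular; memory would be reached only through separate load and store instructions.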
2.4 Design environment of RISC I
• UNIX environment on a VAX 11/780
• Regular parts of the chip
• Control section
3. RISC II
3.1 Background
RISC II microprocessor
• Meets the requirements identified by the code analysis.
• The majority of the chip is occupied by the data unit, unlike conventional microprocessors, which were dominated by control.
• The majority of the data unit consists of a large register file with 138 registers.
• RISC work at Berkeley had turned from the Gold design to the new Blue design.
• The savings due to the new design were tremendous.
• Gold contained 78 registers in 6 windows.
• Blue contained 138 registers: 8 windows of 16 registers each, plus another 10 globals.
• The final Blue design, fabbed as RISC II, implemented all of the RISC instruction set with only 39,000 transistors.
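The Blue register count can be checked with a little arithmetic, and the windowing idea can be sketched as a mapping from a visible register number to a physical register. The totals (8 windows, 16 registers each, 10 globals, 32 visible registers) come from the slides; the function name `physical_index` and the assumption that adjacent windows share 6 registers are illustrative choices, not something the slides state.

```python
NUM_WINDOWS = 8        # from the slides
REGS_PER_WINDOW = 16   # registers each window adds, from the slides
NUM_GLOBALS = 10       # from the slides
VISIBLE = 32           # registers visible at any one time, from the slides

# Assumption for this sketch: each window sees 22 windowed registers,
# 6 of which are shared with the neighbouring window.
OVERLAP = VISIBLE - NUM_GLOBALS - REGS_PER_WINDOW

TOTAL = NUM_WINDOWS * REGS_PER_WINDOW + NUM_GLOBALS  # 8 * 16 + 10


def physical_index(window, reg):
    """Map visible register `reg` (0..31) in `window` (0..7) to a physical index."""
    if not 0 <= window < NUM_WINDOWS:
        raise ValueError("window out of range")
    if not 0 <= reg < VISIBLE:
        raise ValueError("register out of range")
    if reg < NUM_GLOBALS:
        return reg  # globals are the same in every window
    offset = reg - NUM_GLOBALS                # position within the windowed part
    windowed = NUM_WINDOWS * REGS_PER_WINDOW  # windowed registers form a ring
    return NUM_GLOBALS + (window * REGS_PER_WINDOW + offset) % windowed
```

Under these assumptions, registers 26 to 31 of one window are the same physical registers as 10 to 15 of the next, so values can be handed between procedures without copying.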
The RISC II register file
3.2 Difference
• The key difference was simpler cache circuitry that eliminated one line per bit.
• The other major change was the inclusion of an “instruction-format expander”.
• RISC II proved to be much more successful in silicon and, in testing, outperformed almost all minicomputers on almost all tasks.
3.3 Architecture Of RISC II
• It is an evolution of the RISC I design.
• Reading is accomplished by selectively discharging one of the two precharged bit-line buses.
• The design is based on a two-bus, two-port register cell.
• The RISC II architecture used a three-stage pipeline.
Data Path of RISC II
3.4 Implementation
• Three machine cycles:
  • Instruction fetch and decode.
  • Register read, operate, and temporary latching of result.
  • Write result back into the register file.
• These three cycles are overlapped:
  • A new instruction begins every machine cycle.
  • The exception is Load and Store instructions.
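The effect of overlapping the three cycles can be illustrated with a small timing sketch. The names `schedule` and `total_cycles` are hypothetical, and the one-cycle penalty for Load and Store is an assumption made for illustration; the slides only say that these instructions are the exception.

```python
MEMORY_OPS = ("load", "store")


def schedule(instrs, stages=3, mem_penalty=1):
    """Return (name, start_cycle, end_cycle) for each instruction.

    Assumption: a load or store holds up the next instruction by
    `mem_penalty` extra cycles; everything else starts one cycle apart.
    """
    timeline = []
    start = 0
    for name in instrs:
        extra = mem_penalty if name in MEMORY_OPS else 0
        timeline.append((name, start, start + stages - 1 + extra))
        start += 1 + extra  # next instruction begins here
    return timeline


def total_cycles(instrs, stages=3, mem_penalty=1):
    """Number of machine cycles until the last instruction has finished."""
    timeline = schedule(instrs, stages, mem_penalty)
    return max(end for _, _, end in timeline) + 1 if timeline else 0


overlapped = total_cycles(["add", "sub", "and", "or"])
sequential = 3 * 4  # the same four instructions with no overlap
```

With four register instructions the overlapped schedule finishes in 6 cycles rather than 12, and inserting a load pushes the following instruction back by one cycle in this model.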
4. Architectural inheritance
4.1 Features used
• A load-store architecture
• Fixed-length 32-bit instructions
• 3-address instruction formats
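To make "fixed-length 32-bit, 3-address" concrete, here is a sketch that packs one destination and two source operands into a single 32-bit word. The field names and widths in `FIELDS` are assumptions chosen so that they add up to 32 bits; they are not claimed to be the exact Berkeley RISC or ARM encoding.

```python
# Hypothetical field layout, most significant field first; widths sum to 32.
FIELDS = [
    ("opcode", 7),
    ("scc", 1),    # assumed "set condition codes" flag
    ("dest", 5),   # destination register
    ("src1", 5),   # first source register
    ("imm", 1),    # assumed flag: src2 is an immediate
    ("src2", 13),  # second source register or immediate
]
WORD_BITS = sum(width for _, width in FIELDS)


def encode(**values):
    """Pack named field values into one fixed-length instruction word."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode(word):
    """Unpack an instruction word back into its named fields."""
    fields = {}
    shift = WORD_BITS
    for name, width in FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


word = encode(opcode=0x0C, dest=3, src1=1, src2=2)  # "r3 <- r1 op r2"
```

Because every instruction has the same length and the fields sit at fixed positions, decoding needs only shifts and masks, which keeps the control logic small.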
4.2 Features rejected
• 4.2.1 Register windows
  • The register banks on the Berkeley RISC processors incorporated a large number of registers, 32 of which were visible at any time.
  • Procedure entry and exit instructions moved the visible ‘window’ to give each procedure access to new registers.
  • The principal problem with register windows is the large chip area occupied by the large number of registers.
  • This feature was therefore rejected on cost grounds.
• 4.2.2 Delayed branches
  • Branches cause pipeline problems since they interrupt the smooth flow of instructions.
  • Most RISC processors ameliorate the problem by using delayed branches, where the branch takes effect after the following instruction has executed.
  • On the original ARM, delayed branches were not used because they made exception handling more complex.
  • In the long run this has turned out to be a good decision, since it simplifies re-implementing the architecture with a different pipeline.
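The difference between a delayed and a non-delayed branch can be shown with a toy interpreter. The program format and the `run` function are hypothetical and exist only for illustration; with `delayed=True` the instruction following the branch still executes before the branch takes effect.

```python
def run(program, delayed):
    """Execute a tiny program and return the executed steps in order.

    Each entry is ("op", label) or ("jump", target_index).
    """
    trace = []
    pc = 0
    pending_target = None  # branch target waiting for the delay slot to finish
    while pc < len(program):
        op, arg = program[pc]
        trace.append(arg if op == "op" else f"jump->{arg}")
        next_pc = pc + 1
        if pending_target is not None:
            next_pc = pending_target  # delay slot done, branch takes effect now
            pending_target = None
        if op == "jump":
            if delayed:
                pending_target = arg  # takes effect after the next instruction
            else:
                next_pc = arg         # takes effect immediately
        pc = next_pc
    return trace


program = [
    ("op", "A"),
    ("jump", 4),
    ("op", "B"),  # the instruction following the branch
    ("op", "C"),  # skipped in both cases
    ("op", "D"),
]
with_delay = run(program, delayed=True)
without_delay = run(program, delayed=False)
```

In this sketch `B` runs only in the delayed case, which hints at why the feature ties program behaviour to pipeline details.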
• 4.2.3 Single-cycle execution of all instructions
  • Although the ARM executes most data processing instructions in a single clock cycle, many other instructions take multiple clock cycles.
  • Single-cycle operation of all instructions is only possible with separate data and instruction memories, which were considered too expensive for the intended ARM application areas.
  • Instead, the ARM was designed to use the minimum number of cycles required for memory access.
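The argument about separate memories can be put as a simple counting model. This is a deliberately crude sketch, assuming one memory access per port per cycle; the function `min_cycles` is hypothetical and ignores everything except memory traffic.

```python
def min_cycles(data_accesses, split_memories):
    """Lower bound on cycles for one instruction, counting memory accesses only.

    Assumption: each memory port serves one access per cycle. Every
    instruction needs one instruction fetch plus `data_accesses` data accesses.
    """
    if split_memories:
        # The fetch and the data accesses use different memories and can overlap.
        return max(1, data_accesses)
    # A single memory must serve the fetch and the data accesses in turn.
    return 1 + data_accesses


register_op = min_cycles(0, split_memories=False)  # no data access
load_unified = min_cycles(1, split_memories=False)
load_split = min_cycles(1, split_memories=True)
```

In this model a register-only instruction fits in one cycle either way, while a load needs at least two cycles on a single memory and one with separate instruction and data memories.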