1 - School of Computer Engineering and Science, Shanghai University

Computer Architecture
- Classical Theory (1)
School of Computer Science, Shanghai University
Xu Weimin (徐炜民)
4/13/2015

Contents
Historical foundations
The von Neumann architecture
Definitions
Computer classes
Computer families
Simulation and emulation
Amdahl's Law
Hierarchical structure
Modern definition of organization

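Boolean Logic Algebra
In 1847 and 1854, the English mathematician George Boole
published two major works, "The Mathematical Analysis of Logic"
and "An Investigation of the Laws of Thought", founding the
algebra of logic. This algebra works on two values (binary) and
is the mathematical and logical foundation of the modern
electronic digital computer.
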
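Shannon's Switching Circuits for Computers
In 1938, the American scientist Claude Shannon, the founder of
information theory, published the paper "A Symbolic Analysis of
Relay and Switching Circuits". It showed for the first time how
Boolean algebra can be applied to logic circuits, and it laid the
theoretical foundation for the switching circuits of modern
electronic digital computers.
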
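Shannon's idea can be illustrated with a minimal sketch, under the assumption that a closed switch is modeled as `True` and an open one as `False`. Switches in series then behave like AND, switches in parallel like OR. The function names and the half adder built from them are illustrative and do not come from the slides.

```python
def series(a: bool, b: bool) -> bool:
    """Two switches in series conduct only if both are closed (Boolean AND)."""
    return a and b


def parallel(a: bool, b: bool) -> bool:
    """Two switches in parallel conduct if either one is closed (Boolean OR)."""
    return a or b


def inverted(a: bool) -> bool:
    """A normally-closed relay contact conducts when its coil is off (Boolean NOT)."""
    return not a


def half_adder(a: bool, b: bool) -> tuple[bool, bool]:
    """Add two one-bit values using only the three circuit forms above."""
    # The sum bit is XOR, written as (a AND NOT b) OR (NOT a AND b).
    total = parallel(series(a, inverted(b)), series(inverted(a), b))
    # The carry bit is set only when both inputs are 1.
    carry = series(a, b)
    return total, carry
```

Calling `half_adder(True, True)` yields a sum bit of `False` and a carry bit of `True`, which is binary 1 + 1 = 10.
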
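Atanasoff's Three Principles for Computers
In 1939, Atanasoff proposed three principles for computers: use
binary arithmetic; use electronics for control and computation;
and adopt a structure that separates computation from storage.
In the same year he also designed and began building a prototype
digital electronic computer, the "ABC", but it was never
completed.
Atanasoff's design for an electronic computer inspired Mauchly
of the ENIAC development team and directly influenced the birth
of ENIAC. In 1973 a United States court ruled the ENIAC patent
invalid and recognized Atanasoff's priority as the first person
to propose a design for an electronic computer.
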
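The Turing Machine
A mathematical model of the modern general-purpose digital computer
In 1936, the 24-year-old English mathematician Alan Turing
published his famous paper "On Computable Numbers, with an
Application to the Entscheidungsproblem" (the decision problem).
It proposed an "ideal computer" that later became known as the
"Turing machine". Turing proved mathematically that a "universal
Turing machine" exists in theory, which gave the concept of
computability a rigorous mathematical definition. The Turing
machine became the mathematical model of the modern
general-purpose digital computer and showed that such a computer
could be built. In another famous paper, "Computing Machinery and
Intelligence" (1950), which asks whether machines can think,
Turing explored artificial intelligence and devised the famous
"Turing test". Turing died young in 1954, at the age of 41.
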
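The model can be made concrete with a small sketch of a single-tape Turing machine. The rule format `(state, symbol) -> (write, move, next_state)`, the state names, and the binary-increment rule table are assumptions chosen for illustration and are not taken from Turing's paper or from the slides.

```python
from collections import defaultdict


def run_turing_machine(rules, tape, state="start", blank="_", max_steps=10_000):
    """Run a single-tape machine until it reaches the state "halt".

    rules maps (state, symbol) to (symbol_to_write, "L" or "R", next_state).
    """
    cells = defaultdict(lambda: blank, enumerate(tape))  # unbounded tape
    head = 0
    steps = 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("step limit reached without halting")
        write, move, state = rules[(state, cells[head])]
        cells[head] = write
        head += 1 if move == "R" else -1
        steps += 1
    if not cells:
        return ""
    lo, hi = min(cells), max(cells)
    return "".join(cells[i] for i in range(lo, hi + 1)).strip(blank)


# Hypothetical rule table: add 1 to a binary number written on the tape.
INCREMENT = {
    ("start", "0"): ("0", "R", "start"),  # scan right to the end of the number
    ("start", "1"): ("1", "R", "start"),
    ("start", "_"): ("_", "L", "carry"),  # step back onto the last digit
    ("carry", "1"): ("0", "L", "carry"),  # 1 + 1 = 0, keep carrying
    ("carry", "0"): ("1", "L", "halt"),   # absorb the carry and stop
    ("carry", "_"): ("1", "L", "halt"),   # the number grows by one digit
}
```

With this table, `run_turing_machine(INCREMENT, "1011")` returns `"1100"`. A universal machine would take the rule table itself as part of its tape, which this sketch does not attempt.
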
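Wiener's Five Design Principles for the Modern Computer
In 1940, the American scientist Norbert Wiener set out five
design principles for the modern computer: digital rather than
analog; built from electronic components with as few mechanical
parts as possible; binary rather than decimal; computation tables
held inside the machine; and data stored inside the machine.
In 1948 Wiener completed his book "Cybernetics". It made him the
founder of cybernetics and had a profound influence on the later
development of computers and on research in artificial
intelligence.
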
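The Father of the Electronic Computer

The title "father of the electronic computer" was given to the
mathematician John von Neumann rather than to the two people who
actually developed ENIAC, because von Neumann proposed the
architecture of the modern computer.
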
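A Short Biography of von Neumann-1

Von Neumann was one of the greatest scientists of the 20th
century. He was born in 1903 in Budapest, the capital of Hungary.
At 6 he could divide 8-digit numbers in his head, at 8 he had
learned calculus, and at 12 he understood the theory of
functions. Through hard study he published his first mathematical
paper at 17, soon mastered seven languages, and made
breakthroughs in the newest branches of mathematics, such as set
theory and functional analysis. At 22 he graduated in chemistry
from the Swiss Federal Institute of Technology in Zurich, and at
about the same time he earned a doctorate in mathematics from the
University of Budapest. He then turned to physics, working on
mathematical models for quantum mechanics, which gave him a
prominent place in theoretical physics as well.
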
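A Short Biography of von Neumann-2

In 1930, the eminent American mathematician Professor Oswald
Veblen invited the 26-year-old University of Berlin lecturer to
teach in the United States, and von Neumann settled there. In
1933 he and Einstein were among the first professors appointed
for life at the Institute for Advanced Study in Princeton.
Historians of mathematics hold that von Neumann was one of the
greatest mathematicians of the 20th century: he did pioneering
work in ergodic theory and the theory of topological groups, and
a class of operator algebras is even named "von Neumann
algebras". Physicists point out that his "Mathematical
Foundations of Quantum Mechanics", written in the 1930s, proved
extremely valuable to the development of atomic physics.
Economists stress that his model of economic growth, and
especially the book "Theory of Games and Economic Behavior"
published in the 1940s, made him a landmark figure in economics
and decision science. On February 8, 1957, von Neumann died of
bone cancer at Walter Reed Hospital, aged only 53. His great
contributions to computer science will never lose their luster.
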
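Goldstine Asks for Advice

In the summer of 1944, while waiting at Aberdeen station for a
train to Philadelphia, Goldstine happened to meet the
world-famous mathematician Professor von Neumann. Goldstine
seized the chance to consult the master, and von Neumann,
friendly and patient, answered his questions. As he listened,
von Neumann sensed from these mathematical questions that
something unusual was going on. He began questioning Goldstine
in turn, until the young man felt "as if he were going through
another doctoral defense". In the end Goldstine told him frankly
about the electronic computer project at the Moore School.
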
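Ideas Born from Development
Von Neumann was deeply worried about the computing problems at
the Aberdeen Proving Ground, and he wanted to visit the Moore
School to see ENIAC being built.
From then on he became a de facto consultant to the Moore School
group and exchanged views with its members frequently. The young
engineers quickly came up with all kinds of ideas, and
von Neumann used his broad learning to deepen the discussion, so
that the system design ideas for the electronic computer
gradually took shape.
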
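Finding the Problem

Before ENIAC had even gone into operation, von Neumann saw its
fatal flaw: the program was separate from the computation.
Program instructions were held in the machine's external
circuitry. To compute a given problem, hundreds of connections
first had to be wired by hand, and it took dozens of people
several days of work before a few minutes of computation could
run. Von Neumann decided to draft a new design report that would
transform the electronic computer from the ground up. He named
the new machine the "Electronic Discrete Variable Automatic
Computer", abbreviated "EDVAC".
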
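The Famous "101-Page Report"

In June 1945, drawing on his discussions with Goldstine, Burks,
and others, von Neumann wrote a 101-page report, the "First
Draft of a Report on the EDVAC", known in the history of
computing as the "101-page report". It is still regarded today
as a milestone document in the development of modern computer
science. The report clearly defined the five major components of
a computer and replaced decimal arithmetic with binary. The
revolutionary significance of the EDVAC design lies in the
"stored program", which lets the computer execute instructions
automatically and in sequence. Machines with this stored-program
architecture later came to be called "von Neumann machines".
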
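Features of the von Neumann Architecture
• A single processing unit performs computation, storage, and
communication;
• Fixed-length storage cells are organized linearly (addresses);
• Cells in the storage space are directly addressable (addresses);
• A low-level machine language is used, whose instructions
perform the simple operations of basic opcodes;
• Computation is under centralized, sequential control (stored
program);
• The concepts of "address" and "stored program" were introduced
for the first time.
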
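The features listed above can be sketched as a tiny stored-program machine in which instructions and data share one linear, directly addressed memory and a program counter imposes sequential control. The four-instruction set (`LOAD`, `ADD`, `STORE`, `HALT`) and the single accumulator are hypothetical choices for this sketch and do not describe EDVAC.

```python
def run(memory):
    """Fetch and execute instructions from memory until HALT is reached."""
    acc = 0  # the single accumulator of the processing unit
    pc = 0   # program counter: address of the next instruction
    while True:
        opcode, operand = memory[pc]  # fetch from the same memory that holds data
        pc += 1                       # sequential control: default to the next cell
        if opcode == "LOAD":
            acc = memory[operand]     # operand is a direct address
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode: {opcode}")


# Cells 0-3 hold the program and cells 4-6 hold data, in one address space.
program = [
    ("LOAD", 4),   # acc <- memory[4]
    ("ADD", 5),    # acc <- acc + memory[5]
    ("STORE", 6),  # memory[6] <- acc
    ("HALT", 0),
    2,             # address 4
    3,             # address 5
    0,             # address 6: receives the result
]
```

After `run(program)`, cell 6 holds 5. Because the program is ordinary memory content, loading a different problem means writing different cells rather than rewiring the machine, which is the contrast the slides draw with ENIAC.
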
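Definition of Computer Architecture
Amdahl's definition: computer architecture is the set of
attributes of a computer as seen by the programmer, that is, its
conceptual structure and functional behavior. These are in
effect the external characteristics of a computer system.
From the standpoint of the layered structure of a computer
system, programmers at different levels clearly see different
attributes. "Architecture" therefore refers to the definition of
the interfaces between the levels of a computer system and to
the allocation of functions above and below each interface.
Example: level M2 in Figure 1-8 is the machine-language-level
computer. Above its interface lie all software functions; below
it lie all hardware and firmware functions.
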
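Computer Organization-1

Computer organization is the logical implementation of the
computer architecture. It covers the composition and logic
design of the data paths and control signals within the machine
level. It focuses on how events within the machine level are
sequenced and controlled, on the function of each component, and
on how the components are interconnected.
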
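Computer Organization-2

Computer organization also covers: data path width; whether to
provide dedicated units, depending on speed, cost, and usage
(for example a multiplier, a divider, a floating-point
coprocessor, or an I/O processor); sharing of components and
parallel execution; the choice among organizational options such
as controller structure (combinational logic, PLA, or
microprogram), uniprocessor or multiprocessor, and the use of
instruction prefetch and prediction techniques; reliability
techniques; and the choice of chip integration level and speed.
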
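Computer Implementation

Computer implementation is the physical realization of the
computer organization. It covers the physical structure of
components such as the processor and main memory; chip
integration level and speed; the partitioning and
interconnection of chips, modules, plug-in boards, and
backplanes; the design of special-purpose chips; micro-assembly
techniques; bus drivers; and power supply, ventilation and
cooling, and system assembly. It focuses on chip technology and
assembly technology.
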
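The Relationship Among the Three
Architecture, organization, and implementation are three
distinct concepts. Architecture is the hardware/software
interface of a computer system; organization is the logical
implementation of the architecture; implementation is the
physical realization of the organization. Each has its own
content, yet the three are closely related.
For example, deciding what the instruction set does belongs to
architecture. How an instruction is carried out (the concrete
operations such as instruction fetch, operand fetch, execution,
and result write-back, together with their timing) belongs to
organization. The specific circuits, device design, and assembly
techniques that realize these instruction functions belong to
implementation.
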
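The split between architecture and organization can be sketched in software terms: the architecture fixes what a multiply instruction returns, while each organization decides how to produce it. The two class names below are hypothetical, and Python's `*` operator merely stands in for a dedicated hardware multiplier; the sketch assumes non-negative integer operands.

```python
class WithMultiplier:
    """Organization that includes a dedicated multiplier unit."""

    def mul(self, a: int, b: int) -> int:
        return a * b  # stands in for a single-step hardware multiply


class ShiftAddOnly:
    """Organization with only an adder and a shifter (non-negative operands)."""

    def mul(self, a: int, b: int) -> int:
        result = 0
        while b:
            if b & 1:      # lowest multiplier bit set: add the shifted multiplicand
                result += a
            a <<= 1        # next bit position carries twice the weight
            b >>= 1
        return result
```

A program that calls `mul(6, 7)` gets 42 from either class, so it cannot tell the organizations apart by their results. They differ only in cost and speed, which is why the choice to include a multiplier is an organization decision rather than an architecture decision.
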
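Computer Classes and Design Philosophies
The evolution of computer classes follows three different design
philosophies.
(1) Obtain the best possible performance at a reasonable price
within a class, gradually moving toward higher-end machines.
This is called best price-performance design.
(2) Maintain only an adequate level of performance while
striving for the lowest price. This is called lowest-price
design, and it often splits off a new, lower class of computer
below the low end.
(3) Pursue the highest performance as the main goal, regardless
of added cost. This is called highest-performance design, and it
produces the highest class of computer of its time.
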
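The Concept of a Computer Family
First design an architecture (the machine attributes), then
design the system software for that architecture. Study the
various ways of implementing the architecture with the available
devices and hardware technology, and offer machines of different
speeds and configurations to meet different speed and price
requirements. (A family must guarantee that the machine
attributes seen by the user are the same across its models.)
Examples: IBM 360, IBM AS/400
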
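IBM 360 (1964)

The models in the family are compatible with one another.
Ordered from small to large and from weak to powerful, the
family comprised six models (20, 30, 40, 50, 65, and 75) and was
later extended with models 25, 85, 91, and 195.
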
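Advantages of Computer Families
1. Sharing common system software solves the problem of program
compatibility.
2. A uniform data structure and instruction set makes it easy to
build multi-machine systems and networks.
3. Standard bus protocols make connectors and expansion cards
compatible, which facilitates OEM (Original Equipment
Manufacturer) business.
4. The range of computer applications widens, and users can
choose the most suitable machine among the models of a family.
5. Operation, maintenance, and staff training become easier.
6. Upgrading to a new generation becomes easier.
7. Productivity rises, output increases, and costs fall, which
promotes the development of computers.
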
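Simulation and Emulation-1
Family machines can port programs among themselves because they
share the same architecture. If programs must be ported between
machines with different architectures, one architecture has to
be implemented on top of another, that is, one machine has to
realize the attributes of another.
Emulation interprets with microprograms, and the interpreter
resides in the microprogram store. Simulation interprets with
machine-language programs, and the interpreter resides in main
memory.
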
Simulation and Emulation-2
Simulation
B: virtual machine
A: host machine
Each machine instruction of B is interpreted and executed by a
machine-language routine of A --> simulation.

Simulation and Emulation-3
Emulation
B: target machine
A: host machine
Each machine instruction of B is interpreted and executed by a
microprogram routine of A --> emulation.

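[Figure: level diagram of simulation and emulation. Virtual
machine B has levels M5 (high-level language), M4 (assembly),
M3 (OS), and M2 (machine language); host machine A has levels
M3 (OS), M2 (machine language), and M1 (microprogram).
Simulation maps B's M2 onto A's M2; emulation maps B's M2 onto
A's M1.]
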
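Simulation can be sketched as a dispatch table that maps each instruction of machine B to a short routine written in machine A's own operations. In this sketch host A is assumed to offer only `mov`, `neg`, and `add`, while guest B also has `SUB`; every name here, including the scratch register `tmp`, is hypothetical. Emulation would place the same kind of routine in microcode, which a Python sketch cannot show.

```python
# Operations of the hypothetical host machine A, acting on a register file.
def host_mov(regs, dst, src):
    regs[dst] = regs[src]


def host_neg(regs, dst):
    regs[dst] = -regs[dst]


def host_add(regs, dst, src):
    regs[dst] += regs[src]


def guest_sub(regs, dst, src):
    """One SUB instruction of B, interpreted by a three-instruction routine of A."""
    host_mov(regs, "tmp", src)  # copy the subtrahend into a scratch register
    host_neg(regs, "tmp")       # negate the copy
    host_add(regs, dst, "tmp")  # dst <- dst + (-src)


# The interpreter's table: each B opcode selects a routine built from A's operations.
GUEST_ROUTINES = {"ADD": host_add, "SUB": guest_sub}


def simulate(program, regs):
    """Interpret a program for B, given as (opcode, dst, src) tuples, on host A."""
    for opcode, dst, src in program:
        GUEST_ROUTINES[opcode](regs, dst, src)
    return regs
```

For example, `simulate([("ADD", "r0", "r1"), ("SUB", "r0", "r2")], {"r0": 1, "r1": 5, "r2": 2})` leaves 4 in `r0`. Each guest instruction costs several host instructions, which is why simulated programs run slower than native ones.
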
Introduction-1

Computer technology has made incredible progress in
the roughly 60 years since the first general-purpose
electronic computer was created. (Rapid progress)

Today, less than a thousand dollars will purchase a
personal computer that has more performance, more
main memory, and more disk storage than a computer
bought in 1980 for 1 million dollars. (Affordability)

This rapid rate of improvement has come both from
advances in the technology used to build computers
and from innovation in computer design.

Introduction-2

During the first 25 years of electronic computers,
both forces made a major contribution; but
beginning in about 1970, computer designers
became largely dependent upon integrated
circuit technology.

During the 1970s, performance continued to
improve at about 25% to 30% per year for the
mainframes and minicomputers that dominated
the industry.

Introduction-3

The late 1970s saw the emergence of the microprocessor.
The ability of the microprocessor to ride the improvements
in integrated circuit technology more closely than the less
integrated mainframes and minicomputers led to a higher rate
of improvement: roughly 35% growth per year in performance.

This growth rate, combined with the cost advantages of a
mass-produced microprocessor, led to an increasing fraction
of the computer business being based on microprocessors.

Introduction-3

In addition, two significant changes in the computer
marketplace made it easier than ever before to be
commercially successful with a new architecture.

First, the virtual elimination of assembly language
programming reduced the need for object-code
compatibility.

Second, the creation of standardized, vendor-independent
operating systems, such as UNIX and its clone, Linux,
lowered the cost and risk of bringing out a new architecture.

Introduction-4

These changes made it possible to successfully develop a
new set of architectures, called RISC (Reduced
Instruction Set Computer) architectures, in the early
1980s. The RISC-based machines focused the attention of
designers on two critical performance techniques: the
exploitation of instruction-level parallelism (initially
through pipelining and later through multiple instruction
issue) and the use of caches (initially in simple forms and
later using more sophisticated organizations and
optimizations).

Introduction-5

The combination of architectural and
organizational enhancements has led to 20
years of sustained growth in performance at
an annual rate of over 50%.
Introduction-6

Figure 1.1 shows the effect of this
difference in performance growth rates.
Introduction-7

First, it has significantly enhanced the capability
available to computer users. For many
applications, the highest-performance
microprocessors of today outperform the
supercomputer of less than 10 years ago.

Second, this dramatic rate of improvement has
led to the dominance of microprocessor-based
computers across the entire range of
computer design.

Introduction-8

Workstations and PCs have emerged as
major products in the computer industry.

Minicomputers, which were traditionally made from
off-the-shelf logic or from gate arrays,
have been replaced by servers made using
microprocessors.

Mainframes have been almost completely replaced with
multiprocessors (or multicore processors) consisting of
small numbers of off-the-shelf microprocessors.

Introduction-9

Even high-end supercomputers are being built
with collections of microprocessors. Freedom from
compatibility with old designs and the use
of microprocessor technology led to a
renaissance in computer design, which
emphasized both architectural innovation
and efficient use of technology
improvements.

Introduction-10

This renaissance is responsible for the higher
performance growth shown in Figure 1.1—a
rate that is unprecedented in the computer
industry. This rate of growth has compounded
so that by 2001, the difference between the
highest-performance microprocessors and
what would have been obtained by relying
solely on technology, including improved
circuit design, was about a factor of 15.
Introduction-11

In the last few years, the tremendous
improvement in integrated circuit capability
has allowed older, less-streamlined
architectures, such as the x86 (or IA-32)
architecture, to adopt many of the
innovations first pioneered in the RISC
designs.
(Retrofitting outdated architectures with new technology)

Introduction-12

As we will see, modern x86 processors
basically consist of a front end that fetches
and decodes x86 instructions and maps them
into simple ALU, memory access, or branch
operations that can be executed on a
RISC-style pipelined processor.

Introduction-13

Beginning in the late 1990s, as transistor counts
soared, the overhead (in transistors) of interpreting
the more complex x86 architecture became
negligible as a percentage of the total transistor
count of a modern microprocessor.

Introduction-14

Architectural ideas and accompanying
compiler improvements have made this
incredible growth rate possible.

The dramatic revolution has been driven by the
development of a quantitative approach to computer
design and analysis that uses empirical observations
of programs, experimentation, and simulation as its
tools.

Introduction-15

Sustaining the recent improvements in cost and
performance will require continuing innovations
in computer design.

We believe such innovations will be founded
on this quantitative approach to computer
design.

Introduction
The Changing Face of Computing

In the 1960s, the dominant form of
computing was on large mainframes: machines
costing millions of dollars and stored in computer
rooms with multiple operators overseeing their support.
Typical applications included business data
processing and large-scale scientific computing.

Introduction
The Changing Face of Computing

The 1970s saw the birth of the
minicomputer, a smaller-sized machine
initially focused on applications in scientific
laboratories, but rapidly branching out as
the technology of time-sharing (multiple users
sharing a computer interactively through
independent terminals) became widespread.

Introduction
The Changing Face of Computing

The 1980s saw the rise of the desktop
computer based on microprocessors, in the form
of both personal computers and workstations.

The individually owned desktop computer
replaced time-sharing and led to the rise of
servers: computers that provided larger-scale
services such as reliable, long-term
file storage and access, larger memory, and
more computing power.

Introduction
The Changing Face of Computing

The 1990s saw the emergence of the
Internet and the World Wide Web, the first
successful handheld computing devices
(personal digital assistants or PDAs), and
the emergence of high-performance digital
consumer electronics, from video games to
set-top boxes.

Introduction
The Changing Face of Computing

Not since the creation of the personal computer
more than 20 years ago have we seen such
dramatic changes in the way computers appear
and in how they are used.

These changes in computer use have led to three
different computing markets (desktop computing,
servers, and embedded computers), each characterized
by different applications, requirements,
and computing technologies.

Introduction
Changing Face for Desktop Computing

The first, and still the largest market in
dollar terms, is desktop computing. Desktop
computing spans from low-end systems that
sell for under $1000 to high-end, heavily
configured workstations that may sell for
over $10,000. Throughout this range in
price and capability, the desktop market
tends to be driven to optimize price-performance.

Introduction
Changing Face for Desktop Computing

This combination of performance (measured
primarily in terms of compute performance and
graphics performance) and price of a system is
what matters most to customers in this market,
and hence to computer designers.
 As a result, desktop systems often are where the
newest, highest-performance microprocessors
appear, as well as where recently cost-reduced
microprocessors and systems appear first.
Introduction
Changing Face for Desktop Computing

Desktop computing also tends to be reasonably well
characterized in terms of applications and benchmarking,
though the increasing use of Web-centric, interactive
applications poses new challenges in performance
evaluation.
 The PC portion of the desktop space seems recently to
have become focused on clock rate as the direct measure
of performance, and this focus can lead to poor decisions
by consumers as well as by designers who respond to this
predilection.
Introduction
Changing Face for Servers

As the shift to desktop computing occurred, the
role of servers to provide larger-scale and more
reliable file and computing services grew. The
emergence of the World Wide Web accelerated
this trend because of the tremendous growth in
demand for Web servers and the growth in
sophistication of Web-based services. Such
servers have become the backbone of large-scale
enterprise computing, replacing the traditional
mainframe.
Introduction
Changing Face for Servers

For servers, different characteristics are important.
First, availability is critical.

The term “availability” means that the system can
reliably and effectively provide a service. This term
is to be distinguished from “reliability,” which says
that the system never fails. Parts of large-scale systems
unavoidably fail; the challenge in a server is to
maintain system availability in the face of
component failures, usually through the use of
redundancy.

Introduction
Changing Face for Servers

Why is availability crucial? Consider the servers
running Yahoo!, taking orders for Cisco, or
running auctions on eBay. Obviously such systems
must be operating seven days a week, 24 hours a
day. Failure of such a server system is far more
catastrophic than failure of a single desktop.
Although it is hard to estimate the cost of
downtime, Figure 1.2 shows one analysis,
assuming that downtime is distributed uniformly
and does not occur solely during idle times.
Introduction
Changing Face for Servers

As we can see, the estimated costs of
an unavailable system are high, and the
estimated costs in Figure 1.2 are purely
lost revenue and do not account for the
cost of unhappy customers!

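The reasoning about downtime cost and redundancy can be sketched numerically. The sketch assumes, as the slide does, that downtime is spread uniformly over the year. It also assumes that replicas fail independently and that one working replica is enough to provide the service. The function names and any revenue figure passed in are hypothetical, since the values of Figure 1.2 are not reproduced here.

```python
HOURS_PER_YEAR = 365 * 24


def annual_downtime_hours(availability: float) -> float:
    """Expected hours per year during which the service is unavailable."""
    return (1.0 - availability) * HOURS_PER_YEAR


def annual_downtime_cost(availability: float, revenue_per_hour: float) -> float:
    """Lost revenue only; the cost of unhappy customers is not modeled."""
    return annual_downtime_hours(availability) * revenue_per_hour


def redundant_availability(component_availability: float, replicas: int) -> float:
    """Availability of a service that is up while at least one replica is up.

    Assumes independent failures, which real systems only approximate.
    """
    return 1.0 - (1.0 - component_availability) ** replicas
```

Under these assumptions a single server with 99% availability is down about 87.6 hours per year, while two such servers in a redundant pair reach about 99.99%. This is the sense in which redundancy maintains availability even though individual components still fail.
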
Introduction
Changing Face for Servers

A second key feature of server systems is an
emphasis on scalability. Server
systems often grow over their lifetime in
response to a growing demand for the
services they support or an increase in
functional requirements. Thus, the ability to
scale up the computing capacity, the
memory, the storage, and the I/O bandwidth
of a server is crucial.

Introduction
Changing Face for Servers

Lastly, servers are designed for efficient
throughput. That is, the overall performance
of the server (in terms of transactions per minute
or Web pages served per second) is what is crucial.
Responsiveness to an individual request
remains important, but overall efficiency and
cost-effectiveness, as determined by how
many requests can be handled in a unit time,
are the key metrics for most servers.

Introduction
Changing Face for Embedded Computers

Embedded computers (computers lodged in other devices
where the presence of the computers is not immediately
obvious) are the fastest growing portion of the computer
market.

These devices range from everyday machines (most
microwaves, most washing machines, most printers,
most networking switches, and all cars contain simple
embedded microprocessors) to handheld digital devices
(such as palmtops, cell phones, and smart cards) to
video games and digital set-top boxes.

Introduction
Changing Face for Embedded Computers

Although in some applications (such as
palmtops) the computers are programmable,
in many embedded applications the only
programming occurs in connection with the
initial loading of the application code or a
later software upgrade of that application.
Thus, the application can usually be
carefully tuned for the processor and system.
Introduction
Changing Face for Embedded Computers

This process sometimes includes limited use of
assembly language in key loops,
although time-to-market pressures and good
software engineering practice usually restrict such
assembly language coding to a small fraction of
the application. This use of assembly language,
together with the presence of standardized
operating systems and a large code base, has meant
that instruction set compatibility has become an
important concern in the embedded market.

Simply put, as in other computing applications,
software costs are often a large part of the total
cost of an embedded system.

Introduction
Changing Face for Embedded Computers


Embedded computers have the widest range of processing power
and cost: from low-end 8-bit and 16-bit processors that
may cost less than a dollar, to full 32-bit microprocessors capable
of executing 50 million instructions per second that cost under 10
dollars, to high-end embedded processors that cost
hundreds of dollars and can execute a billion instructions per
second for the newest video game or for a high-end network
switch.

Although the range of computing power in the embedded
computing market is very large, price is a key factor in the design
of computers for this space. Performance requirements do exist, of
course, but the primary goal is often meeting the performance
need at a minimum price, rather than achieving higher
performance at a higher price.

Introduction
Changing Face for Embedded Computers



Often, the performance requirement in an embedded application
is a real-time requirement. A real-time performance
requirement is one where a segment of the application has an
absolute maximum execution time that is allowed.

For example, in a digital set-top box the time to process each
video frame is limited, since the processor must accept
and process the next frame shortly.

In some applications, a more sophisticated requirement exists: the
average time for a particular task is constrained as well as the
number of instances when some maximum time is exceeded. Such
approaches (sometimes called soft real-time) arise when it is
possible to occasionally miss the time constraint on an event, as
long as not too many are missed.

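The two kinds of requirement described above can be written as checks over measured execution times. This is a minimal sketch: the function names, the millisecond unit, and the sample frame times are assumptions for illustration and are not measurements from any real set-top box.

```python
def meets_hard_deadline(times_ms: list[float], deadline_ms: float) -> bool:
    """Hard real-time: no execution may exceed the absolute maximum time."""
    return max(times_ms) <= deadline_ms


def meets_soft_deadline(
    times_ms: list[float],
    deadline_ms: float,
    average_limit_ms: float,
    max_misses: int,
) -> bool:
    """Soft real-time: the average is bounded and only a few deadlines are missed."""
    misses = sum(1 for t in times_ms if t > deadline_ms)
    average = sum(times_ms) / len(times_ms)
    return average <= average_limit_ms and misses <= max_misses


# Hypothetical per-frame processing times for a video decoder, in milliseconds.
frame_times = [30.0, 31.0, 29.0, 36.0, 30.0]
```

With a 33 ms deadline, `frame_times` fails the hard check because one frame took 36 ms. It passes the soft check when one miss is allowed and the average limit is 32 ms, since the average is 31.2 ms.
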
Introduction
Changing Face for Embedded Computers

Real-time performance tends to be highly
application dependent. It is usually measured
using kernels either from the application or
from a standardized benchmark. With the growth
in the use of embedded microprocessors, a wide
range of benchmark requirements exist, from the
ability to run small, limited code segments to the
ability to perform well on applications involving
tens to hundreds of thousands of lines of code.

Introduction
Changing Face for Embedded Computers

Two other key characteristics exist in many
embedded applications: the need to minimize
memory and the need to minimize power.

Although the emphasis on low power is
frequently driven by the use of batteries,
the need to use less expensive packaging
(plastic versus ceramic) and the absence of a fan
for cooling also limit total power consumption.

Introduction
Changing Face for Embedded Computers

In many embedded applications, the memory can be a
substantial portion of the system cost, and it is important
to optimize memory size in such cases. Sometimes the
application is expected to fit totally in the memory on
the processor chip; other times the application needs to
fit totally in a small off-chip memory. In any event, the
importance of memory size translates to an emphasis on
code size, since data size is dictated by the application.
Some architectures have special instruction set
capabilities to reduce code size. Larger memories also
mean more power, and optimizing power is often critical
in embedded applications.
Introduction
Changing Face for Embedded Computers

Another important trend in embedded systems is
the use of processor cores together with
application-specific circuitry.

Often an application’s functional and
performance requirements are met by combining
a custom hardware solution together with
software running on a standardized embedded
processor core, which is designed to interface
to such special-purpose hardware.

Introduction
Changing Face for Embedded Computers

In practice, embedded problems are usually solved by
one of three approaches:
1. The designer uses a combined hardware/software
solution that includes some custom hardware and an
embedded processor core that is integrated with the
custom hardware, often on the same chip.
2. The designer uses custom software running on an
off-the-shelf embedded processor.
3. The designer uses a digital signal processor (DSP) and
custom software for the processor. Digital signal
processors are processors specially tailored for
signal-processing applications.

The Task of the Computer Designer-1

The task the computer designer faces is a complex
one: Determine what attributes are
important for a new machine, then design a
machine to maximize performance while staying
within cost and power constraints.

This task has many aspects, including instruction
set design, functional organization, logic design,
and implementation.

The Task of the Computer Designer-2

The implementation may encompass
integrated circuit design, packaging,
power, and cooling. Optimizing the
design requires familiarity with a very wide
range of technologies, from compilers and
operating systems to logic design and
packaging.

The Task of the Computer Designer-3

In the past, the term computer architecture
often referred only to instruction set design.
Other aspects of computer design were
called implementation, often
insinuating that implementation is
uninteresting or less challenging. We
believe this view is not only incorrect, but
is even responsible for mistakes in the
design of new instruction sets.

The Task of the Computer Designer-4

The architect’s or designer’s job is much
more than instruction set design, and the
technical hurdles in the other aspects of
the project are certainly as challenging as
those encountered in instruction set design.
This challenge is particularly acute at the
present, when the differences among
instruction sets are small and when there
are three rather distinct application areas.

The Task of the Computer Designer-5

The implementation of a machine has two
components: organization and hardware.

The term organization includes the high-level
aspects of a computer’s design, such as the
memory system, the bus structure, and the design
of the internal CPU (where arithmetic, logic,
branching, and data transfer are implemented).
The Task of the Computer Designer-6

For example, two embedded processors with
identical instruction set architectures but very
different organizations are the NEC VR 5432
and the NEC VR 4122. Both processors
implement the MIPS64 instruction set, but
they have very different pipeline and cache
organizations. In addition, the 4122
implements the floating-point instructions in
software rather than hardware!
The Task of the Computer Designer-7

Hardware is used to refer to the specifics of a
machine, including the detailed logic design and
the packaging technology of the machine.

Often a line of machines contains machines with
identical instruction set architectures and nearly
identical organizations, but they differ in the
detailed hardware implementation.
The Task of the Computer Designer-8

For example, the Pentium II and Celeron
are nearly identical, but offer different clock
rates and different memory systems, making
the Celeron more effective for low-end
computers.

The term architecture is intended to cover all
three aspects of computer design: instruction
set architecture, organization, and hardware.

The Task of the Computer Designer-9

Computer architects must design a computer to
meet functional requirements as well as price,
power, and performance goals.

Often, they also have to determine what the
functional requirements are, which can be a major
task. The requirements may be specific features
inspired by the market.
The Task of the Computer Designer-10

Application software often drives the choice of
certain functional requirements by determining
how the machine will be used. If a large body of
software exists for a certain instruction set
architecture, the architect may decide that a new
machine should implement an existing instruction
set.
The Task of the Computer Designer-11

The presence of a large market for a
particular class of applications might
encourage the designers to incorporate
requirements that would make the machine
competitive in that market. Figure 1.4
summarizes some requirements that need to
be considered in designing a new machine.
Many of these requirements and features
will be examined in depth in later chapters.
The Task of the Computer Designer-12

Once a set of functional requirements has been
established, the architect must try to optimize
the design. Which design choices are optimal depends,
of course, on the choice of metrics. The changes in the
computer applications space over the last decade have
dramatically changed the metrics. Although desktop
computers remain focused on optimizing cost-performance
as measured by a single user, servers focus on
availability, scalability, and throughput
cost-performance, and embedded computers
are driven by price and often power issues.

The Task of the Computer Designer-13

These differences and the diversity and size of
these different markets lead to fundamentally
different design efforts.

For the desktop market, much of the effort goes
into designing a leading-edge microprocessor and
into the graphics and I/O system that integrate
with the microprocessor.
The Task of the Computer Designer-14

In the server area, the focus is on integrating
state-of-the-art microprocessors, often in a
multiprocessor architecture, and designing
scalable and highly available I/O systems to
accompany the processors.
The Task of the Computer Designer-15

In the embedded processor market, the
challenge lies in adopting the high-end
microprocessor techniques to deliver most
of the performance at a lower fraction of the
price, while paying attention to demanding
limits on power and sometimes a need for
high-performance graphics or video
processing.
The Task of the Computer Designer-16

In addition to performance and cost,
designers must be aware of important trends
in both the implementation technology and
the use of computers. Such trends not only
impact future cost, but also determine the
longevity of an architecture.
Technology Trends-1

If an instruction set architecture is to be
successful, it must be designed to survive
rapid changes in computer technology.
After all, a successful new instruction set
architecture may last decades—the core of
the IBM mainframe has been in use for
more than 35 years. An architect must plan
for technology changes that can increase the
lifetime of a successful computer.
Technology Trends-2

To plan for the evolution of a machine, the
designer must be especially aware of rapidly
occurring changes in implementation
technology.

Four implementation technologies, which
change at a dramatic pace, are critical to
modern implementations:
Technology Trends-3

1. Integrated circuit logic technology: Transistor
density increases by about 35% per year, quadrupling
in somewhat over four years. Increases in die size are
less predictable and slower, ranging from 10% to 20%
per year. The combined effect is a growth rate in
transistor count on a chip of about 55% per year.
Device speed scales more slowly.

Technology Trends-4
2. Semiconductor DRAM (dynamic random-access memory):
Density increases by between 40% and 60% per year,
quadrupling in three to four years. Cycle time has
improved very slowly, decreasing by about
one-third in 10 years. Bandwidth per chip
increases about twice as fast as latency
decreases. In addition, changes to the DRAM
interface have also improved the bandwidth.

Technology Trends-5
3. Magnetic disk technology: Recently, disk
density has been improving by more than 100%
per year, quadrupling in two years. Prior to 1990,
density increased by about 30% per year,
doubling in three years. It appears that disk
technology will continue the faster density
growth rate for some time to come. Access time
has improved by one-third in 10 years.

Technology Trends-6
4. Network technology: Network performance depends both on
the performance of switches and on the performance of the
transmission system.
Both latency and bandwidth can be improved,
though recently bandwidth has been the primary focus. For many
years, networking technology appeared to improve slowly: for
example, it took about 10 years for Ethernet technology to move
from 10 Mb to 100 Mb. The increased importance of networking
has led to a faster rate of progress, with 1 Gb Ethernet becoming
available about five years after 100 Mb. The Internet
infrastructure in the United States has seen even faster growth
(roughly doubling in bandwidth every year), both through the use
of optical media and through the deployment of much
more switching hardware.

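The growth figures quoted for the four technologies all follow from compound growth, and a short sketch lets the reader check them. The function names are illustrative, and the sketch assumes a constant annual rate, which the slides themselves describe only as approximate.

```python
import math


def years_to_multiply(annual_growth: float, factor: float) -> float:
    """Years needed to grow by `factor` at a constant annual growth rate.

    annual_growth is a fraction, so 0.35 means 35% per year.
    """
    return math.log(factor) / math.log(1.0 + annual_growth)


def combined_growth(*annual_rates: float) -> float:
    """Annual growth of a product of quantities that each grow at the given rates."""
    product = 1.0
    for rate in annual_rates:
        product *= 1.0 + rate
    return product - 1.0
```

At 35% per year, `years_to_multiply(0.35, 4)` lands between four and five years, in line with "somewhat over four years" for transistor density. At 100% per year the same call with `1.00` gives two years, matching the disk figure. `combined_growth(0.35, 0.15)` gives about 0.55, which shows how 35% density growth and a mid-range 15% die-size growth combine into roughly 55% growth in transistor count.
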
Technology Trends-7
These rapidly changing technologies impact the design
of a microprocessor that may, with speed and
technology enhancements, have a lifetime of five or
more years. Even within the span of a single product
cycle for a computing system (two years of design and
two to three years of production), key technologies,
such as DRAM, change sufficiently that the designer
must plan for these changes. Indeed, designers often
design for the next technology, knowing that when a
product begins shipping in volume that next technology
may be the most cost-effective or may have
performance advantages. Traditionally, cost has
decreased at about the rate at which density increases.
Technology Trends-8
Although technology improves fairly
continuously, the impact of these
improvements is sometimes seen in discrete
leaps, as a threshold that allows a new
capability is reached.
Technology Trends-9
For example, when MOS technology reached the point
where it could put between 25,000 and 50,000
transistors on a single chip in the early 1980s, it became
possible to build a 32-bit microprocessor on a single
chip. By the late 1980s, first-level caches could go on
chip. By eliminating chip crossings within the processor
and between the processor and the cache, a dramatic
increase in cost-performance and performance/power
was possible. This design was simply infeasible until the
technology reached a certain point. Such technology
thresholds are not rare and have a significant impact on
a wide variety of design decisions.
Cost, Price, and Their Trends-1
Although there are computer designs where costs tend to
be less important, specifically supercomputers, cost-sensitive
designs are of growing significance: more than
half the PCs sold in 1999 were priced at less than $1000,
and the average price of a 32-bit microprocessor for an
embedded application is in the tens of dollars. Indeed, in
the past 15 years, the use of technology improvements
to achieve lower cost, as well as increased performance,
has been a major theme in the computer industry.
Cost, Price, and Their Trends-2
Textbooks often ignore the cost half of cost-performance
because costs change, thereby
dating books, and because the issues are subtle
and differ across industry segments. Yet an
understanding of cost and its factors is essential
for designers to be able to make intelligent
decisions about whether or not a new feature
should be included in designs where cost is an
issue.
Cost, Price, and Their Trends-3
We focus on cost and price, specifically on the
relationship between price and cost: price is
what you sell a finished good for, and cost is the
amount spent to produce it, including overhead.
We also discuss the major trends and factors that
affect cost and how it changes over time. The
exercises and examples use specific cost data
that will change over time, though the basic
determinants of cost are less time sensitive.
An Example
Cost, Price, and Their Trends-3

System           Subsystem                                        Fraction of total
Cabinet          Sheet metal, plastic                              2%
                 Power supply, fans                                2%
                 Cables, nuts, bolts                               1%
                 Shipping box, manuals                             1%
                 Subtotal                                          6%
Processor board  Processor                                        22%
                 DRAM (128 MB)                                     5%
                 Video card                                        5%
                 Motherboard with basic I/O support, networking    5%
                 Subtotal                                         37%
I/O devices      Keyboard and mouse                                3%
                 Monitor                                          19%
                 Hard disk (20 GB)                                 9%
                 DVD drive                                         6%
                 Subtotal                                         37%
Software         OS + Basic Office Suite                          20%
Amdahl's Law
1. Make the common case fast
Improving the frequent event, rather than
the rare event, will obviously help
performance.
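Amdahl's Law
Amdahl's Law states that the performance
improvement to be gained from using some
faster mode of execution is limited by the
fraction of the time the
faster mode can be used.
In other words, when one component of a system adopts a faster
mode of execution, the resulting improvement in overall system
performance depends on how frequently that mode is used, or on
the fraction of total execution time it accounts for.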
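Amdahl's Law
Frequency of use: determined by the application and the measurement method
• Improving the most frequently used component yields the greatest
gain in efficiency
• The main metric of the formal description: speedup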
\[
\text{Speedup} = \frac{\text{Performance for entire task using the enhancement when possible}}{\text{Performance for entire task without using the enhancement}}
\]
\[
\text{Speedup} = \frac{\text{Execution time for entire task without using the enhancement}}{\text{Execution time for entire task using the enhancement when possible}}
\]
Speedup
Speedup = (performance with the enhancement) /
(performance without the enhancement)
= (time to execute a task without the enhancement) /
(time to execute the task with the enhancement)
two factors-1
1. The fraction of the computation time in the
original machine that can be converted to
take advantage of the enhancement.
• That is, the percentage of the total time the computer spends
executing a task that is taken up by the part that can be enhanced.
• For example, if 20 seconds of the execution
time of a program that takes 60 seconds in
total can use an enhancement, the fraction is
20/60. This value, which we will call
Fraction_enhanced, is always less than or equal to
1.
two factors-2
2. The improvement gained by the enhanced
execution mode; that is, how much faster the
task would run if the enhanced mode were
used for the entire program.
• That is, the factor by which the enhanced part performs better
with the enhancement than without it.
• For example, if the enhanced mode takes 2 seconds
for a portion of the program that can use the mode
completely, while the original mode took 5 seconds
for the same portion, the improvement is
5/2. We will call this value, which is always
greater than 1, Speedup_enhanced.
The execution time using the original machine with
the enhanced mode will be the time spent using the
unenhanced portion of the machine plus the time
spent using the enhancement.
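In symbols, using the two factors defined above, this statement takes the standard textbook form of Amdahl's Law. The names Execution time_old and Execution time_new are assumed here for the time before and after the enhancement:

```latex
\[
\text{Execution time}_{\text{new}}
  = \text{Execution time}_{\text{old}} \times
    \left( (1 - \text{Fraction}_{\text{enhanced}})
         + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right)
\]
\[
\text{Speedup}_{\text{overall}}
  = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}}
  = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}})
         + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}
\]
```

The first term in the parentheses is the unenhanced portion of the time, and the second is the enhanced portion shortened by the factor Speedup_enhanced.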
Example 1:
Suppose that we are considering an enhancement
to the processor of a server system used for Web
serving. The new CPU is 10 times faster on
computation in the Web serving application than
the original processor. Assuming that the original
CPU is busy with computation 40% of the time and
is waiting for I/O 60% of the time, what is the
overall speedup gained by incorporating the
enhancement?
Answer:
Fraction_enhanced = 0.4
Speedup_enhanced = 10
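Plugging these two values into Amdahl's Law finishes the calculation. A minimal Python sketch (the function name `amdahl_speedup` is an assumption for illustration):

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup from Amdahl's Law.

    fraction_enhanced: share of the original execution time that can use
                       the enhancement (between 0 and 1).
    speedup_enhanced:  how much faster the enhanced part runs.
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Example 1: computation is 40% of the time and becomes 10 times faster.
overall = amdahl_speedup(0.4, 10)
print(f"Overall speedup: {overall:.2f}")  # 1 / 0.64, about 1.56
```

Even though the CPU is 10 times faster on computation, the overall gain is only about 1.56, because the 60% of the time spent waiting for I/O is unchanged.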
Example 2:
A common transformation required in graphics engines is
square root. Implementations of floating-point (FP) square root
vary significantly in performance, especially among processors
designed for graphics. Suppose FP square root (FPSQR) is
responsible for 20% of the execution time of a critical graphics
benchmark. One proposal is to enhance the FPSQR hardware
and speed up this operation by a factor of 10. The other
alternative is just to try to make all FP instructions in the
graphics processor run faster by a factor of 1.6; FP
instructions are responsible for a total of 50% of the execution
time for the application. The design team believes that they
can make all FP instructions run 1.6 times faster with the
same effort as required for the fast square root. Compare
these two design alternatives.
Answer:
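The comparison can be worked out by applying Amdahl's Law to each alternative in turn. A short Python sketch using the figures given in the example (the function and dictionary names are assumptions for illustration):

```python
def overall_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Amdahl's Law: speedup of the whole task when only a fraction is enhanced."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Each entry: (fraction of execution time affected, speedup of that part).
alternatives = {
    "FPSQR 10 times faster": (0.20, 10.0),
    "all FP 1.6 times faster": (0.50, 1.6),
}

results = {name: overall_speedup(f, s) for name, (f, s) in alternatives.items()}
for name, value in results.items():
    print(f"{name}: overall speedup {value:.2f}")

# Pick the alternative with the larger overall speedup.
best = max(results, key=results.get)
print(f"Better alternative: {best}")
```

Speeding up FPSQR gives 1 / 0.82, about 1.22, while speeding up all FP instructions gives 1 / 0.8125, about 1.23. Improving all FP instructions is slightly better because they account for a larger share of the execution time.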
3. The CPU Performance Equation
CPU time = CPU clock cycles for a program × Clock cycle time
or
CPU time = CPU clock cycles for a program / Clock rate
Three Factors
CPU performance depends on three factors:
• Clock cycle time: hardware technology and organization
• Clock cycles per instruction (CPI): organization
and instruction set architecture
• Instruction count (IC): instruction set
architecture and compiler technology
CPU time
CPU time = Instruction count × Clock cycle time × Cycles per instruction
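As a small illustration of this product, the sketch below evaluates it for made-up inputs. The instruction count, CPI, and clock rate are hypothetical values chosen only to show the arithmetic; they are not measurements from the slides.

```python
def cpu_time(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = instruction count x CPI x clock cycle time."""
    clock_cycle_time = 1.0 / clock_rate_hz  # seconds per cycle
    return instruction_count * cpi * clock_cycle_time

# Hypothetical program: 1 billion instructions, average CPI of 2.0, 1 GHz clock.
baseline = cpu_time(1_000_000_000, 2.0, 1e9)

# Halving any one factor halves CPU time, if the other two stay fixed.
fewer_instructions = cpu_time(500_000_000, 2.0, 1e9)
lower_cpi = cpu_time(1_000_000_000, 1.0, 1e9)
faster_clock = cpu_time(1_000_000_000, 2.0, 2e9)

print(baseline, fewer_instructions, lower_cpi, faster_clock)
```

In practice the three factors are not independent: for example, an instruction set change that lowers the instruction count may raise the CPI.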
Example
Suppose we have made the following measurements:
• Frequency of FP operations (other than FPSQR) = 25%
• Average CPI of FP operations = 4.0
• Average CPI of other instructions = 1.33
• Frequency of FPSQR = 2%
• CPI of FPSQR = 20
Assume that the two design alternatives are to
decrease the CPI of FPSQR to 2 or to decrease
the average CPI of all FP operations to 2.5.
Compare these two design alternatives using the
CPU performance equation.
CPI_original = 4.0 × 25% + 1.33 × 75% ≈ 2.0
CPI with new FP = CPI_original - (4.0 - 2.5) × 25% = 1.625
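The full comparison can be sketched in Python. The sketch follows the arithmetic of the line above, rounding the original CPI to 2.0. Because the instruction count and clock cycle time are the same for both alternatives, the speedup is just the ratio of CPIs (variable names are assumptions for illustration):

```python
# Measurements from the example.
freq_fp = 0.25          # frequency of FP operations
cpi_fp = 4.0            # average CPI of FP operations
cpi_other = 1.33        # average CPI of other instructions
freq_fpsqr = 0.02       # frequency of FPSQR
cpi_fpsqr = 20.0        # CPI of FPSQR

# Original CPI: weighted average, rounded to one decimal (1.9975 -> 2.0).
cpi_original = round(freq_fp * cpi_fp + (1.0 - freq_fp) * cpi_other, 1)

# Alternative 1: decrease the CPI of FPSQR from 20 to 2.
cpi_new_fpsqr = cpi_original - freq_fpsqr * (cpi_fpsqr - 2.0)

# Alternative 2: decrease the average CPI of all FP operations from 4.0 to 2.5.
cpi_new_fp = cpi_original - freq_fp * (cpi_fp - 2.5)

# Same instruction count and clock cycle time, so speedup = CPI ratio.
speedup_fpsqr = cpi_original / cpi_new_fpsqr
speedup_fp = cpi_original / cpi_new_fp

print(f"CPI original:       {cpi_original:.3f}")
print(f"CPI with new FPSQR: {cpi_new_fpsqr:.3f} (speedup {speedup_fpsqr:.2f})")
print(f"CPI with new FP:    {cpi_new_fp:.3f} (speedup {speedup_fp:.2f})")
```

Lowering the CPI of all FP operations gives the lower CPI (1.625 versus 1.64), so it is the slightly better alternative, with a speedup of about 1.23.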
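Principle of Locality

4. The principle of locality of program access
Measurements show that a program spends about 90% of its
execution time in 10% of its code; that is, most of the time it
accesses only a local region of the program.
Locality of program access is the theoretical basis for building
memory hierarchies and caches.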
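Methods of Implementing Control Flow
An information-processing procedure can be described in terms of
control flow. Three implementation methods are common:
1. All hardware: the control flow is implemented by hardware logic
circuits designed with combinational logic design methods;
2. Combined hardware and software: part of the flow is implemented
by microprograms and the rest by hardware logic;
3. All software: the control flow is implemented by a program
written in some language according to the flow's algorithm.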
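Hierarchical Structure of Computer Systems
• A "computer language" is a set of characters, governed by definite
rules, that describes control flow.
• Computer languages do not belong to software alone; they can be
assigned to each level of a computer system, where each describes
the control flow of its own level.
• Based on this broad understanding of language, a computer system
can be viewed as a series of "virtual" computers, nested layer upon
layer from the inside out to form an "onion" style functional model
(see Figure 1-6).
Example: user -- modeling -- application program -- high-level language -- assembly language -- operating system -- machine language -- microprogram -- hardwired logic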
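The Concept of a Virtual Computer
Each layer of the onion model is a virtual computer that exists only
for its "observer". Its function is embodied in a language in the
broad sense: it provides a means of interpreting that language, then
acts on the object being processed or controlled and obtains the
necessary state information from that object. An observer at a given
level can understand and use the computer only through the language
of that level, and need not be concerned with how anything inside
works or is implemented.
(That is, a virtual computer is a machine implemented in software.)
For the composition of a virtual computer, see Figure 1-7.
For the functional levels of a computer system defined from the
virtual-computer viewpoint, see Figure 1-8.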
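Modern Computer Organization
A modern computer is an integrated system comprising machine
hardware, an instruction set, system software, application programs,
and user interfaces. Different solution methods may require
different computing resources, depending on the nature of the
problem being solved.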