Recovery-Oriented Computing


Motivations and Introduction
• Phenomenal growth in the computer
industry/technology: performance doubling
roughly every 18 months over the past 20
years, yielding multi-GFLOPS processors,
largely due to
–Micro-electronics technology
–Computer design innovations
• We have come a long way in the short
span of 56 years since the first
general-purpose computer in 1946:
Slide 1
Motivations and Introduction
Past (Milestones):
– First electronic computer ENIAC in 1946: 18,000 vacuum
tubes, 3,000 cubic feet, 20 2-foot 10-digit registers, 5
KIPs (thousand additions per second);
– First microprocessor (a CPU on a single IC chip) Intel
4004 in 1971: 2,300 transistors, 60 KIPs, $200;
– Virtual elimination of assembly language programming
reduced the need for object-code compatibility;
– The creation of standardized, vendor-independent
operating systems, such as UNIX and its clone, Linux,
lowered the cost and risk of bringing out a new
architecture
– RISC instruction set architectures paved the way for
drastic design innovations that focused on two critical
performance techniques: instruction-level parallelism
and the use of caches
Slide 2
Motivations and Introduction
Present (State of the art):
– Microprocessors approaching/surpassing 10 GFLOPS;
– A high-end microprocessor (<$10K) today is easily more
powerful than a supercomputer (>$10million) ten years
ago;
– While technology advancement contributes a sustained
annual growth of 35%, innovative computer design
accounts for another 25% annual growth rate ⇒ a factor
of 15 in performance gains!
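A quick compounding check of that factor-of-15 claim (a sketch; the ~12-year horizon is an assumption chosen to make the arithmetic work out, not a figure stated on the slide):

```python
# Growth from technology alone vs. technology plus design innovation.
years = 12                             # assumed horizon for the comparison
tech_only = 1.35 ** years              # 35% annual growth from technology
with_design = (1.35 * 1.25) ** years   # extra 25% per year from design

# The design-innovation contribution compounds to roughly the quoted
# factor of 15: 1.25**12 is about 14.6.
print(with_design / tech_only)
```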
Slide 3
Motivations and Introduction
Present (State of the art):
– Three different computing markets (fig. 1.3):
» Desktop Computing – driven by price-performance (a few hundred dollars through over $10K);
» Servers – availability driven (distinguished
from reliability), providing sustained high
performance (fig. 1.2)
» Embedded Computers – fastest growing portion of
the computer market; real-time performance
driven, needing to minimize memory and power,
and often implemented with ASICs
Slide 4
Motivations and Introduction
Present (State of the art):
– The Task of the Computer Designer (Fig. 1.4):
» Instruction Set Architecture (Traditional view of
what Computer Architecture is), the boundary between
software and hardware;
» Organization, the high-level aspects of a computer’s
design, such as the memory system, the bus structure,
and the internal design of the CPU, based on a given
instruction set architecture;
» Hardware, the specifics of a machine, including the
detailed logic design and the packaging technology of
the machine.
Future (Technology Trends):
– A truly successful instruction set architecture (ISA)
should last for decades; however, it takes a computer
architect’s acute observation and knowledge of the
rapidly changing technology for the ISA to
survive and cope with such changes:
Slide 5
Motivations and Introduction
Future (Technology Trends):
» IC logic technology: transistor count on a chip grows at
55% annual rate (35% density growth rate + 10-20% die size
growth) while device speed scales more slowly;
» Semiconductor DRAM: density grows at 60% annually while
cycle time improves very slowly (decreasing by about
one-third in ten years); bandwidth per chip increases
about twice as fast as latency decreases;
» Magnetic disk technology: density has increased at a 100%
annual rate since 1990, while access time improves by about
a third every ten years; and
» Network technology: both latency and bandwidth have been
improving, with more focus on bandwidth of late; the
increasing importance of networking has led to faster
improvement in performance than before—Internet bandwidth
doubles every year in the U.S.
» Scaling of transistor performance: while transistor
density increases quadratically with a linear decrease
in feature size, transistor performance increases only
roughly linearly with decreasing feature size ⇒ a
challenge & opportunity for the computer designer!
(see the sketch after this list)
» Wires and power in ICs: what about propagation delay and power needs?
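A minimal numeric sketch of that scaling asymmetry (the 180 nm to 90 nm shrink is an illustrative assumption, not a figure from the slide):

```python
old_nm, new_nm = 180, 90      # hypothetical process shrink
k = old_nm / new_nm           # linear feature-size scaling factor (2x)

density_gain = k ** 2         # transistor count grows quadratically (~4x)
speed_gain = k                # transistor speed grows only linearly (~2x)

# The gap between 4x transistors and 2x speed is the challenge and
# opportunity: the designer must spend the transistor surplus
# (parallelism, caches) to keep performance scaling.
print(density_gain, speed_gain)
```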
Slide 6
Motivations and Introduction
• Cost, Price and Their Trends:
– Understanding cost and pricing structure of the
industry and market is key to cost-sensitive
design of computers;
– The Learning Curve: manufacturing costs decrease
over time (Fig. 1.5 & 1.6), best measured by the change
in yield ⇒ helps project costs over a product’s
life;
Slide 7
Motivations and Introduction
• Cost, Price and Their Trends:
– Cost of an IC (Fig. 1.8):
$$\mathrm{Cost}_{IC} = \frac{\mathrm{Cost}_{die} + \mathrm{Cost}_{testing\ die} + \mathrm{Cost}_{packaging\ \&\ final\ test}}{\mathrm{Final\ test\ yield}}$$

$$\mathrm{Cost}_{die} = \frac{\mathrm{Cost}_{wafer}}{\mathrm{Dies\ per\ wafer} \times \mathrm{Die\ yield}}$$
Slide 8
Motivations and Introduction
• Cost, Price and Their Trends:
– Cost of an IC: die yield is obtained
empirically, where α corresponds inversely to the
number of masking levels (manufacturing
complexity). For today’s metal CMOS processes,
it is estimated at α = 4.0:

$$\mathrm{Dies\ per\ wafer} = \frac{\pi \times (\mathrm{Wafer\ diameter}/2)^2}{\mathrm{Die\ area}} - \frac{\pi \times \mathrm{Wafer\ diameter}}{\sqrt{2 \times \mathrm{Die\ area}}}$$

$$\mathrm{Die\ yield} = \mathrm{Wafer\ yield} \times \left(1 + \frac{\mathrm{Defects\ per\ unit\ area} \times \mathrm{Die\ area}}{\alpha}\right)^{-\alpha}$$
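A small worked sketch of the cost model above (the wafer price, diameter, die area, and defect density are hypothetical inputs chosen only to exercise the formulas):

```python
import math

def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    # Usable dies: wafer area over die area, minus the partial dies
    # lost around the wafer's circumference.
    return (math.pi * (wafer_diameter_cm / 2) ** 2 / die_area_cm2
            - math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2))

def die_yield(wafer_yield, defects_per_cm2, die_area_cm2, alpha=4.0):
    # Empirical yield model; alpha ~ 4.0 for today's metal CMOS processes.
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** -alpha

# Hypothetical inputs: 30 cm wafer costing $5000, 1 cm^2 die,
# 0.6 defects/cm^2, perfect wafer yield.
n_dies = dies_per_wafer(30, 1.0)
yld = die_yield(1.0, 0.6, 1.0)
cost_die = 5000 / (n_dies * yld)
print(f"{n_dies:.0f} dies/wafer, yield {yld:.2f}, ${cost_die:.2f}/die")
```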
Slide 9
Motivations and Introduction
• Distribution of Cost in a System
• Cost vs. Price (Fig. 1.10)
Slide 10
Motivations and Introduction
• Cost vs. Price (Fig. 1.10)
– Component cost(CC): original cost from a
designer’s point of view;
– Direct cost (DC, 20% of CC): making a product
(labor cost, scrap, warranty, etc), not including
service and maintenance;
– Gross margin (GM, 33% of CC+DC): indirect cost ⇒
overhead: R&D, marketing, sales, manufacturing
equipment maintenance, building rental, cost of
financing, pretax profits, and taxes;
Average selling price (ASP) = CC + DC + GM
– Average discount (AD, 33% of ASP): volume
discounts by manufacturers;
List price = ASP + AD
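To make the price buildup concrete, a minimal sketch using the slide’s percentages (the $500 component cost is a made-up input, and stacking the percentages exactly this way is an interpretation of the slide):

```python
def list_price(component_cost):
    # Percentages from the slide: DC = 20% of CC, GM = 33% of (CC + DC),
    # AD = 33% of ASP.
    direct_cost = 0.20 * component_cost
    gross_margin = 0.33 * (component_cost + direct_cost)
    asp = component_cost + direct_cost + gross_margin   # average selling price
    avg_discount = 0.33 * asp
    return asp + avg_discount

print(list_price(500))   # a $500 component cost becomes a ~$1061 list price
```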
Slide 11
Performances & Quantitative Principles
• “X is n times faster than Y” ⇒
$$\frac{\mathrm{ExecutionTime}_Y}{\mathrm{ExecutionTime}_X} = n$$
• Performance (throughput) is inversely proportional to
execution time:
$$\frac{\mathrm{Performance}_X}{\mathrm{Performance}_Y} = \frac{\mathrm{ExecutionTime}_Y}{\mathrm{ExecutionTime}_X} = n$$
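A minimal sketch of these two equivalent definitions (the timings are hypothetical):

```python
def speedup(exec_time_y, exec_time_x):
    # "X is n times faster than Y": n = ExecutionTime_Y / ExecutionTime_X.
    # Since performance = 1 / execution time, this also equals
    # Performance_X / Performance_Y.
    return exec_time_y / exec_time_x

print(speedup(12.0, 4.0))   # X is 3.0 times faster than Y
```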
• Definition of time:
– wall-clock time: response time or elapsed time;
– CPU time: the accumulated time during which CPU is
computing:
» user CPU time
» system CPU time
– An example from UNIX: 90.7u 12.9s 2:39 65%
» 90.7u: user CPU time (seconds)
» 12.9s: system CPU time (seconds)
» 2:39 (159 sec): elapsed time
» 65%: CPU time as a percentage of elapsed time,
(90.7 + 12.9)/159 ≈ 65%
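A short sketch that checks the arithmetic of that UNIX example:

```python
user, system = 90.7, 12.9       # user and system CPU time, in seconds
elapsed = 2 * 60 + 39           # "2:39" -> 159 seconds of wall-clock time

cpu_fraction = (user + system) / elapsed
print(f"{cpu_fraction:.0%}")    # ~65%, matching the last field of the output
```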
Slide 12
Performances & Quantitative Principles
• Workload Representations (in decreasing accuracy):
– Real applications: most accurate but inflexible
and poor portability
– Modified/scripted applications: scripts used to
simulate (or highlight) certain features and to
enhance portability
– Kernels: extracted from real programs, good for
isolating performance of individual features of a
machine
– Toy benchmarks: simple and run on almost all
computers, good for beginning programming
assignments
– Synthetic benchmarks: artificially created to
match an “average” execution profile; do not
reward optimizations of behaviors that occur in real
programs but are absent from benchmarks, and vice
versa, and thus can be misleading
Slide 13
Performances & Quantitative Principles
• Benchmark Suites: collections of kernels, real
programs, and benchmarks, lessening the weakness of any
one benchmark by the presence of the others (fig. 1.11)
– Desktop Benchmark Suites:
» CPU-intensive benchmarks: SPEC (Standard Performance
Evaluation Corporation): SPEC89 ⇒ SPEC92 ⇒ SPEC95 ⇒
SPEC2000 (12 integer CINT2000 & 14 floating-point
CFP2000, fig. 1.12): real programs modified for
portability and to highlight CPU performance
» Graphics-intensive benchmarks: SPECviewperf for
systems supporting the OpenGL graphics library,
SPECapc for applications with intensive use of
graphics
– Server Benchmark Suites:
» CPU-throughput benchmarks: SPEC CPU2000 ⇒ SPECrate
» I/O-intensive benchmarks: SPECSFS for file server,
SPECWeb for web server
» Transaction-processing (TP) benchmarks: TPC
(Transaction Processing Performance Council): TPC-A
(’85) ⇒ TPC-C (complex query) ⇒ TPC-H (ad-hoc decision
support) ⇒ TPC-R (business decision support) ⇒ TPC-W
(web-oriented)
Slide 14
– Embedded Benchmarks: EEMBC (pronounced “embassy”, fig. 1.13)
Slide 15