Transcript 算 法 导 论
精研课程
课程基本信息
课程编号: 21190120
上课时间、地点:2012年 夏学期
周二(9、10)曹光彪 西101 、周四(3~4)曹光
彪 西101
上机:周日(11、12)软件学院机房
考试时间:6月11日(hard deadline)
考试形式:提交课程论文—综述
学时/学分:4-2/周 学时/ 2-1 学分
7/21/2015
2/74
Office & Homepage
Office:工商楼 215
My Homepage:
http://www.cs.zju.edu.cn/people/yedeshi/
Course Home:
7/21/2015
3/74
Examination
Grading Polices:
1)
Homework: 15%
2)
Quiz: 10% (5th Week, May 15)
3)
Presentation+ Poster session : 20% (Slides submit to TA),
Videos maybe taken
Poster: in Lab time
4)
Programming Project: 20% (2-3programms)
5)
Final Survey: 35%
7/21/2015
4/74
TA information
金淑贤: roles
Guide to group the students
Collect the homework, program, and survey
Review of homework, programs
Collect the scores of each
presentations/posters
7/21/2015
5/74
7/21/2015
6/74
What is the position of
algorithms in CS
1. Linguists: what shall we talk to the
machines?
2. Algorithms: what is a good method for
solving a problem fast on my computer
3. Architects: Can I build a better
computer?
4. Sculptors of Machine Intelligence: Can I
write a computer program that can find its
own solution.
7/21/2015
7/74
Algorithms in Computer Science
Hardware
Compilers, Programming languages
Algorithms
Networking, Distributed systems, Fault tolerance, Security
Machine learning, Statistics, Information retrieval, AI
Bioinformatics .......
7/21/2015
8/74
MIT Undergraduate Programs
7/21/2015
9/74
What is algorithm?
(Oxford Dict.)Algorithm:
A set of rules that must be followed when solving a
particular problem.
From Math world
A specific set of instructions for carrying out a
procedure or solving a problem, usually with the
requirement that the procedure terminate at some
point.
An algorithm is any well-defined computational
procedure that takes some value, or set of
values, as input and produces some value, or
set of values, as output.
7/21/2015
10/74
7/21/2015
11/74
What will CS be?
Computer Science:
Past, Present, and Future
Ed Lazowska (Washington)
Computer Science is the new Math
Christos H. Papadimitriou (Berkeley)
7/21/2015
12/74
Algorithm
Problem definition 问题
Objective
目标 (very important)
Evaluation
算法评价
Methods
方法
7/21/2015
13/74
Algorithm evaluation
Quality:
how far away from the optimal solution ?
Cost:
Running time
Space needed
Our goal is to design algorithm with high
quality, but in low cost
7/21/2015
14/74
Reasonable times
Poly(|I|), Time polynomial in |I|,
where |I| is the size of the
problem instance
Input size: size(x) of an instance x
with rational data is the total
number of bits needed for the
binary prepresentation.
7/21/2015
15/74
Time complexity
logarithmic time if T(n) = O(log n).
sub-linear time if T(n) = o(n)
linear time, or O(n) time
linearithmic function: T(n) = O(n log n), quasilinear
time if T(n) = O(n logk n)
polynomial time: T(n) = O(nk) for some constant k
strongly polynomial time:
the number of operations in the arithmetic model of computation is bounded
by a polynomial in the number of integers in the input instance; and
the space used by the algorithm is bounded by a polynomial in the size of
the input.
weakly polynomial time: P but not strongly P
7/21/2015
16/74
Time complexity
Quasi-polynomial time:
for some
fixed c.
Sub-exponential time if T(n) = 2o(n)
Exponential time, if T(n) is upper
bounded by 2poly(n)
7/21/2015
17/74
Hardness of problems
Easy
Polynomial (e.g. n2, n log n, n3, n1000).
Quasi-polynomial(e.g.:n log n, n log2n, c log7n).
Sub-exponential (e.g.:
2√n,
0.98
n
5(
)).
Exponential (e.g.: 2n, 8n, n!, nn).
Hard
7/21/2015
18/74
Running time
Computer A is 100 times faster than
computer B
Sort n numbers
Computer A requires
2n
2
instructions
Computer B requires 50nlgn instructions
n = 1,000, 000
Computer A: 2(10^6)^2/10^9 = 2000 seconds
Computer B: 50*10^6 lg 10^6/10^7 ~ 100 seconds
7/21/2015
19/74
Running time
n
10
100
1,000
<1s
<1s
<1s
10,000 < 1 s
106
7/21/2015
1s
nlog n
n2
n3
< 1s
<1s
<1s
<1s <1s
<1s 1s
< 1 s 2 min
20 s
12
days
1s
2n
<1s
18
min
18
Very
min
long
12
Very
day
long
31710 Very
year
long
n!
4s
1025
year
Very
long
Very
long
Very
long
20/74
Sorting
输入:A sequence of n number
< a1, a2 , …, an >
输出:排列(permutation )
< a01 , a02 , … , a0n >
a 01 <= a 02 <=
...
使得:
Example:
<= a 0n
Input: 8 2 4 9 3 6
Output: 2 3 4 6 8 9
7/21/2015
21/74
EX. of insertion sort
8
7/21/2015
2
4
9
3
6
22/74
EX. of insertion sort
7/21/2015
8
2
4
9
3
6
2
8
4
9
3
6
23/74
EX. of insertion sort
7/21/2015
8
2
4
9
3
6
2
8
4
9
3
6
24/74
EX. of insertion sort
7/21/2015
8
2
4
9
3
6
2
8
4
9
3
6
2
4
8
9
3
6
25/74
EX. of insertion sort
7/21/2015
8
2
4
9
3
6
2
8
4
9
3
6
2
4
8
9
3
6
26/74
EX. of insertion sort
7/21/2015
8
2
4
9
3
6
2
8
4
9
3
6
2
4
8
9
3
6
2
4
8
9
3
6
27/74
EX. of insertion sort
7/21/2015
8
2
4
9
3
6
2
8
4
9
3
6
2
4
8
9
3
6
2
4
8
9
3
6
28/74
EX. of insertion sort
7/21/2015
8
2
4
9
3
6
2
8
4
9
3
6
2
4
8
9
3
6
2
4
8
9
3
6
2
3
4
8
9
6
29/74
EX. of insertion sort
7/21/2015
8
2
4
9
3
6
2
8
4
9
3
6
2
4
8
9
3
6
2
4
8
9
3
6
2
3
4
8
9
6
30/74
EX. of insertion sort
7/21/2015
8
2
4
9
3
6
2
8
4
9
3
6
2
4
8
9
3
6
2
4
8
9
3
6
2
3
4
8
9
6
2
3
4
6
8
9 done
31/74
Insertion sort
“pseudocode”
1
i
INSERTION-SORT (A, n)
⊳ A[1 . . n]
for j ← 2 to n
do key ← A[ j]
i←j–1
while i > 0 and A[i] > key
do A[i+1] ← A[i]
i←i–1
A[i+1] = key
j
n
A:
sorted
7/21/2015
key
32/74
Analyzing algorithms
Need a computational model
Random-access machine (RAM) model
Instructions are executed one after another. No
concurrent operations.
Arithmetic: add, subtract, multiply, divide, remainder, floor,
ceiling
Data movement: load, store, copy
Control: conditional/unconditional branch, subroutine call and
return.
Each of these instructions takes a constant amount of
time.
7/21/2015
33/74
Running time
Running time:
The running time of an algorithm on a particular input is the
number of primitive operations or “steps” executed.
line consists only of primitive operations and takes constant time
Input size:
number of items
the total number of bits.
more than one number: Graph
the number of vertices and the
number of edges
7/21/2015
34/74
Example:
The input size of sorting problem is n.
Worst-case running time of Insert sort is
O(n2).
7/21/2015
35/74
Running time
The running time depends on the input: an
already sorted sequence is easier to sort.
Parameterize the running time by the size
of the input, since short sequences are
easier to sort than long ones.
Generally, we seek upper bounds on the
running time, because everybody likes a
guarantee.
7/21/2015
36/74
Map of Algorithm Design
New problem
Off-line problem
On-line problem
Polynomial
Polynomial
NP-C problem
Quality
Appro. ratio
Improve cost
running time
7/21/2015
Exact
Algorithm
Improve cost
running time
Approximate
Algorithm
Heuristic
Quality
Appro. ratio
37/74
课程内容
1. 数学基础
1.1 算法基础
1.2 和 (SUMS) 集合运算 (Sets)
1.3 特殊数 (Stirling numbers, Harmonic numbers,
Eulerian numbers et al.)
2. 基本算法
2.1 分治 (Divide-and-Conquer)*
2.1.1 Mergesort *
2.1.2 自然数相乘(Multiplication)*
2.1.3 矩阵相乘(Matrix multiplication)
2.1.4 Discrete Fourier transform and Fast Fourier
transform
7/21/2015
38/74
课程内容
2.2 动态规划 (Dynamic Programming)
2.2.1 背包问题(Knapsack problem)
2.2.2 最长递增子序列(Longest increasing
subsequence)
2.2.3 Sequence alignment
2.2.4 最长相同子序列(Longest common
subsequence)
2.3.5 Matrix-chain multiplication
2.3.6 树上的独立集 (Max Independent set in
tree)
7/21/2015
39/74
课程内容
2.3 贪婪算法 (Greedy)
2.3.1 区间规划(Interval scheduling)
2.3.2 集合覆盖(Set cover)
2.3.3 拟阵(Matroids)
2.4 NP 问题 (NP-completeness)
2.4.1 The classes P and NP
2.4.2 NP-completeness and reducibility
2.4.3 NP-complete problems *
7/21/2015
40/74
课程内容
2.5 近似算法 (Approximate Algorithm)
2.5.1 顶点覆盖问题 (Vertex cover)
2.5.2 负载平衡问题 (Load balancing)
2.5.3 旅行商问题 (Traveling salesman
problem)
2.5.4 子集和问题 (Subset sum problem)
7/21/2015
41/74
课程内容
3. 算法的应用
3.1 局部搜索 (Local Search)
3.1.1 The Metropolis Algorithm and Simulated
Annealing
3.1.2 Local Search to Hopfield Neural
Networks(Nash Equilibria)
3.1.3 Maximum Cut Approximation via Local
Search
7/21/2015
42/74
课程内容
3.2 图论 (Graph Theorem) *
3.2.1 图论的基本知识 (Fundamental)
3.2.2 线性规划 (Linear Programming)
网络流(Network Flow),二分图,完全图的匹配
3.3计算几何学 (Computational Geometry)*
3.3.1 基本概念与折线段的性质 (Line-segment )
3.3.2 线段的一些性质 (Segments intersects )
3.3.3 凸包问题 (Convex Hull )
3.3.4 最近点对问题 (The closet pair of points)
3.3.5 多边形三角剖分 (Polygon Triangulation)
7/21/2015
43/74
课程内容
3.4 随机算法 (Randomized Algorithm)
3.4.1 随机变量与期望
3.4.2 A Randomized MAX-3-SAT
3.4.3 Randomized Divide-and-Conquer
3.5 在线算法(Online Algorithm)
3.5.1 Online Skying
3.5.2 Online Hiring
7/21/2015
*:备选内容
44/74
课程内容
7/21/2015
45/74
教材
Textbook: Introduction to algorithms,
Second Edition. Thomas H. Cormen, Charles E.
Leiserson, Ronald L. Rivest and Clifford Stein. The MIT
Press, 2001. ISBN: 0262032937.
Recommended: Algorithm Design. Jon
Kleinberg, Éva Tardos. Addison Wesley, 2005. ISBN: 0321-29535-8.
Rolf Nevanlinna
Prize, 06
7/21/2015
46/74
7/21/2015
47/74
参考教材
Algorithms. S. Dasgupta, C.H. Papadimitriou,
and U. V. Vazirani. May 2006.
Combinatorial Algorithms. Jeff Erickson.
University of Illinois, Urbana-Champaign. Lecture
Notes. Fall 2002.
Concrete Mathematics. Ronald L. Graham,
Donald E. Knuth, Oren Patashnik. AddisonWesley Publishing Company, 2005. ISBN: o201-14236-8.
7/21/2015
48/74
Algorithms in Computer Science
P = NP ?
Can we solve a problem efficiently?
Tradeoff between quality of solution and
the running time
Solve a problem with optimal solution, but it
might cost long time
Solve a problem approximately in short time
7/21/2015
49/74
$1,000,000 problem
P = NP ?
http://www.claymath.org/millennium/
Solved???!!!!
7/21/2015
50/74
Algorithms in Computer Science
Selfish Routing
Privacy preserve in database
TSP
Ad auction
7/21/2015
51/74
Perspective
Algorithms we can find everywhere
They have been developed to easy our
daily life
Train/Airplane timetable schedule
Routing
We live in the age of information
Text, numbers, images, video, audio
7/21/2015
52/74
Selfish routing
Pigou's Example
Suburb s, a nearby train station t.
Assuming that all drivers aim to minimize
the driving time from s to t
C(x) = 1
s
t
C(x) = x, with x in [0, 1]
7/21/2015
53/74
Selfish routing
We have good reason to expect all traffic
to follow the lower road
Social optimal? ½ to the long, wide
highway, ½ to the lower road.
selfish behavior need not produce a socially
optimal outcome
7/21/2015
54/74
Braess's Paradox
v
C(x) = x
C(x) = 1
s
t
C(x) = 1
C(x) = x
w
7/21/2015
55/74
Braess's Paradox
v
C(x) = x
C(x) = 1
C(x) = 0
s
C(x) = 1
t
C(x) = x
w
7/21/2015
56/74
Braess's Paradox
Paradox thus shows that the intuitively
helpful action of adding a new zero-cost link
can negatively impact all of the traffic!
With selfish routing, network improvements
can degrade network performance.
7/21/2015
57/74
Link attack example
Re-identify the medical record of the governor of
Massachussetts
MA collects and publishes sanitized medical data for state
employees (microdata) left circle
voter registration list of MA (publicly available data) right circle
• looking for governor’s record
• join the tables:
– 6 people had his birth date
– 3 were men
– 1 in his zipcode
• regarding the US 1990 census data
– 87% of the population are unique based on (zipcode, gender,
dob)
Privacy in microdata
the role of attributes in microdata
explicit identifiers are removed
quasi identifiers can be used to re-identify individuals
sensitive attributes (may not exist!) carry sensitive information
identifier Birthdatequasi
Name
Sexidentifiers
Zipcode
sensitive
Disease
Name
Andre
Birthdate
21/1/79
Sex
male
Zipcode
53715
Disease
Flu
Andre
Beth
21/1/79
10/1/81
male
female
53715
55410
Flu
Hepatitis
Beth
Carol
10/1/81
1/10/44
female
55410
90210
Hepatitis
Brochitis
Carol
Dan
1/10/44
21/2/84
female
male
90210
02174
Dan
Ellen
21/2/84
19/4/72
male
female
02174
02237
Ellen
19/4/72
female
02237
Brochitis
Sprained
Ankle
Sprained
AIDS
Ankle
AIDS
k-anonymity
k-anonymity: intuitively, hide each individual among k-1 others
each QI set of values should appear at least k times in the released
microdata
linking cannot be performed with confidence > 1/k
sensitive attributes are not considered (going to revisit this...)
how to achieve this?
generalization and suppression
value perturbation is not considered (we should remain truthful to
original values )
privacy vs utility tradeoff
do not anonymize more than necessary
Advertisement Auction
Auction
Dutch auction
Vickrey auction
Ad placement
7/21/2015
61/74
k-anonymity example
tools for anonymization
generalization
publish more general values, i.e., given a domain hierarchy, roll-up
suppression
remove tuples, i.e., do not publish outliers
often the number of suppressed tuples is bounded
original microdata
2-anonymous data
Birthdate
Sex
Zipcode
Birthdate Sex
Zipcode
21/1/79
male
53715
*/1/79
person
5****
10/1/79
female
55410
*/1/79
person
5****
1/10/44
female
90210
female
90210
21/2/83
male
02274
*/*/8*
male
022**
19/4/82
male
02237
*/*/8*
male
022**
group 1
suppressed 1/10/44
group 2
TSP
Trucking company with a central warehouse
Each day, it loads up the truck at the warehouse
and sends it around to several locations to make
deliveries.
At the end of the day, the truck must end up
back at the warehouse so that it ready to be
loaded for the next day.
To reduce the costs, the company wants to
select an order of delivery stops that yields the
lowest overall distance traveled by the truck.
7/21/2015
63/74
7/21/2015
64/74
7/21/2015
65/74
7/21/2015
66/74
Pizza delivery
One can give a call or via internet to order
a pizza for dinner
We want the hot, fresh and tasty pizzas
How should they delivery the pizzas upon
the reception of orders??
Immediately or wait some minutes for
next orders in the near places?
7/21/2015
67/74
The Ski problem
The Ski problem [Karp 92]: A skier must
decide every day she goes skiing whether
to rent or buy skis, unless or until she
decides to buy them. The skiier doesn’t
know how many days she will go on skiing
before she gets tired of this hobbie. The
cost to rent skis for a day is 1 unit, while
the cost to buy the skis is B units.
How can she save money?
7/21/2015
68/74
Lost cow problem
A short-sighted cow (or assume it’s dark,
or foggy, or ...) is standing in front of a
fence and does not know in which
direction the only gate in the fence might
be. How can the cow find the gate without
walking too great a detour?
How can two soldiers get together when
lost in battlefield ?
7/21/2015
69/74
Erdős project – shortest path
Paul Erdős(1913-1996) has an Erdős
number of zero. If the lowest Erdős number
of a coauthor is X, then the author's Erdős
number is X + 1.
7/21/2015
70/74
Nevanlinna Prize winners
NAME
YEAR
Robert Tarjan
1982
Leslie Valiant
1986
Alexander Razborov 1990
Avi Wigderson
1994
Peter Shor
1998
Madhu Sudan
2002
Jon Kleinberg
2006
Daniel Alan Spielman 2010
7/21/2015
COUNTRY ERDÖS NUMBER
USA
2
Hungary/Gt Brtn 3
Russia
2
Israel
2
USA
2
India/USA
2
USA
3
USA
71/74
Other famous people
Albert Einstein
1921 Physics
2
Chen Ning Yang
1957 Physics
4
Tsung-dao Lee
1957 Physics
5
John F. Nash
1994 Economics 4
Edmund S. Phelps 2006 Economics 4
Shing-Tung Yau
1982 China
2
Shiing Shen Chern 1983-84 China
2
Alan Turing
computer science
5
John von Neumann mathematics
3
David Hilbert
mathematics
4
Donald E. Knuth
2
7/21/2015
72/74
Extensions of shortest path
On k-skip Shortest Paths (SIGMOD 2011)
7/21/2015
73/74
History of Algorithm
The word algorithm comes from the name of the
9th century Persian mathematician Abu Abdullah
Muhammad ibn Musa al-Khwarizmi whose
works introduced Arabic numerals and algebraic
concepts.
The word algorism originally referred only to the
rules of performing arithmetic using Arabic
numerals but evolved into algorithm by the 18th
century. The word has now evolved to include all
definite procedures for solving problems or
performing tasks.
7/21/2015
74/74
History – con.
The first case of an algorithm written for a computer was
Ada Byron's notes on the analytical engine written in
1842, for which she is considered by many to be the
world's first programmer. However, since Charles
Babbage never completed his analytical engine the
algorithm was never implemented on it.
This problem was largely solved with the description of
the Turing machine, an abstract model of a computer
formulated by Alan Turing, and the demonstration that
every method yet found for describing "well-defined
procedures" advanced by other mathematicians could be
emulated on a Turing machine (a statement known as
the Church-Turing thesis).
7/21/2015
75/74
Why you come here?
7/21/2015
76/74
Requirement
Come to the class (*)
Ask questions
Thinking:
Why it is ok now?
How about other methods?
7/21/2015
77/74
Kinds of analyses
Worst-case: (usually)
T(n) = maximum time of algorithm on any input of size
n.
Average-case: (sometimes)
T(n) = expected time of algorithm over all inputs of
size n.
Need assumption of statistical distribution of inputs.
Best-case: (bogus)
Cheat with a slow algorithm that works fast on
some input.
7/21/2015
78/74
Uniform distribution
7/21/2015
79/74
Performance Measures for On-line Algorithms
Competitive ratio
Max/Max ratio
Smoothed Competitiveness
7/21/2015
80/74