Transcript Slide

네트워크의 과학,
복잡한세상을 바라보는 새로운 시각
정하웅
KAIST, 물리학과
Albert László Barabási, Réka Albert
(Univ. of Notre Dame)
Zoltán N. Oltvai (Northwestern Univ.)
www.nd.edu/~networks
Austin Powers:
The spy who
shagged me
Let’s make
it legal
Robert Wagner
Wild Things
What Price Glory
Barry Norton
A Few
Good Man
Monsieur
Verdoux
What is Complexity?
A popular paradigm: Simple systems display complex behavior
 non-linear systems
 chaos
 fractals
3 Body Problem
Earth( ) Jupiter ( ) Sun (
)
Main Entry: 1com·plex
Function: noun
Etymology: Late Latin complexus totality, from Latin,
embrace, from complecti
Date: 1643
1 : a whole made up of complicated or interrelated parts
Society
Internet
Human body :
NETWORKS!
chemical network
Society
Nodes: individuals
Links: social relationship
(family/work/friendship/etc.)
S. Milgram (1967)
John Guare
“Six Degrees of Separation”
Social networks: Many individuals with diverse
social interactions between them.
9-11 Terror Hijacker’s Network
Communication networks
The Earth is developing an electronic nervous system,
a network with diverse nodes and links are
-computers
-phone lines
-routers
-TV cables
-satellites
-EM waves
Communication
networks: Many
non-identical
components
with diverse
connections
between them.
GENOME
protein-gene
interactions
PROTEOME
protein-protein
interactions
METABOLISM
Bio-chemical
reactions
Citrate Cycle
Complex systems
Made of
many non-identical elements
connected by diverse interactions.
NETWORK
Erdös-Rényi model
(1960)
Connect with
probability p
p=1/6
N=10
k ~ 1.5
Pál Erdös
(1913-1996)
Degree Distribution
P(k) : prob. that a certain
node will have k links
Poisson distribution
- Random
- Democratic
ARE COMPLEX NETWORKS
REALLY RANDOM?
To test this:
We need to pragmatically investigate
the topology of large real networks.
World Wide Web
Nodes: WWW documents
Links: URL links
800 million documents
(S. Lawrence, Nature,1999)
ROBOT:
collects all
URL’s found in a
document and follows
them recursively
R. Albert, H. Jeong, A-L Barabasi, Nature, 401 130 (1999)
What did we expect?
k ~ 6
P(k=500) ~ 10-99
NWWW ~ 109
 N(k=500)~10-90
We find:
out= 2.45
 in = 2.1
P(k=500) ~ 10-6
NWWW ~ 109
 N(k=500) ~ 103
Pout(k) ~ k-out
Pin(k) ~ k- in
What does it mean?
Poisson distribution
Exponential Network
Power-law distribution
Scale-free Network
INTERNET BACKBONE
Nodes: computers, routers
Links: physical lines
(Faloutsos, Faloutsos and Faloutsos, 1999)
SEX-Web
Nodes: people (females; males)
Links: sexual relationships
4781 Swedes; 18-74;
59% response rate.
(Liljeros et al. Nature 2001)
ACTOR CONNECTIVITIES
Nodes: actors
Links: cast jointly
Days of Thunder (1990)
Far and Away
(1992)
Eyes Wide Shut (1999)
N = 212,250 actors
k = 28.78
P(k) ~k-
=2.3
SCIENCE CITATION INDEX
Nodes: papers
Links: citations
1736 PRL papers (1988)
P(k) ~k-
( = 3)
(S. Redner, 1998)
Econo network
Nodes: individual, company, country...
Links: economic activities
Pi(t) : stock price at time t , Yi  ln Pi (t )  ln Pi (t  1)
GENOME
protein-gene
interactions
PROTEOME
protein-protein
interactions
METABOLISM
Bio-chemical
reactions
Citrate Cycle
protein-protein
network
(yeast)
Jeong et al.
Nature 411, 41 (2001)
p53 network
(mammals)
metabolic
network
(E. coli)
Jeong et al.
Nature 407, 651 (2000).
Other Examples of Scale-Free Network
Email network
Nodes: individual email address
Links: email communication
Phone-call networks
Nodes: phone-number
Links: completed phone call
(Abello et al, 1999)
Networks in linguistics
Nodes: words
Links: appear next or one word apart from each other
(Ferrer et al, 2001)
Networks in Electronic auction (eBay)
Nodes: agents, individuals
Links: bids for the same item
(H. Jeong et al, 2001)
Most real world networks have
the same internal structure:
Scale-free networks
How?
Why?
ORIGIN OF SCALE-FREE NETWORKS
(1) The number of nodes (N) is NOT fixed.
Networks continuously expand by
the addition of new nodes
Examples:
WWW : addition of new documents
Citation : publication of new papers
(2) The attachment is NOT uniform.
A node is linked with higher probability to a node
that already has a large number of links.
Examples :
WWW : new documents link to well known sites
(CNN, YAHOO, NewYork Times, etc)
Citation : well cited papers are more likely to be cited again
(1) GROWTH :
Scale-Free Model
At every timestep we add a new node with m edges
(connected to the nodes already present in the system).
(2) PREFERENTIAL ATTACHMENT :
The probability Π that a new node will be connected to
node i depends on the connectivity ki of that node
ki
 ( ki ) 
 jk j
P(k) ~k-3
A.-L.Barabási, R. Albert, Science 286, 509 (1999)
Continuum Theory
ki
ki
ki
  ( ki )  A

, with initial condition ki (ti )  m
t
k
2
t
j j
t
ki (t )  m
ti
m 2t
m 2t
m 2t
P(ki (t )  k )  Pt (ti  2 )  1  Pt (ti  2 )  1  2
k
k
k (m0  t )
P(ki (t )  k ) 2m 2t 1
3
 P(k ) 

~
k
k
mo  t k 3
γ=3
A.-L.Barabási, R. Albert and H. Jeong, Physica A 272, 173 (1999)
Most real world networks have
the same internal structure:
Scale-free networks
How?
Why?
Efficiency of resource usage
With same number of nodes and links (same amount of resources),
construct scale-free and exponential networks.
Diameter (Scale-free) < Diameter (Exponential)
(Diameter : average distance between two nodes)
Scale-free network is more efficient than exponential network!
Robustness
Complex systems maintain their basic functions
even under errors and failures
(cell  mutations; Internet  router breakdowns)
1
Relative size of
largest cluster
S
fc
0
1
Fraction of removed nodes, f
node failure
Robustness of scale-free networks
Extreme
failure
tolerance
Failures
Topological
error tolerance
1
  3 : fc=1
S
0
Attacks
(R. Cohen et al PRL, 2000)
fc
f
fc
1
Low
survivability
under
attacks!
Achilles’ Heel of complex network
failure
attack
Internet
Protein network
R. Albert, H. Jeong, A.L. Barabasi, Nature 406 378 (2000)
Bio-informatics vs. Networks
Human Genome Project completed!
 Inventory of all genes
 Only list of proteins
Post Genome Era
• Transcriptomics
• Proteomics
 Needs information on interactions
“Human Network Project”
GENOME
protein-gene
interactions
PROTEOME
protein-protein
interactions
METABOLISM
Bio-chemical
reactions
Citrate Cycle
METABOLISM
Bio-chemical
reactions
Citrate Cycle
Nodes: chemicals (proteins,
Metabolic Networks
substrates)
Links: bio-chem. reaction
Metabolic networks
Archaea
Bacteria
Eukaryotes
Organisms
from all three
domains of life
are scale-free
networks!
H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi, Nature, 407 651 (2000)
Properties of metabolic networks
Average distances are independent of organisms!
 by making more links between nodes.
 based on “design principles” of the cell through evolution.
cf. Other scale-free network: D~log(N)
GENOME
protein-gene
interactions
PROTEOME
protein-protein
interactions
METABOLISM
Bio-chemical
reactions
Citrate Cycle
PROTEOME
protein-protein
interactions
Yeast protein network
Nodes: proteins
Links: physical interactions (binding)
P. Uetz, et al. Nature 403, 623-7 (2000).
Topology of the protein network
k  k0
P (k ) ~ (k  k0 ) exp( 
)
k

Power-law with exponential cut-off :
(physical limitation)
H. Jeong, S.P. Mason, A.-L. Barabasi, Z.N. Oltvai, Nature, 411, 41 (2001)
[3280 protein with 4434 interactions]
P(k) ~ k-
Uetz  2.4
Ito  2.3
www.nd.edu/~networks/cell
While there is only 13% overlap between the Uetz et al
and Ito et al data, their large-scale topology is identical.
Ito et al, PNAS 97, 1143 (2000); PNAS 98, 4569 (2001).
Yeast protein network
- lethality and topological position -
Highly connected proteins are more essential (lethal)
than less connected proteins.
GENOME
protein-gene
interactions
PROTEOME
protein-protein
interactions
METABOLISM
Bio-chemical
reactions
Citrate Cycle
protein-gene
interactions
Nature 408 307 (2000)
“… since 1989 … there have been
over 17,000 publications centered on
p53 … this work has led to
considerable confusion and
controversy.”
…
“One way to understand the p53
network is to compare it to the Internet.
The cell, like the Internet, appears to
be a ‘scale-free network’.”
p53 network (mammals)
Complexity
Network
Science collaboration
WWW
Food Web
Scale-free network
Citation pattern
Internet
Cell
UNCOVERING ORDER HIDDEN WITHIN COMPLEX SYSTEMS
Conclusions
• Scale-free networks are ubiquitous
• Effective use of resources
• Robust against errors and random failures
• Vulnerable under attack
• New way of understanding complexity
• Applicable to many areas
Outlook
• Directed Networks
• Weighted Networks
• Dynamics on networks
Austin Powers:
The spy who
shagged me
Let’s make
it legal
Robert Wagner
Wild Things
What Price Glory
Barry Norton
A Few
Good Man
Monsieur
Verdoux
Bonus: Why Kevin Bacon?
Kevin Bacon
No. of movies : 46
No. of actors : 1811
Average separation: 2.79
Measure the average distance between Kevin Bacon and all other actors.
Is Kevin Bacon
the most
connected actor?
NO!
Rod Steiger
Donald Pleasence
Martin Sheen
Christopher Lee
Robert Mitchum
Charlton Heston
Eddie Albert
Robert Vaughn
Donald Sutherland
John Gielgud
Anthony Quinn
James Earl Jones
Average
distance
2.537527
2.542376
2.551210
2.552497
2.557181
2.566284
2.567036
2.570193
2.577880
2.578980
2.579750
2.584440
# of
movies
112
180
136
201
136
104
112
126
107
122
146
112
# of
links
2562
2874
3501
2993
2905
2552
3333
2761
2865
2942
2978
3787
Kevin Bacon
Kevin
Bacon
2.786981
2.786981
46
46
1811
1811
Rank
Name
1
2
3
4
5
6
7
8
9
10
11
12
…
876
876
…
#1 Rod Steiger
#876
Kevin Bacon
Donald
#2
Pleasence
#3 Martin Sheen
References
• R. Albert, H. Jeong, A.L. Barabasi, Nature 401 130 (1999).
• R. Albert, A.L. Barabasi, Science 286 509 (1999).
• A.L. Barabási, R. Albert and H. Jeong, Physica A 272, 173 (1999)
• R. Albert, H. Jeong, A.L. Barabasi, Nature 406 378 (2000).
• H.Jeong, B.Tombor, R.Albert, Z.N.Oltvai, A.L.Barabasi, Nature 407 651 (2000).
• H. Jeong, S.P. Mason, A.L. Barabasi, Z.N. Oltvai, Nature 411 41 (2001).
• J. Podani, Z.N. Oltvai, H. Jeong, B. Tombor, A.L. Barabasi, Nature Genetics
(2001).
• S.H. Yook, H. Jeong, A.-L. Barabasi, PNAS (2002)
URL: http://stat.kaist.ac.kr
Email: Hawoong Jeong
[email protected]
Paul Erdös has a Bacon number of 4!
Paul Erdös was in N is a number (1993) with Gene Patterson
Gene Patterson was in Box of Moonlight (1996) with John Turturro
John Turturro was in Gung Ho (1986) with Clint Howard
Clint Howard was in My Dog Skip (2000) with Kevin Bacon
Information Tech. vs. Networks
Relative Coverage of Search Engines
(83% commercial content excluded
Feb. 1999)
 NON-EQUAL ACCESS OF INFORMATION
Based on network structure, such as
number of incoming links & popularity,
find better information!
Designing better Internet!
Spatial Distribution of Routers
Fractal set
Box counting: N(l)  No. of boxes
of size l that contain routers
N(l) ~ l -Df
Df=1.5
인터넷의 미래
The Grid
Distributed, high performance(parallel)
computing and data handling infrastructure
NT
ET
대형연구
장비
BT
연구망
(Gbps)
Supercomputer
첨단대형
연구장비
원격제어
및 감시
가상실험
시스템
N*Grid
선박
CT
첨단연구장비
학제간공동
연구
기계
고성능
클러스터
대규모
데이터
(BT, NT, etc.)
반도체
데이터 그리드와 컴퓨터
그리드를 이용한 대규모
분산컴퓨팅 가능
ST
자동차
기상
(PC 등 컴퓨터의 유휴자원을
이용, 슈퍼컴퓨터의 수천-수만배
이상의 계산능력을 가짐.)
분산컴퓨터의 미래
분산컴퓨터 (Distributed Computer)
사용이 허가되어진 컴퓨터들에게 커다란 작업을
분할하여 계산 후 결과를 수집, 분석하는 방법.
단점: 컴퓨터 소유주가 자원사용을 허용해야함
(참여율 저조)
좀더 적극적인 접근 방법
스텔스-컴퓨터 (Stealth Computer)
분산 컴퓨터가 소유자의 허가를 받아야하는 것과 달리
소유자의 허가와 상관없이 네트워크에 연결된 모든
컴퓨터에 기본적으로 제공되는 표준 통신프로토콜을
이용, 모든 컴퓨터에 분산컴퓨팅 기법을 적용하는 방법.
문제점: 인터넷에 연결된 기본자원의 소유개념 등
여러가지 새로운 윤리적인 문제 발생.
HTTP 프로토콜과 TCP/IP checksum을
이용한 스텔스 컴퓨팅의 한 예.
Can we stop AIDS?
Because of scale-free structure of the network
it is hard to stop AIDS with standard methods.
(There is no threshold.)
By treating hubs (or at
least, hub-like nodes)
with higher probability,
we can restore finite
threshold.
Food Web
Nodes: trophic species
Links: trophic interactions
R.J. Williams, N.D. Martinez Nature (2000)
SCIENCE COAUTHORSHIP
Nodes: scientist (authors)
Links: write paper together
(Newman, 2000, H. Jeong et al 2001)
Integrated drug target identification
based on genomic data
Cy5
Cy3
Taxonomy using networks
Based on
Topology
of
metabolic
networks
A: Archaea
B: Bacteria
E: Eukaryotes

Taxonomy
J. Podani et al, Nature Genetics (2001)
Can we stop AIDS?
Because of scale-free structure of the network
it is hard to stop AIDS with standard methods.
(There is no threshold.)
By treating hubs (or at
least, hub-like nodes)
with higher probability,
we can restore finite
threshold.
Modeling the Internet’s
large scale topology
S.H. Yook, H. Jeong, A.-L. Barabasi
[cond-mat/0107417]
INTERNET BACKBONE
Nodes: computers, routers
Links: physical lines
(Faloutsos, Faloutsos and Faloutsos, 1999)
Preferential Attachment
• Compare maps taken at different times (Dt = 6 months)
• Measure Dk(k), increase in No. of links for a node
with k links
Preferential Attachment:
Dk(k) ~ k
Spatial Distributions
Router
density
Population
density
Spatial Distribution of Routers
Fractal set
Box counting: N(l)  No. of boxes
of size l that contain routers
N(l) ~ l -Df
Df=1.5
Distance dep.
INTERNET
 (i, j ) ~
kj
a
d ij
s
# of links
distance between
two nodes
N(l) ~ l-Df
Df=1.5
Dk(k) ~ ka
a=1
P(d) ~ d-s
s=1
Good agreement with real Internet Data!
Traditional modeling:
Network as a static graph
Given a network with N nodes and L links

Create a graph with statistically identical topology
RESULT: model the static network topology
PROBLEM: real networks are not static!
• new nodes
Evolve through
:
local processes
• new links between existing nodes
• nodes removed
• links removed
 Real networks are dynamical systems!
Erdös-Rényi model
(1960)
Connect with
probability p
p=1/6
N=10
k ~ 1.5
- Random
- Democratic
Pál Erdös
(1913-1996)
Poisson distribution
Watts-Strogatz
Clustering: My friends will know each other with high probability!
Probability to be connected C
»p
# of links between 1,2,…n neighbors
C=
n(n-1)/2
N nodes forms a regular lattice.
With probability p,
each edge is rewired randomly.
(Nature 393, 440 (1998))
Small-World Phenomena
C(p) : clustering coeff.
L(p) : average path length
Networks are clustered
[large C(p)]
but have a small
characteristic path length
[small L(p)].
Evolving networks
OBJECTIVE: capture the network dynamics
• identify the processes that contribute to the network topology
METHOD : • measure their frequency from real data
• develop dynamical models that capture these processes

BONUS: get the topology correctly.
(1) GROWTH :
Scale-free model
At every timestep we add a new node with m edges
(connected to the nodes already present in the system).
(2) PREFERENTIAL ATTACHMENT :
The probability Π that a new node will be connected to
node i depends on the connectivity ki of that node
ki
 ( ki ) 
 jk j
P(k) ~k-3
A.-L.Barabási, R. Albert, Science 286, 509 (1999)
Model - A
(Without Growth)
Model - B
(Without Preferential Attachment)
For given fixed number of
nodes N,
Number of nodes are
increasing,
links are added with
preferential attachment.
but links are added with
random probability.
Temporarily shows
scale-free behavior, but
ends up to
exponential network
Poisson distribution
(exponential network)
Preferential Attachment
ki
Dk i
  ( ki ) ~
t
Dt
For given Dt,Dk  (k)
k vs. Dk : increment of # of links during unit time
Citation
network
Internet
Universality?
WWW
(in)
 = 2.1
WWW
(out)
 = 2.45
Internet
(domain)
Internet
(router)
Actor
Citation
index
Powergrid
 = 2.2
 = 2.5
 = 2.3
 = 3
 = 4
Extended Model
• prob. p : internal links
• prob. q : link deletion
• prob. 1-p-q : add node
P(k) ~ (k+(p,q,m))-(p,q,m)
  [1,)
p=0.937
m=1
 = 31.68
 = 3.07
Actor
network
• Predict the network topology
from microscopic processes
with parameters (p,q,m)
• Scaling but no universality
Other Models
• Model with non-linear preferential attachment :
(k) ~ ka  P(k) ~ no scaling for a1
 a<1 : stretch-exponential
 a>1 : no-scaling (a>2 : “winner-takes-all”)
(Krapivsky et al (2000).)
• Model with initial attractiveness : (k) ~ A+ka
 P(k) ~ k- where =2 + A/m
(Dorogovtsev et al (2000).)
• Model with aging : each node has lifetime
 node cannot get links after finite time. (actor)
 P(k) : power-law with exponential cutoff
(Amaral et al (2000).)
Other Models (continued)
• Model with limitation : each node has capacity limit.
 node cannot get links after finite # of links.(internet)
P(k) : power-law with exponential cutoff
(Amaral et al (2000).)
• Model with fitness (i) : each node has an intrinsic ability
to get links at the expense of other nodes : (ki) ~ iki
C
C m  1
 P (k )    ( ) ( )
 k
(Bianconi et al (2000).)
Other Models (continued)
•
Alternative mechanisms for preferential attachment:
1. Copying mechanism: (concept from WWW)
: after adding new node, it is connected to
with prob. p: randomly selected node
with prob. 1-p: copied target from selected ‘prototye node’
2. Attaching to edges:
: after adding new node, instead of selecting target node,
select target edges and connect new node to end point of
the edges.
Whole cellular network
References
• R. Albert, H. Jeong, A.L. Barabasi, Nature 401 130 (1999).
• R. Albert, A.L. Barabasi, Science 286 509 (1999).
• A.L. Barabási, R. Albert and H. Jeong, Physica A 272, 173 (1999)
• R. Albert, H. Jeong, A.L. Barabasi, Nature 406 378 (2000).
• H.Jeong, B.Tombor, R.Albert, Z.N.Oltvai, A.L.Barabasi, Nature 407 651 (2000).
• H. Jeong, S.P. Mason, A.L. Barabasi, Z.N. Oltvai, Nature 411 41 (2001).
• J. Podani, Z.N. Oltvai, H. Jeong, B. Tombor, A.L. Barabasi, Nature Genetics
(2001).
URL: http://www.nd.edu/~networks
Diameter of WWW: 19 degrees of separation
3
l15=2 [125]
6
1
l17=4 [1346 7]
4
5
2
7
… < l > = ??
 Finite size scaling: create a network with N nodes with Pin(k) and Pout(k)
< l > = 0.35 + 2.06 log(N)
19 degrees of separation
R. Albert et al Nature (99)
nd.edu
<l>
based on 800 million webpages
[S. Lawrence et al Nature (99)]
IBM
A. Broder et al WWW9 (00)
Parameter (s,a,Df) dependence of P(k) and P(d)
Robustness of scale-free networks
attack
Internet
failure
N~6209
<k> ~ 3.93
Fraction of nodes eliminated
attack
WWW
failure
Fraction of nodes eliminated
N~325,729
<k> ~ 4.6
Taxonomy by using Cellular networks
UPGMA (unweighted pair
grouping method with
arithmetic means)
Three domains of Life
dij = Jacard index
or rank order
A
1
d A3
2
3
d13  d 23

2