PPT - Eurecom

Download Report

Transcript PPT - Eurecom

Network Modeling (NetMod): Mon. 1:30-4:45pm
Instructor: Thrasyvoulos Spyropoulos
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
A Few Words About Your Teacher
1995-2000: Undergraduate studies in Greece
National Technical University of Athens (NTUA)
 Specialization: Telecommunications and Networking

2000-2006: MSc and PhD in Los Angeles, California
University of Southern California (USC)
 Thesis: Perf. Analysis and Protocols for Wireless Networks

2006-2007: INRIA, Sophia-Antipolis (next door)

Post-doc at PLANETE group (now DIANA)
2007-2010: ETH, Zurich

Senior Researcher/Lecturer
2010-present: EURECOM

Thrasyvoulos Spyropoulos / [email protected]
Professor
Eurecom, Sophia-Antipolis
2
A Few Words About the Class
Goal: to teach some mathematical tools that are valuable, when
trying to understand real networks…in fact real systems!
STOCHASTIC PROCESSES
NETWORK SCIENCE
Learn to deal with Randomness
Modeling Large Networks
Cloud Computing
APPLICATIONS
Web Server Farms
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
4G/5G Networks
Web Server Farms
3
Why Randomness?
 Many things in nature are random => same for networks

more precisely: increased complexity  described as randomness
 Randomness in:








Propagation phenomena (coding, diversity)
Location and mobility of nodes (handoff)
Traffic/Service arrival patterns (cellular capacity allocation)
Next link to be clicked on a webpage (browser prefetch)
Size of files downloaded (cache sizing)
Computing job arrivals and duration (cloud computing)
Number of (facebook) friends per user (advertising)
…
Computer Science Approach
• Design algorithm
• Deal with worst case
Thrasyvoulos Spyropoulos / [email protected]
Electrical Engineering Approach
• Optimize for probable cases
• Ignore rare events
Eurecom, Sophia-Antipolis
4
Why Large Networks?
Network of Internet Routers Online Social Nets (FaceBook)
Mesh Networks
 Most networks can be modeled
as a large graph
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
5
Need a Science of (Social/Large/Complex) Networks
 Difficult to study/model a specific graph
 Specific graph: an instance of a random graph
with specific qualitative properties
 “Complex/Social Network Analysis or Network
Science”  the study of qualitative properties
of large graphs/networks



Degree distribution, diameter, connectivity, clusters
(a) WHY do these properties arise? (Scientist)
(b) HOW can they be exploited? (Engineer)
 Degree distribution  searching, security
 Clustering  advertising, information
spreading
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
6
Where Do They All Come Together
 Many interesting problems on Networks can be modeled as a
Random Process on a (Random) Graph
Searching
Resource Allocation in 4G/5G
Malware Spread
 A timely course! Few courses around on these topics


Performance Analysis classes from CalTech and Carnegie Mellon Univ
Complex/Social Networks classes from CalTech and Cornell University
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
7
Course Textbooks - Reading Material
 PART I: Stochastic Processes and Queueing

“Performance Modeling and Design of Computer Systems”
by Mor-Harchol-Balter – shared copies in library – online:
http://proquestcombo.safaribooksonline.com/book/electricalengineering/computer-engineering/9781139610834


“Stochastic Processes” by S. Ross – library
“Introduction to Probability Models” by S. Ross – library
 PART II: Complex Network Analysis


“Networks, Crowds, and Markets: Reasoning About a
Highly Connected World” by D. Easley and T. Kleinberg –
pdf freely available online
“Networks: An Introduction” by M. Newman – shared copy
in library
 Additional reference material (tutorials, articles)
per topic
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
8
Course Evaluation (Grading)
 Regular Homeworks and 2 Labs --- 20% of Grade
 Midterm Exam (after Part I) --- 30% of Grade
 Final Exam --- 50% of Grade
 Participation --- extra credit!
 Office Hours: TBD
 Class Web Site:
http://www.eurecom.fr/~spyropou/netmod2016.htm
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
9
Course Expectations
What to expect from me:
 To make the class entertaining


Many examples and application
Interaction Interaction Interaction!
 To teach you key insights

Not just “tools”  when to use which tool 
why it works
What I expect from you:
 To study the assigned book chapters!
 To work hard on your homeworks
 Interaction Interaction Interaction!!!
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
10
Course Prerequisites
 Introductory Probability Theory





Distributions (bernoulli, geometric, binomial, gaussian, poisson)
Expectations, Variance, etc.
Conditional probabilities and expectations
Independence and Correlation
Review Reference: “Introduction to Prob. Models” or “A First
Course in Probability” by Sheldon Ross – available in library
 (very!) Elementary Linear Algebra





Matrix multiplication
Solving Linear Systems
Eigenvalues
Check out Gilbert Strang’s online lectures for a refresher
(excellent!)
A tiny bit of MatLab
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
11
Probability Refresher: Playing the Odds
 Betting on the Roulette



18 red
18 black
2 green
 John observes the roulette and counts
5 reds in a row
Q: In the next roll should he bet on red
or black?
Q: What if John sees 20 reds in a row??
On August 18, 1913, at the casino in Monte Carlo, black came up a record

John: “Itimes
should
bet on black!
21 reds
in awas
row
are VERY unlikely!
twenty-six
in succession
in roulette…
There
a near-panicky
rush to
bet
on red, “No,
beginning
about the
black hadRolls
comeare
up aindependent!”
phenomenal fifteen

James:
it makes
no time
difference!
times. …players doubled and tripled their stakes, led to believe after black came
Q:
Prob{20 reds in a row followed by a black}?
up the twentieth time that there was not a chance in a million of another repeat.
In the
end thereds
unusual
runrow}?
enriched the Casino by some millions of francs.
Prob{21
in a
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
12
Probability Refresher: Change or Not?
1. Pick a door: 1 door has a prize! The other 2 have goats
Q: A door with a goat is opened: do you want to change your
chosen door, or stay with the one you have?
A: Switching gives you a 2/3 chance to win! Look up Monty Hall
problem – Conditional probabilities
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
13
Probability Refresher: You Be The Judge
 A terrible crime has occurred in city A, and John is one
suspect
 A DNA matching that of John is found in the crime scene

This is the only evidence against John
 Two DNAs matching have a 1 in a million chance
 The prosecutor and jury conclude John is guilty
Q: Were they right?
 City A has about 10 million people.
Q: What is the chance that John is innocent?
A: John is innocent with a 90% chance!!!
BAYES RULE: P[Innocent | Evidence] 
Thrasyvoulos Spyropoulos / [email protected]
P[Evidence | Innocence]P[Innocence]
P[Evidence]
Eurecom, Sophia-Antipolis
14
Probability Refresher: Network Bootstrapping
 Sensor node bootstrapping

Each node has a unique ID
 Goal: each node needs to broadcast
x.y.z
its ID to all other nodes
Protocol
 Node X picks a slot n uniformly
in [1,N]  P{n = i} = 1/N
 Broadcasts its ID in slot n


t
SUCCESS: no other node picked n
COLLISION: 2 or more nodes picked n  nodes fail and stay “off”
Tradeoff: Low N  many collisions || High N  long delay
Q: If 30 nodes, what is the minimum N  P{collision} < 10%?
A: 200? 500?
N 1000?
> 45005000?
(look up “birthday paradox”)
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
15
Why Modeling and Performance Analysis?
Amazon Cloud
Traffic
from
different
clients
Throughput?
Delay per Client?
Best Algorithm? E.g. Resource
Allocation (CPU/Network/Memory)
 Identify bottlenecks: Improve Eurecom WiFi:
a)
b)
Install more Access Points? (network is too sparse)
Propose a better channel selection algorithm? (high interference)
 Knowing perf. analysis can make you good consultants :-)
“All models are bad! But some are useful”
by statistician George Box
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
16
Upgrading the Company’s Web Server
Job requests /
per second
2x 
Jobs served / per second
?
 Your company has just got very positive publicity  The
incoming load (arrival rate of requests) to the web server
is expected to double as a result
 Your boss tells you that you need to upgrade the server
with a faster one (higher service rate μ) to ensure the
same mean response time E[T]
Q: How much should you increase the service rate?
a)
b)
c)
Double the server speed?
More than double the speed?
Less than double the speed?
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
17
Supermarket Queues and FDMA vs. CSMA
30 cust/hour
10 cust/hour
OPTION 1
 3 slow cashiers
 3 lines  randomly
choose line and stay there
 Each cashier: 10
customers per hour
Thrasyvoulos Spyropoulos / [email protected]
OPTION 2
 1 fast cashier
 single line
 30 customers per
hour
Eurecom, Sophia-Antipolis
18
Supermarket Queues and FDMA vs. CSMA
30 cust/hour
10 cust/hour
Option 1
Option 2
Q: Which options has the smallest waiting time?
A: Option 2 is 3x faster!
- Option 1 or Option 2?
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
19
19
Supermarket Queues and FDMA vs. CSMA (2)
30 cust/hour
10 cust/hour
Option 2
Option 3
Q: Which options has the smallest waiting time?
A: Similar delay (for high load)
- Option 2 or Option 3?
- Low load: Option 2 or Option 3? A: Option 2 up to 3x faster
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
20
Supermarket Queues and FDMA vs. CSMA (3)
 How does all these relate to networking??
600KHz
OPTION 1: FDMA
 Separate 200KHz channel to each
 Flows do not compete
600KHz
OPTION 2: CSMA
 Each node senses the channel first
 If idle  transmit pkt using 600KHz
 If busy  queue (wait)
Q: Which option would you prefer for data?
Q: Which option would you prefer for voice?
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
21
Google’s PageRank: Searching for Web Pages
 What lies behind this simple box??
 Searching the Web
Step 1: crawl all web pages and create index of {keywords-web pages}
 Step 2: User enters keywords (e.g. “Network Modeling”)
 Step 3: Google finds all web pages matching these keywords
(these 3 steps are generic to almost every search engine)
 Step 4: Return a list of matches ranked by importance. HOW???

 Intuition 1: page important if many web pages refer to it
 Intuition 2: page important if important web pages refer to it
Solution: PageRank algorithms solves an appropriate Markov Chain
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
22
Discrete Time Markov Chains
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
(Discrete-Time) Markov Process
 X(n): the state/value of a process at the n-th period (time slot)

X(n) is random and takes values in a finite or countable set
 The sequence X(1), X(2), …, X(n) could be
a)
b)
c)
The values of a stock each day
The web page a user is currently browsing
The Access Point(AP) a moving user is currently associated with
 We would like a probabilistic model for X(1),X(2),…,X(n)
a)
b)
X(i) are independent => not so realistic (for above examples)
X(i+1) depends on X(i) => not a bad model to start with
P{X(n+1) = j | X(n) = i,X(n-1) = in-1,…,X(1) = X0} = P{X(n+1) = j | X(n) = i} = pij
(Markov Property)
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
24
Transition Matrix and Properties
 Assume state space: 0, 1, 2, …
 Pij : P{S(n+1) = j | S(n) = i}

p
ij
 1, i
j
 Stationary: P{S(n+1) = j | S(n) = i} = P{S(1) = j | S(0) = i}

for any n
 pij define a transition matrix P =
Thrasyvoulos Spyropoulos / [email protected]
p00
p10
p20

Eurecom, Sophia-Antipolis
p01 p02
p11 p12
p 21 p22






25
Graphical Representation of Markov Chains
 Simple weather model

X(n): weather after n days
 Simple channel error model



q
p
(total) prob of error = 0.1
Case 1: q = 0.1, p=0.9 (uncorrelated)
Case 2: q = 0.9 (burst of errors)
1-p
good
bad
1-q
 What is the transition
matrix P in both cases?
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
26
Markov Chain Models (cont’d)
 A simple handover model



cell -> state
need to find transition probabilities
depend on road structure, user profile,
statistics
0.8
0.2
user on the phone
0.6
0.5
0.3
0.4
0.4
0.2
0.7
0.15
0.2
Cellular Network
Thrasyvoulos Spyropoulos / [email protected]
Markov Chain
Eurecom, Sophia-Antipolis
27
2-step transition probabilities
 So far we know probabilities for next step: S(n)->S(n+1)
 Probabilities for S(1) -> S(n)?
PS(2)  j | S(0)  i 
 PS(2)  j | S(1)  k,S(0)  i PS(1)  k | S(0)  i
k
 P{S(2)  j | S(0)  i}   pik pkj
k
 If P(2) = {pij(2)} denotes the 2-step trans. probability matrix,
then P(2) = P • P

Pij(2): multiply row I with column j
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
28
n-step transition probabilities
 Generalizing for n steps:
(l)
PS(n)  j | S(0)  i   p(m)
p
(m  l  n)
ik
kj ,
k
 Kolmogorov-Chapman equations
 P(n) = P•P…P = (P)n (n-step transition probabilities)
 P(n) = P(m) • P(l)

e.g. P(5) = P(4) • P(1) = P(2) • P(3) = P(2) • P • P(2)
 Initial state vector s(0): e.g. (0,1,0,0) or = (0.2,0,0.6,0.2)
Q: Where after n steps?
A: s(n) = s(0) • P(n)
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
29
An example: Pre-Fetching Web Pages
 Your iPhone browser can pre-fetch pages to improve speed
 The current Web Page being browsed has 3 links
 Study of click statistics have shown that




A user clicks link 1 next with prob 0.8
Link 2 with prob 0.1
Link 3 with prob 0.1
It seems reasonable that the browser pre-fetches link 1
 Assume now a tiny subset of the Web with the following
transition matrix
A B C D 

Start at page A
S(0) = [1, 0, 0, 0]
S(1) = [0.2, 0.2, 0.5, 0.1]
S(2) = [0.13, 0.29, 0.24, 0.34]
page A  0.2
page B   0.1


page C   0.1


page
D

 0.2
0.1
0.2 0.1 0.6 
0.4 0.2 0.3 

0.1 0.2 0.5 
0.2 0.5
Q: Which web page should the browser pre-fetch?
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
30
Limiting distribution
Q: What happens to pij(n) as n goes to infinity?
Q: What is the lim P(n) as n->∞?
Q: Does it always converge?? (we’ll see this later)
 If limit exists, then lim pijn  πj for any initial state i
n
 π = {π0, π1,…, πm} is called the stationary distribution
IF π• P = π  𝜋𝑗 =
𝜋𝑖 ∙ 𝑝𝑖𝑗
and Σiπi = 1
 The above equation can be used to find π
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
31
Getting the Stationary Distribution: Examples
 The weather system
0.8 0.2  (π
sunny , πrainy )  (3/4,1/4)
P

0.6 0.4
 Data rates supported for outdoor user

e.g. (128Kbps, 320Kbps, 1 Mbps)
0.4 0.4 0.2
2 3 2
P  0.2 0.6 0.2  (π128, π320, π1024)  ( , , )
7 7 7
0.3 0.2 0.5
 What about this one?
0.4 0.6 0
P  0.3 0.7 0
 0
0 1 
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
32
A Definition Prelude (Part I)
0.4 0.6 0
P  0.3 0.7 0
 0
0 1 
(Reachability) State j is reachable by i  Pij(n) > 0 for some n

denote ij
(Communication) states i and j communicate (i <-> j) if ij and ji
(Recurrence/Transience) Denote fi the probability to ever return to i,
starting from i

(n)
 Transient) state is transient if fi < 1  n0 pii   (finite num of returns)

number of returns is geometric (fi): why?
 Recurrent) state i is recurrent if fi = 1  n0 pii(n)   (infinite num of returns)

Positive recurrent) expected time between visits to i is finite
 Null recurrent) expected time between visits is infinite (strange!?)

 Periodicity) state i is periodic, if exists d such that pii(n) = 0, if n ≠ kd

Largest d with above property is called period
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
33
A Definition Prelude (Part II)
0.4 0.6 0
P  0.3 0.7 0
 0
0 1 
Definition: A subset of states S: {for all i,j in S => i <-> j} is called a class
Lemma 1: recurrence is a class property
if i is recurrent and i <-> j, then j is recurrent
 Proof (by contradiction): if j transient => can never return to j after some time =>
cannot be at i either (since there is always a chance to go from i to j)

Lemma 2: transience is a class property
Similar argument
 Periodicity and positive/null-recurrence are also class properties

Irreducible) A Markov Chain is irreducible if it has only 1 class

A finite MC which is irreducible is always positive-recurrent
Theorem: If a DTMC is aperiodic, irreducible and positive recurrent
(“ergodic”) then it has a stationary distribution π = {π1,…,πN}
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
34
A Motivating Example: Google PageRank Algorithm
 Search for “Network Modeling”

1000s of pages (courses, companies, etc.) contain these keywords
 Goal: Make the page I’m interested appear in at least the top 10 pages
shown
Q: How should pages be ranked?
A1: A page is important if many links to this page
Q: What is wrong with this metric?
A: - link from Yahoo more important than link from /~spyropou
- can easily “fool” this! (create many dummy pages pointing to mine)
Q: How to fix this?
A: weigh a link from x to page p with the number of links into x
Q: Can this system be fooled?
A: Yes! Create 1000 dummy pages pointing to each other and mine
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
35
Google’s Solution
 A page p has high rank (is important) if the pages pointing to it also
have high rank
Q: How is this different from before?
A: All the dummy pages pointing to each other would still be low rank, if no
“external” link from a high rank page
Q: Is a link from a page with 1000 other links as important as a link from a
page with 5 other links?
A: No! If page i has a link to j and k total outgoing links  Pij = 1/k
Q: Where does this lead? How do we calculate the rank?
A: rank of page j = Σi { rank of page i * Prob of link ij}
i.e. rj = Σi ri Pij
 (basis of) Page Rank Algorithm:
rank of pages = stationary prob. of an MC with transitions Pij
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
36
Page Rank Algorithm Pitfalls
Q: What about this tiny WWW?
A: πA = πN = 2/5, πM = 1/5
Q: But what about these two?
A: These two chains are reducible
and do not have a stationary distr.
 Google’s heuristic: introduce additional arrows to avoid such
loops and dead ends (not necessarily optimal!)
Q: How does Google solve the huge system of eq (million pages)?
Thrasyvoulos Spyropoulos / [email protected]
Eurecom, Sophia-Antipolis
37