Slides for Penn Reading Project - the Department of Computer and

Download Report

Transcript Slides for Penn Reading Project - the Department of Computer and

“The Tipping Point”
and the
Networked Nature of Society
Michael Kearns
Computer and Information Science
Penn Reading Project 2004
Gladwell, page 7:
“The Tipping Point is the biography of the idea…
that the best way to understand the emergence of
fashion trends, the ebb and flow of crime waves, or
the rise in teen smoking… is to think of them as
epidemics. Ideas and products and messages and
behaviors spread just like viruses do…”
…on networks.
The Networked Nature of Society
International Trade
[Krempel&Pleumper]
Corporate Partnerships
[Krebs]
Gnutella
Internet Routers
Artist Mark Lombardi
An Emerging Science
An Emerging Science
• Examining apparent similarities between many
human and technological systems & organizations
• Importance of network effects in such systems
• How things are connected matters greatly
• Structure, asymmetry and heterogeneity
• Details of interaction matter greatly
• The metaphor of viral spread
• Qualitative and quantitative; can be very subtle
• A revolution of
– measurement
– theory
– breadth of vision
Who’s Doing All This?
• Computer Scientists
– Understand and design complex, distributed networks
– View “competitive” decentralized systems as economies
• Social Scientists, Psychologists, Economists
– Understand human behavior in “simple” settings
– Revised views of economic rationality in humans
– Theories and measurement of social networks
• Physicists and Mathematicians
– Interest and methods in complex systems
– Theories of macroscopic behavior (phase transitions)
• All parties are interacting and collaborating
“Real World” Social Networks
• Example: Acquaintanceship networks
–
–
–
–
vertices: people in the world
links: have met in person and know last names
hard to measure
let’s do our own Gladwell estimate
• Example: scientific collaboration
–
–
–
–
–
–
vertices: math and computer science researchers
links: between coauthors on a published paper
Erdos numbers : distance to Paul Erdos
Erdos was definitely a hub or connector; had 507 coauthors
MK’s Erdos number is 3, via Mansour  Alon  Erdos
how do we navigate in such networks?
Update: MK’s Friendster NW, 1/19/03
• If you didn’t get my email invite, let me know
– send mail to [email protected]
•
•
•
•
•
•
•
Number of friends (direct links): 8
NW size (<= 4 hops): 29,901
13^4 ~ 29,000
But let’s look at the degree distribution
So a random connectivity pattern is not a good fit
What is???
Another interesting online social NW: [thanks Albert Ip!]
– AOL IM Buddyzoo
Biological Networks
• Example: the human brain
–
–
–
–
–
–
–
vertices: neuronal cells
links: axons connecting cells
links carry action potentials
computation: threshold behavior
N ~ 100 billion
typical degree ~ sqrt(N)
we’ll return to this in a moment…
Universality and Generative Models
A “Canonical” Natural Network has…
• Few connected components:
– often only 1 or a small number independent of network size
• Small diameter:
– often a constant independent of network size (like 6)
– or perhaps growing only logarithmically with network size
– typically exclude infinite distances
• A high degree of clustering:
– considerably more so than for a random network
– in tension with small diameter
• A heavy-tailed degree distribution:
– a small but reliable number of high-degree vertices
– quantifies Gladwell’s connectors
– often of power law form
Some Models of Network Generation
• Random graphs (Erdos-Renyi models):
– gives few components and small diameter
– does not give high clustering and heavy-tailed degree distributions
– is the mathematically most well-studied and understood model
• Watts-Strogatz and related models:
– give few components, small diameter and high clustering
– does not give heavy-tailed degree distributions
• Preferential attachment:
– gives few components, small diameter and heavy-tailed distribution
– does not give high clustering
• Hierarchical networks:
– few components, small diameter, high clustering, heavy-tailed
• Affiliation networks:
– models group-actor formation
• Nothing “magic” about any of the measures or models
So Which Properties Tip?
• Just about all of them!
• The following properties all have threshold functions:
–
–
–
–
having a “giant component”
being connected
having a perfect matching (N even)
having “small” diameter
• Demo: look at the following progression
– giant component  connectivity  small diameter
– in graph process model (add one new edge at a time)
– [example 1] [example 2] [example 3] [example 4] [example 5]
• With remarkable consistency (N = 50):
– giant component ~ 40 edges, connected ~ 100, small diameter ~ 180
“Epidemos”
[Thanks to Sangkyum Kim]
• Forest fire simulation:
– grid of forest and vacant cells
– fire always spreads to adjacent four cells
• “perfect” stickiness or infectiousness
– connectivity parameter:
• probability of forest
– fire will spread to connected component of source
– tip when forest ~ 0.6
– clean mathematical formalization (e.g. fraction burned)
• Viral spread simulation:
– population on a grid network, each with four neighbors
– stickiness parameter:
• probability of passing disease
– connectivity parameter:
• probability of adding random (long-distance) connections
– no long distance connections: tip at stickiness ~ 0.3
– at rewiring = 0.5, often tip at stickiness ~ 0.2
Incorporating Strategic and
Economic Behavior
Examples from Schelling and Beyond
• Going to the beach or not
– too few  you’ll go, making it more crowded
– too many  you won’t go, or will leave if you’re there
• Sending Christmas cards
– people send to those they expect will send to them
– everybody hates it, but no individual can break the cycle
• Investing in an apartment fire sprinkler
– only worth it if enough people do it
– insurance companies won’t discount for it
• Choosing where to sit in the Levine Auditorium
Local Preferences and Segregation
•
•
•
•
•
•
Special case of preferences: housing choices
Imagine individuals who are either “red” or “blue”
They live on in a grid world with 8 neighboring cells
Neighboring cells either have another individual or are empty
Individuals have preferences about demographics of their neighborhood
Here is a very nice simulator
A Sample Network and Equilibrium
• Solid edges:
– exchange at equilibrium
• Dashed edges:
– competitive but unused
• Dotted edges:
– non-competitive prices
• Note price variation
– 0.33 to 2.00
• Degree alone does not
determine price!
– e.g. B2 vs. B11
– e.g. S5 vs. S14
The Internet as Society
The Internet: What is It?
• The Internet is a massive network of connected but
decentralized computers
• Began as an experimental research NW of the DoD
(ARPAnet) in the 1970s
• All aspects (protocols, services, hardware, software)
evolved over many years
• Many individuals and organizations contributed
• Designed to be open, flexible, and general from the start
• Completely unlike prior centralized, managed NWs
– e.g. the AT&T telephone switching network
Hubs and Authorities
• Suppose we have a large collection of pages on some topic
•
•
•
•
– possibly the results of a standard web search
Some of these pages are highly relevant, others not at all
How could we automatically identify the important ones?
What’s a good definition of importance?
Kleinberg’s idea: there are two kinds of important pages:
– authorities: highly relevant pages
– hubs: pages that point to lots of relevant pages
– (I had these backwards last time…)
• If you buy this definition, it further stands to reason that:
– a good hub should point to lots of good authorities
– a good authority should be pointed to by many good hubs
– this logic is, of course, circular
• We need some math and an algorithm to sort it out
• Networked Life (CSE 112) web site:
– www.cis.upenn.edu/~mkearns/teaching/NetworkedLife
– these slides:
• www.cis.upenn.edu/~mkearns/teaching/NetworkedLife/prp.ppt
• Feel free to contact me at
– [email protected]