Slides - the Department of Computer and Information Science

Download Report

Transcript Slides - the Department of Computer and Information Science

The Networked Nature
of Society
Networked Life
CSE 112
Spring 2004
Prof. Michael Kearns
Course News and Notes 1/20
• Course mailing list: [email protected]
• Updates to course web site
• Recitation sections:
– Tuesday (today) 6-7 PM; room announced later today
– Wednesday 5-6 in Towne 313
Course News and Notes 1/15
• Please give last-names exercise to Nick now
• If you missed the first lecture, please get the handouts
• Extra credit for identifying papers, web sites, demos or
other material related to class, if it gets used
• Prof. Kearns office hours today 10:30-11:30
• Tuesday recitation may be moved to 6 PM; stand by
• Please start reading “The Tipping Point”
– required chapters: 1,2,4,5
What is a Network?
•
•
•
•
•
•
•
A collection of individual or atomic entities
Referred to as nodes or vertices
Collection of links or edges between vertices
Links represent pairwise relationships
Links can be directed or undirected
Network: entire collection of nodes and links
Extremely general, but not everything:
– actors appearing in the same film
– lose information by pairwise representation
• We will be interested in properties of networks
– often statistical properties of families of networks
Some Definitions
• Network size: total number of vertices (denoted N)
• Maximum number of edges: N(N-1)/2 ~ N^2/2
• Distance between vertices u and v:
– number of edges on the shortest path from u to v
– can consider directed or undirected cases
– infinite if there is no path from u to v
• Diameter of a network:
– worst-case diameter: largest distance between a pair
– average-case diameter: average distance
• If the distance between all pairs is finite, we say the
network is connected; else it has multiple components
• Degree of vertex u: number of edges
Examples of Networks
“Real World” Social Networks
• Example: Acquaintanceship networks
–
–
–
–
vertices: people in the world
links: have met in person and know last names
hard to measure
let’s do our own Gladwell estimate
• Example: scientific collaboration
–
–
–
–
–
–
vertices: math and computer science researchers
links: between coauthors on a published paper
Erdos numbers : distance to Paul Erdos
Erdos was definitely a hub or connector; had 507 coauthors
MK’s Erdos number is 3, via Mansour  Alon  Erdos
how do we navigate in such networks?
Online Social Networks
• A very recent example: Friendster
– vertices: subscribers to www.friendster.com
– links: created via deliberate invitation
– Here’s an interesting visualization by one user
• Older example: social interaction in LambdaMOO
–
–
–
–
LambdaMOO: chat environment with “emotes” or verbs
vertices: LambdaMOO users
links: defined by chat and verb exchange
could also examine “friend” and “foe” sub-networks
Update: MK’s Friendster NW, 1/19/03
• If you didn’t get my email invite, let me know
– send mail to [email protected]
•
•
•
•
•
•
•
Number of friends (direct links): 8
NW size (<= 4 hops): 29,901
13^4 ~ 29,000
But let’s look at the degree distribution
So a random connectivity pattern is not a good fit
What is???
Another interesting online social NW: [thanks Albert Ip!]
– AOL IM Buddyzoo
Content Networks
• Example: document similarity
–
–
–
–
vertices: documents on the web
links: defined by document similarity (e.g. Google)
here’s a very nice visualization
not the web graph, but an overlay content network
• Of course, every good scandal needs a network
– vertices: CEOs, spies, stock brokers, other shifty characters
– links: co-occurrence in the same article
• Then there are conceptual networks
– vertices: concepts to be discussed in NW Life
– links: arbitrarily determined by Prof. Kearns
• Update: here are two more examples [thanks Hanna Wallach!]
– a thesaurus defines a network
– so do the interactions in a mailing list
Business and Economic Networks
• Example: eBay bidding
– vertices: eBay users
– links: represent bidder-seller or buyer-seller
– fraud detection: bidding rings
• Example: corporate boards
– vertices: corporations
– links: between companies that share a board member
• Example: corporate partnerships
– vertices: corporations
– links: represent formal joint ventures
• Example: goods exchange networks
– vertices: buyers and sellers of commodities
– links: represent “permissible” transactions
Physical Networks
• Example: the Internet
–
–
–
–
–
vertices: Internet routers
links: physical connections
vertices: Autonomous Systems (e.g. ISPs)
links: represent peering agreements
latter example is both physical and business network
• Compare to more traditional data networks
• Example: the U.S. power grid
– vertices: control stations on the power grid
– links: high-voltage transmission lines
– August 2003 blackout: classic example of interdependence
Biological Networks
• Example: the human brain
–
–
–
–
–
–
–
vertices: neuronal cells
links: axons connecting cells
links carry action potentials
computation: threshold behavior
N ~ 100 billion
typical degree ~ sqrt(N)
we’ll return to this in a moment…
Network Statics
• Emphasize purely structural properties
– size, diameter, connectivity, degree distribution, etc.
– may examine statistics across many networks
– will also use the term topology to refer to structure
• Structure can reveal:
–
–
–
–
community
“important” vertices, centrality, etc.
robustness and vulnerabilities
can also impose constraints on dynamics
• Less emphasis on what actually occurs on network
– web pages are linked, but people surf the web
– buyers and sellers exchange goods and cash
– friends are connected, but have specific interactions
Network Dynamics
• Emphasis on what happens on networks
• Examples:
– mapping spread of disease in a social network
– mapping spread of a fad
– computation in the brain
• Statics and dynamics often closely linked
– rate of disease spread (dynamic) depends critically on network
connectivity (static)
– distribution of wealth depends on network topology
• Gladwell emphasizes dynamics
Network Formation
• Why does a particular structure emerge?
• Plausible processes for network formation?
• Generally interested in processes that are
–
–
–
–
–
decentralized
distributed
limited to local communication and interaction
“organic” and growing
consistent with measurement
• The Internet versus traditional telephony
Course News and Notes 1/22
• Prof. Kearns’ Friendster NW: please send mail to
[email protected] if you want to join
• Prof. Kearns’ office hours today 10:30-12
• Two new articles added to required readings
• Homework 1 distributed today, due in class Feb 5
• NO CLASS on Feb 3
• Today’s agenda:
–
–
–
–
recap of “brain analysis” from NW Nature of Society lectures
further examples
Contagion, Tipping and NW material cont’d
distribution and discussion of HW 1
Brief Case Study: Associative Memory,
Grandmother Cells, and Random Networks
• A little more on the human brain:
– (neo)cortex most recently evolved
– memory and higher brain function
– long distance connections:
• pyramidal cells, majority of cortex
• white matter as a box of cables
– closet to a crude “random network”
• Associative memory:
– consider the Pelican Brief
– or Networked Life
• Grandmother cells:
– localist vs. “holistic” representation
– single vs. multi-cell recognition
A Back-of-the Envelope Analysis
• Let’s try assuming:
–
–
–
–
–
all connections equally likely
independent with probability p
grandmother cell representation
decentralized learning
correlated learning of conjunctions
• So:
– at some point have learned “pelican” and
“brief” in separate cells
– need to have cells connected to both
– but not too many such cells!
• In this model, p ~ 1/sqrt(N) balances
near-certain upper and lower bounds on
common neighbors for random pairs
• Broadly consistent with biology
Remarks
• Network formation:
– random long-distance
– is this how the brain grows?
• Network structure:
– common neighbors for arbitrary cell pairs
– implications for degree distribution
• Network dynamics:
– distributed, correlation-based learning
• There is much that is broken with this story
• But it shows how a set of plausible assumptions can lead
to nontrivial constraints
Recap
• We chose a particular, statistical model of network generation
– each edge appears independently and with probability p
– why? broadly consistent with long-distance cortex connectivity
– a statistical model allows us to study variation within certain constraints
• We were interested in the NW having a certain global property
– any pair of vertices should have a small number of common neighbors
– why? corresponds to controlled growth of learned conjunctions, in a model
assuming distributed, correlated learning and “grandmother cells”
• We asked whether our NW model and this property were consistent
– yes, assuming that p ~ 1/sqrt(N)
– this implies each neuron (vertex) will have about p*N ~ sqrt(N) neighbors
– and this is roughly what one finds biologically
• Note: this statement is not easy to prove
Another Example
• We choose a particular, statistical model of network generation
– each edge appears independently and with probability p
– why? might propose it as a first guess as to how social networks form
• We are interested in the NW having a certain global property
– that the network be connected, yet not have large (say, log(N)) cliques
– why? want a connected society without overly powerful subgroups
• We ask whether our NW model and this property are consistent
– I don’t know the answer to this one yet
• We’ll follow this kind of questioning and analysis frequently
A Non-Statistical Example
• Often interested in the mere existence of certain NWs
–
–
–
–
–
let D be the largest degree allowed
why? e.g. because there is a limit to how many friends you can have
let D be a bound on the diameter of the network
why? because many have claimed that D is often small
let N(D,D) = size of the largest possible NW obeying D and D
• Exact form of N(D,D) is notoriously elusive
– but known that it is between (D/2)^D and 2D^D
• So, for example, if we want N ~ 300M (U.S. population):
– if D = 150 (e.g. see Gladwell) and D = 6 (6 degrees): NW exists
– D = 6, N = 300M, solve 2D^D > N: get D > 23