lect04 - Duke Computer Science

Download Report

Transcript lect04 - Duke Computer Science

The Internet





How valuable is a network?
 Metcalfe’s Law
Domain Name System: translates betweens names and IP
addresses
Properties of the Internet
 Heterogeneity
 Redundancy
 Packet-switched
 1.08 billion online (Computer Industry Almanac 2005)
Who has access?
How important is access?
CompSci 001
4.1
Tim Berners-Lee
I want you to realize that, if you
can imagine a computer doing
something, you can program a
computer to do that.
Unbounded opportunity...
limited only by your
imagination. And a couple of
laws of physics.
 TCP/IP, HTTP
 How, Why, What, When?
CompSci 001
4.2
Graphs: Structures and Algorithms

How do packets of bits/information get routed on the internet
 Message divided into packets on client (your) machine
 Packets sent out using routing tables toward destination
• Packets may take different routes to destination
• What happens if packets lost or arrive out-of-order?

Routing tables store local information, not global (why?)

What about The Oracle of Bacon, Erdos Numbers, and Word
Ladders?
 All can be modeled using graphs
 What kind of connectivity does each concept model?

Graphs are everywhere in the world of algorithms (world?)
CompSci 001
4.3
Vocabulary

Graphs are collections of vertices
and edges (vertex also called
node)
 Edge connects two vertices
• Direction can be important,
directed edge, directed graph
• Edge may have associated
weight/cost

A vertex sequence v0, v1, …, vn-1 is
a path where vk and vk+1 are
connected by an edge.
 If some vertex is repeated, the
path is a cycle
 A graph is connected if there is
a path between any pair of
vertices
CompSci 001
78
NYC
Phil
268
204
190
Wash DC
LGA
$412
Boston
394
$441
$186
LAX
$1701
DCA
$186
ORD
4.4
Network/Graph questions/algorithms

What vertices are reachable from a given vertex?
 Two standard traversals: depth-first, breadth-first
 Find connected components, groups of connected vertices

Shortest path between any two vertices (weighted graphs?)!

Longest path in a graph
 No known efficient algorithm
 Longest shortest path: Diameter of graph
Visit all vertices without repeating? Visit all edges?
 With minimal cost? Hard!
What are the properties of the network?
 Structural: Is it connected?
 Statistical: What is the average number of neighbors?


CompSci 001
4.5
Network Nature of Society

Slides from Michael Kearns - Univ. of Pennsylvania
CompSci 001
4.6
Emerging science of networks









Examining apparent similarities between many human and technological
systems & organizations
Importance of network effects in such systems
How things are connected matters greatly
Structure, asymmetry and heterogeneity
Details of interaction matter greatly
The metaphor of viral spread
Dynamics of economic and strategic interaction
Qualitative and quantitative; can be very subtle
A revolution of
 measurement
 theory
 breadth of vision
CompSci 001
(M. Kearns)
4.7
“Real World” Social Networks


Example: Acquaintanceship networks
 vertices: people in the world
 links: have met in person and know last names
 hard to measure
Example: scientific collaboration
 vertices: math and computer science researchers
 links: between coauthors on a published paper
 Erdos numbers : distance to Paul Erdos
 Erdos was definitely a hub or connector; had 507 coauthors
 how do we navigate in such networks?
CompSci 001
(M. Kearns)
4.8
Online Social Networks



A somewhat recent example: Friendster
 vertices: subscribers to www.friendster.com
 links: created via deliberate invitation
More recent and interesting: thefacebook
 Join the Computer Science 1 group!
Older example: social interaction in LambdaMOO
 LambdaMOO: chat environment with “emotes” or verbs
 vertices: LambdaMOO users
 links: defined by chat and verb exchange
 could also examine “friend” and “foe” sub-networks
CompSci 001
(M. Kearns)
4.9
Content Networks



Example: document similarity
 vertices: documents on the web
 links: defined by document similarity (e.g. Google)
 here’s a very nice visualization
 not the web graph, but an overlay content network
Of course, every good scandal needs a network
 vertices: CEOs, spies, stock brokers, other shifty characters
 links: co-occurrence in the same article
Then there are conceptual networks
 a thesaurus defines a network
 so do the interactions in a mailing list
CompSci 001
(M. Kearns)
4.10
Business and Economic Networks




Example: eBay bidding
 vertices: eBay users
 links: represent bidder-seller or buyer-seller
 fraud detection: bidding rings
Example: corporate boards
 vertices: corporations
 links: between companies that share a board member
Example: corporate partnerships
 vertices: corporations
 links: represent formal joint ventures
Example: goods exchange networks
 vertices: buyers and sellers of commodities
 links: represent “permissible” transactions
CompSci 001
(M. Kearns)
4.11
Physical Networks



Example: the Internet
 vertices: Internet routers
 links: physical connections
 vertices: Autonomous Systems (e.g. ISPs)
 links: represent peering agreements
 latter example is both physical and business network
Compare to more traditional data networks
Example: the U.S. power grid
 vertices: control stations on the power grid
 links: high-voltage transmission lines
 August 2003 blackout: classic example of interdependence
CompSci 001
(M. Kearns)
4.12
US Power Grid
CompSci 001
4.13
Business & Economic Networks




Example: eBay bidding
 vertices: eBay users
 links: represent bidder-seller or buyer-seller
 fraud detection: bidding rings
Example: corporate boards
 vertices: corporations
 links: between companies that share a board member
Example: corporate partnerships
 vertices: corporations
 links: represent formal joint ventures
Example: goods exchange networks
 vertices: buyers and sellers of commodities
 links: represent “permissible” transactions
CompSci 001
4.14
Content Networks


Example: Document similarity
 Vertices: documents on web
 Edges: Weights defined by similarity
 See TouchGraph GoogleBrowser
Conceptual network: thesaurus
 Vertices: words
 Edges: synonym relationships
CompSci 001
4.15
Enron
CompSci 001
4.16
Social networks

Example: Acquaintanceship networks
 vertices: people in the world
 links: have met in person and know last names
 hard to measure

Example: scientific collaboration
 vertices: math and computer science researchers
 links: between coauthors on a published paper
 Erdos numbers : distance to Paul Erdos
 Erdos was definitely a hub or connector; had 507 coauthors
How do we navigate in such networks?

CompSci 001
4.17
CompSci 001
4.18
Acquaintanceship & more
CompSci 001
4.19
Network Models (Barabasi)

Differences between Internet, Kazaa, Chord
 Building, modeling, predicting

Static networks, Dynamic networks
 Modeling and simulation

Random and Scale-free
 Implications?

Structure and Evolution
 Modeling via Touchgraph
CompSci 001
4.20
Web-based social networks
http://trust.mindswap.org






Myspace
73,000,000
Passion.com 23,000,000
Friendster
21,000,000
Black Planet 17,000,000
Facebook
8,000,000
Who’s using these, what are they doing, how often are they
doing it, why are they doing it?
CompSci 001
4.21
Golbeck’s Criteria

Accessible over the web via a browser

Users explicitly state relationships
 Not mined or inferred

Relationships visible and browsable by others
 Reasons?

Support for users to make connections
 Simple HTML pages don’t suffice
CompSci 001
4.22
CSE 112, Networked Life (UPenn)

Find the person in Facebook with the most friends
 Document your process

Find the person with the fewest friends
 What does this mean?

Search for profiles with some phrase that yields 30-100
matches
 Graph degrees/friends, what is distribution?
CompSci 001
4.23
CompSci 1: Overview CS0

Audioscrobbler and last.fm
 Collaborative filtering
 What is a neighbor?
 What is the network?
CompSci 001
4.24
What can we do with real data?

How do we find a graph’s diameter?
 This is the maximal shortest path between any pair of
vertices
 Can we do this in big graphs?

What is the center of a graph?
 From rumor mills to terrorists
 How is this related to diameter?

Demo GUESS (as augmented at Duke)
 IM data, Audioscrobbler data
CompSci 001
4.25
My recommendations at Amazon
CompSci 001
4.26
And again…
CompSci 001
4.27
Collaborative Filtering

Goal: predict the utility of an item to a particular user based on
a database of user profiles
 User profiles contain user preference information
 Preference may be explicit or implicit
• Explicit means that a user votes explicitly on some scale
• Implicit means that the system interprets user behavior or
selections to impute a vote

Problems
 Missing data: voting is neither complete nor uniform
 Preferences may change over time
 Interface issues
CompSci 001
4.28