PPT - Institute for Web Science and Technologies
Web Science & Technologies
University of Koblenz ▪ Landau, Germany
Micro-interactions and Macro-observations
Klaas Dellschaft
Example: Naming Game (I)
Micro-interactions …
Mother talking to her child
http://www.youtube.com/watch?v=kiGduwJK6SQ
Macro-observations …
Child learns to speak
WeST
Klaas Dellschaft
[email protected]
Introduction to Web Science
2 of 30
Example: Naming Game (II)
[Figure: three users (User 1, User 2, User 3) with speech bubbles containing "Kuh", "Cow", and "???"]
User roles: Speaker/ Hearer
Speaker: Speaks a word
Hearer: Tries to guess which object was meant
Successful round: Hearer makes a correct guess
Objective: Maximize the number of successful rounds
http://talking-heads.csl.sony.fr
Example: Naming Game (III)
Micro-level interactions …
Speaker / hearer
Round successful?
• Yes: Reinforce the used word
• No: Learn new word
Macro-level observations …
Stable vocabulary emerges over time
For each object / attribute, only one word survives
Naming game explains how languages may emerge
Why are there many different languages in the world?
Naming game ignores geographic distribution of agents
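The micro-level rule above can be tried out in a few lines. The following is only a minimal sketch for a single object, not the Talking Heads implementation: the function name naming_game, the generated words "w0", "w1", …, and the interpretation of "reinforce" as dropping all competing words are assumptions made for illustration.

```python
import random

def naming_game(num_agents=20, rounds=5000, seed=0):
    """Simulate a minimal naming game for a single object.

    Returns one set of known words per agent.
    """
    rng = random.Random(seed)
    vocab = [set() for _ in range(num_agents)]
    next_word = 0  # counter used to invent fresh words

    for _ in range(rounds):
        speaker, hearer = rng.sample(range(num_agents), 2)
        if not vocab[speaker]:
            # The speaker has no word for the object yet: invent one.
            vocab[speaker].add(f"w{next_word}")
            next_word += 1
        word = rng.choice(sorted(vocab[speaker]))
        if word in vocab[hearer]:
            # Successful round: reinforce the used word by dropping
            # all competing words (assumed reinforcement rule).
            vocab[speaker] = {word}
            vocab[hearer] = {word}
        else:
            # Failed round: the hearer learns the new word.
            vocab[hearer].add(word)
    return vocab

vocab = naming_game()
surviving = set().union(*vocab)
print(len(surviving), "distinct word(s) still in use")
```

With enough rounds the number of distinct words typically shrinks to one, which is the macro-level observation of a stable vocabulary.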
Model-based research
Modeling micro-interactions
Define rules for interactions between agents
Use rules for simulating the dynamics in a system
Objective: Explain the emergence of macro-observations
Use cases:
Biology: Spreading of diseases in a population
Sociology: Emergence of different cultural habits
Web Science:
• Spreading of memes / hashtags in Twitter
• Emergence of a collaborative vocabulary in tagging systems
• …
Basic Models (I)
Preferential Attachment (Polya Urn Model)
There are n balls with different colors in an urn
In each step:
• Randomly draw a ball
• Put it back together with a second ball of the same color
Fixed number of colors
Colors are distributed according to a power law
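A minimal sketch of the urn process described above. The function name polya_urn and the three starting colors are assumptions chosen for illustration.

```python
import random
from collections import Counter

def polya_urn(initial_colors, steps, seed=0):
    """Run the urn process and return the number of balls per color."""
    rng = random.Random(seed)
    urn = list(initial_colors)
    for _ in range(steps):
        # Drawing uniformly from the list of balls picks each color
        # with probability proportional to its current count.
        ball = rng.choice(urn)
        # The drawn ball stays in the urn; add a second one of its color.
        urn.append(ball)
    return Counter(urn)

counts = polya_urn(["red", "green", "blue"], steps=1000)
print(counts.most_common())
```

Running it with different seeds shows the "rich get richer" effect: colors that gain an early lead tend to keep it.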
Basic Models (II)
Linear Preferential Attachment (Simon Model)
Like the Polya Urn Model. Additionally, in each step:
• With a low probability p, insert a ball with a new color instead of drawing a ball
Linearly increasing number of colors
Colors are distributed according to a power law
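The extension with new colors can be sketched as follows. The function name simon_model, the integer color labels, and the start with a single ball are assumptions for illustration.

```python
import random
from collections import Counter

def simon_model(steps, p=0.1, seed=0):
    """Urn process with new colors; returns the number of balls per color."""
    rng = random.Random(seed)
    urn = [0]        # assumed start: a single ball of color 0
    next_color = 1
    for _ in range(steps):
        if rng.random() < p:
            # With probability p: insert a ball with a new color.
            urn.append(next_color)
            next_color += 1
        else:
            # Otherwise: draw a ball and add a second one of its color.
            urn.append(rng.choice(urn))
    return Counter(urn)

counts = simon_model(steps=10000, p=0.05)
print("number of colors:", len(counts))
print("largest colors:", counts.most_common(5))
```

On average, p * steps new colors are inserted, which is the linear growth in the number of colors.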
Basic Models (III)
Information Cascades
Users decide rationally between alternatives
• Example: Accept (A) / Reject (R)
Each user gets private information
• When the correct decision is to accept, the user is more likely to get the information to accept (i.e. P(A) > 0.5)
Each user sees the decisions of the previous users
Rational choice:
• Adopt the choice of the majority, counting the decisions of previous users and the private information
The choice relies only on the decisions of previous users if the difference in votes between A and R increases beyond 2
All subsequent users then adopt the same choice → cascade
The cascaded decision is not necessarily the correct one!
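A small simulation can show how a wrong decision gets cascaded. This is a simplified sketch, not a full Bayesian treatment: the private information counts as one vote, and ties are broken in favor of the private information (an assumption). Under this rule a user ignores the private information as soon as the earlier decisions differ by two or more votes. The function name simulate_cascade and the parameter p_correct are illustrative.

```python
import random

def simulate_cascade(num_users=50, p_correct=0.7, seed=0):
    """Sequential decisions where 'A' is the correct choice.

    Returns the list of decisions ('A' or 'R') in order.
    """
    rng = random.Random(seed)
    decisions = []
    for _ in range(num_users):
        # Private information points to the correct choice 'A'
        # with probability p_correct > 0.5.
        signal = "A" if rng.random() < p_correct else "R"
        votes_a = decisions.count("A") + (signal == "A")
        votes_r = decisions.count("R") + (signal == "R")
        if votes_a > votes_r:
            choice = "A"
        elif votes_r > votes_a:
            choice = "R"
        else:
            choice = signal  # tie: follow the private information (assumed)
        decisions.append(choice)
    return decisions

wrong = sum(simulate_cascade(seed=s)[-1] == "R" for s in range(1000))
print("runs in which the last user rejects:", wrong, "of 1000")
```

Even though every private signal favors the correct choice, some runs lock in on R because the first few users happened to receive misleading information.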
Method of Model-based Research
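[Diagram: In reality, micro-interactions follow unknown rules of interaction (unknown model) and lead to macro-observations (observed properties). In the model, assumed rules of interaction (stochastic model) lead to simulated properties. The observed and the simulated properties are compared.]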
Use Case: Spreading of Memes in Twitter (I)
Meme: Topic / idea that is discussed in Twitter
Observables:
Lifetime of tweets in Twitter (in hours)
Number of people contributing to a meme (per day)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3315179/
Use Case: Spreading of Memes (II)
Assumed rules of interaction:
Each user can see memes posted by his friends
Each user remembers his own previously tweeted memes
When tweeting, a user either …
• … invents a new meme, or …
• … randomly selects a meme posted by his friends, or …
• … randomly picks up one of his previously tweeted memes
Users only remember the last n tweets of their friends and/or of their own
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3315179/
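The assumed rules can be turned into a short simulation. This is only a sketch under stated assumptions and not the implementation from the cited article: the function name simulate_memes, the probabilities p_new and p_friend, the random choice of the tweeting user, and the tiny example network are all hypothetical.

```python
import random
from collections import Counter, deque

def simulate_memes(friends, steps, p_new=0.1, p_friend=0.6, n=10, seed=0):
    """friends maps each user to the users whose tweets he can see.

    Returns how often each meme was tweeted.
    """
    rng = random.Random(seed)
    users = sorted(friends)
    followers = {u: [v for v in users if u in friends[v]] for u in users}
    seen = {u: deque(maxlen=n) for u in users}  # last n tweets of friends
    own = {u: deque(maxlen=n) for u in users}   # last n own tweets
    counts = Counter()
    next_meme = 0
    for _ in range(steps):
        user = rng.choice(users)
        r = rng.random()
        if r < p_new or not (seen[user] or own[user]):
            meme = next_meme                      # invent a new meme
            next_meme += 1
        elif (r < p_new + p_friend and seen[user]) or not own[user]:
            meme = rng.choice(list(seen[user]))   # pick a friend's meme
        else:
            meme = rng.choice(list(own[user]))    # repeat an own meme
        own[user].append(meme)
        for v in followers[user]:
            seen[v].append(meme)
        counts[meme] += 1
    return counts

friends = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
counts = simulate_memes(friends, steps=500)
print("memes:", len(counts), "most tweeted:", counts.most_common(3))
```

The bounded deques implement the limited memory: once n tweets are stored, the oldest one is forgotten.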
Use Case: Spreading of Memes (III)
Comparing simulation and reality:
Empirical observations are better reproduced when assuming a social network between users
Structure of the friendship network influences meme spreading
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3315179/
Details of Model-based Research
How to represent observables?
Distribution functions
How to compare simulation and reality?
Analytical evaluation
Visual comparison
Goodness-of-fit tests
How to decide between competing models?
Use Case: Dynamics in Tagging Systems
Do the users agree on how to describe a resource?
How do users influence each other in tagging systems?
Folksonomies
Vertices: Users, tags, resources
Hyperedges: Tag assignments (user × tag × resource)
Postings:
Tag assignments of a user to a single resource
Can be ordered according to their time-stamp
Co-occurrence Streams
Co-occurrence Streams:
All tags co-occurring with a given tag in a posting
Ordered by posting time
Example tag assignments for 'ajax':
{mackz, r1, {ajax, javascript}, 13:25}
{klaasd, r2, {ajax, rss, web2.0}, 13:26}
{mackz, r2, {ajax, php, javascript}, 13:27}
Resulting co-occurrence stream:
javascript rss web2.0 php javascript (in order of posting time)
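The construction of a co-occurrence stream can be sketched as follows. The function name cooccurrence_stream and the tuple layout (user, resource, tags, time) are assumptions. Tags are kept as lists so that the order inside a posting matches the order written in the example.

```python
def cooccurrence_stream(postings, tag):
    """Return all tags co-occurring with `tag`, ordered by posting time.

    postings: list of (user, resource, tags, time) tuples.
    """
    stream = []
    for user, resource, tags, time in sorted(postings, key=lambda p: p[3]):
        if tag in tags:
            # Keep every tag of the posting except the given tag itself.
            stream.extend(t for t in tags if t != tag)
    return stream

postings = [
    ("mackz", "r1", ["ajax", "javascript"], "13:25"),
    ("klaasd", "r2", ["ajax", "rss", "web2.0"], "13:26"),
    ("mackz", "r2", ["ajax", "php", "javascript"], "13:27"),
]
print(" ".join(cooccurrence_stream(postings, "ajax")))
# prints: javascript rss web2.0 php javascript
```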
Tag    |Y|         |U|       |T|       |R|
ajax   2,949,614   88,526    41,898    71,525
blog   6,098,471   158,578   186,043   557,017
xml    974,866     44,326    31,998    61,843
Co-occurrence Streams – Tag Frequencies
[Figure: Zipf plot of the tag frequencies]
Probability Distributions
Measuring the probability of a certain event
Examples:
Rolling a die – How often do we get the 1, 2, 3, …?
Questionnaires – How often do people check the 1, 2, … on a scale from 1 to 10?
Tagging – How often is the tag 'ajax' used?
Tagging – How many of the used tags are used once, twice, …?
Different types of measurement scales
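The two tagging questions lead to two different distributions over the same data. The sketch below computes both from a short tag stream. The function names tag_probabilities and frequency_probabilities are assumptions for illustration.

```python
from collections import Counter

def tag_probabilities(tags):
    """P(tag): how often each tag is used, relative to all tag uses."""
    counts = Counter(tags)
    total = len(tags)
    return {t: c / total for t, c in counts.items()}

def frequency_probabilities(tags):
    """P(frequency = x): share of distinct tags that are used x times."""
    counts = Counter(tags)                    # tag -> frequency
    freq_of_freq = Counter(counts.values())   # frequency x -> number of tags
    num_tags = len(counts)
    return {x: k / num_tags for x, k in sorted(freq_of_freq.items())}

stream = ["javascript", "rss", "web2.0", "php", "javascript"]
print(tag_probabilities(stream))
print(frequency_probabilities(stream))
```

The first result is defined over tag names, the second over tag frequencies, so the two live on different measurement scales.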
Probability Distributions – Measurement Scales (I)
Nominal scale
Ordinal scale
Interval scale
Ratio scale
Source: http://de.wikipedia.org/wiki/Skalenniveau
Probability Distributions – Measurement Scales (II)
[Figure, nominal scale: bar chart of the probabilities of the tags blog, health, food, nutrition, eating, cooking]
[Figure, ordinal scale / interval scale: bar chart of the probability of tags with frequency x, plotted against the tag frequency (1 to 8)]
Probability Distributions – Representations (I)
Probability Distribution Function (PDF):
P(X = x): Probability of observing an event x
Cumulative Distribution Function (CDF):
P(X ≤ x): Probability of observing an event whose value is ≤ x. Requires at least an ordinal measurement scale.
Example: Normal distribution
[Figure: PDF and CDF of the normal distribution]
Source: http://en.wikipedia.org/wiki/Normal_distribution
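For a discrete distribution, the CDF is the running sum of the PDF over the ordered values. A minimal sketch, using a fair die as the example. The function name cdf_from_pdf is an assumption.

```python
from itertools import accumulate

def cdf_from_pdf(pdf):
    """pdf maps each value x to P(X = x); returns x -> P(X <= x)."""
    xs = sorted(pdf)  # sorting requires at least an ordinal scale
    return dict(zip(xs, accumulate(pdf[x] for x in xs)))

die = {x: 1 / 6 for x in range(1, 7)}
cdf = cdf_from_pdf(die)
for x in sorted(cdf):
    print(x, round(cdf[x], 3))
```

The sorting step is where the ordinal scale is needed: for nominal categories such as tag names there is no meaningful order to accumulate over.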
Probability Distributions – Representations (II)
Zipf plot
Representation for distributions with nominal scale
Assign ranks to the different categories
• Rank 1: Most often occurring category
x-axis: Categories ordered by their ranks
y-axis: Probability of category with rank x
Often used for representing word frequencies in texts
Zipf's law:
Describes the relation between the rank k and the frequency f(k) of a word in natural language texts
$f(k; s) \propto k^{-s}, \quad s > 0$
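The ranking step and a rough estimate of the exponent s can be sketched as follows. The function names zipf_ranks and estimate_exponent are assumptions, and the least-squares fit on the log-log values is only a crude estimator chosen for brevity, not a recommended statistical method.

```python
import math
from collections import Counter

def zipf_ranks(words):
    """Return [(rank, probability)] with rank 1 = most frequent category."""
    counts = Counter(words)
    total = sum(counts.values())
    ordered = sorted(counts.values(), reverse=True)
    return [(k, c / total) for k, c in enumerate(ordered, start=1)]

def estimate_exponent(ranked):
    """Least-squares slope of log f(k) against log k; returns s = -slope."""
    xs = [math.log(k) for k, _ in ranked]
    ys = [math.log(f) for _, f in ranked]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return -num / den

# Synthetic frequencies that follow f(k) = 1/k exactly, i.e. s = 1.
ranked = [(k, 1 / k) for k in range(1, 101)]
print(round(estimate_exponent(ranked), 3))
```

Taking logarithms turns the power law into a straight line with slope -s, which is why the fit is done on log k and log f(k).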
Co-occurrence Streams – Tag Frequencies
Tag frequencies approximately follow Zipf's law (straight line in a Zipf plot with logarithmically scaled axes)
Comparing Reality and Model (I)
Visual comparison:
Plot the real observables and the simulated results
The closer together the plots, the better the model
Advantage: Easy to understand and to implement
Disadvantage: Highly subjective (i.e. not a scientific method)
Comparing Model and Reality (II)
Analytical evaluation:
Use mathematical methods for analyzing the model
Prove that the simulation results have certain properties
Example: Preferential attachment
• Frequency distribution of colors is a power law
• Color frequencies tend to a random limit
Advantages:
Very deep understanding of the mechanisms
Mathematical dependencies between model parameters and properties of the simulation results
Disadvantages:
Analyzed models have to be "mathematically tractable"
Does not show that simulated properties can also be observed in reality
Comparing Model and Reality (III)
Goodness-of-fit tests:
First step:
• Define an objective measure of distance between the simulated and the observed property
Relative measure of goodness-of-fit
Applicable for any property
Second step:
• Compute whether the simulated and the observed property are statistically indistinguishable
Absolute measure of goodness-of-fit
Only applicable for properties that can be represented as probability distributions
Kolmogorov-Smirnov Test (Example)
Goodness-of-fit test for distributions with at least ordinal measurement scale
Maximal distance between simulation and observation: $D = \max_x \lvert S_1(x) - S_2(x) \rvert$
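The distance D can be computed directly from two samples by comparing their empirical cumulative distribution functions. The sketch below only computes the distance (the first step of a goodness-of-fit test) and does not derive a significance level. The function name ks_distance and the example values are made up for illustration.

```python
from bisect import bisect_right

def ks_distance(sample1, sample2):
    """Maximal distance between the empirical CDFs of two samples."""
    s1, s2 = sorted(sample1), sorted(sample2)

    def ecdf(sorted_sample, x):
        # Share of observations with a value <= x.
        return bisect_right(sorted_sample, x) / len(sorted_sample)

    # The empirical CDFs only change at observed values, so it is
    # enough to compare them at those points.
    points = set(s1) | set(s2)
    return max(abs(ecdf(s1, x) - ecdf(s2, x)) for x in points)

observed = [1, 1, 2, 3, 5]    # hypothetical observed values
simulated = [1, 2, 2, 4, 5]   # hypothetical simulated values
print(ks_distance(observed, simulated))
```

A distance of 0 means that both samples have identical empirical CDFs. A distance of 1 means that the two samples do not overlap at all.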
Friday!