PPT - Institute for Web Science and Technologies

Web Science & Technologies
University of Koblenz ▪ Landau, Germany
Micro-interactions and Macro-observations
Klaas Dellschaft
Example: Naming Game (I)
 Micro-interactions …
 Mother talking to her child
http://www.youtube.com/watch?v=kiGduwJK6SQ
 Macro-observations …
 Child learns to speak
WeST
Klaas Dellschaft
[email protected]
Introduction to Web Science
2 of 30
Example: Naming Game (II)
[Figure: three users (User1, User2, User3) in a naming game, exchanging the words "Kuh", "Cow" and "???"]
 User roles: Speaker/ Hearer
 Speaker: Speaks a word
 Hearer: Tries to guess which object was meant
 Successful round: Hearer makes a correct guess
 Objective: Maximize the number of successful rounds
 http://talking-heads.csl.sony.fr
Example: Naming Game (III)
 Micro-level interactions …
 Speaker / hearer
 Round successful?
• Yes: Reinforce the used word
• No: Learn new word
 Macro-level observations …
 Stable vocabulary emerges over time
 For each object / attribute, only one word survives
 Naming game explains how languages may emerge
 Why are there many different languages in the world?
 Naming game ignores geographic distribution of agents
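The naming-game rules above can be sketched as a small simulation. This is a minimal sketch under assumptions the slides do not state: a single object, speaker and hearer chosen uniformly at random, integer word IDs, and "reinforce" read as both agents dropping all competing words (as in the common minimal naming game variant). The function name `naming_game` and its parameters are hypothetical.

```python
import random

def naming_game(n_agents=20, max_rounds=200_000, seed=0):
    """Simulate a minimal naming game for one object.

    Returns the number of distinct words still in use when the
    simulation stops (1 means a shared vocabulary has emerged).
    """
    rng = random.Random(seed)
    inventories = [set() for _ in range(n_agents)]  # words known per agent
    next_word = 0
    for _ in range(max_rounds):
        speaker, hearer = rng.sample(range(n_agents), 2)
        if not inventories[speaker]:
            # Speaker knows no word yet: invent one (assumption).
            inventories[speaker].add(next_word)
            next_word += 1
        word = rng.choice(sorted(inventories[speaker]))
        if word in inventories[hearer]:
            # Successful round: reinforce the used word.
            inventories[speaker] = {word}
            inventories[hearer] = {word}
        else:
            # Failed round: the hearer learns the new word.
            inventories[hearer].add(word)
        if all(len(inv) == 1 for inv in inventories) and \
                len(set.union(*inventories)) == 1:
            break  # every agent uses the same single word
    return len(set.union(*inventories))

print(naming_game())
```

With a small population, the simulation typically ends with a single surviving word, which matches the macro-level observation on the slide.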
Model-based research
 Modeling micro-interactions
 Define rules for interactions between agents
 Use rules for simulating the dynamics in a system
 Objective: Explain the emergence of macro-observations
 Use cases:
 Biology: Spreading of diseases in a population
 Sociology: Emergence of different cultural habits
 Web Science:
• Spreading of memes / hashtags in Twitter
• Emergence of a collaborative vocabulary in tagging systems
• …
Basic Models (I)
 Preferential Attachment (Polya Urn Model)
 There are n balls with different colors in an urn
 In each step:
• Randomly draw a ball
• Put it back together with a second ball of the same color
 Fixed number of colors
 Colors are distributed according to a power law
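The urn steps above can be sketched in a few lines. This is an illustrative sketch only; the function name `polya_urn`, the example colors, and the seed are assumptions made for the example.

```python
import random
from collections import Counter

def polya_urn(initial_colors, steps, seed=0):
    """Simulate a Polya urn and return the final count per color."""
    rng = random.Random(seed)
    urn = list(initial_colors)
    for _ in range(steps):
        # Draw a ball at random, put it back, and add a second
        # ball of the same color.
        urn.append(rng.choice(urn))
    return Counter(urn)

counts = polya_urn(["red", "green", "blue"], steps=1000)
print(counts.most_common())
```

Colors that get ahead early are drawn more often later on, which is the preferential attachment effect. Running the sketch with different seeds shows how strongly the final shares depend on the first few draws.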
Basic Models (II)
 Linear Preferential Attachment (Simon Model)
 Like the Polya Urn Model, with one addition in each step:
• With low probability p, insert a ball with a new color instead of drawing a ball
 Linearly increasing number of colors
 Colors are distributed according to a power law
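A hedged sketch of the Simon model as described above: it behaves like the urn, except that each step introduces a new color with probability p. The function name `simon_model`, the integer color IDs, and the single starting ball are assumptions made for the example.

```python
import random
from collections import Counter

def simon_model(steps, p, seed=0):
    """Simulate the Simon model and return the final count per color."""
    rng = random.Random(seed)
    urn = [0]          # start with one ball of color 0 (assumption)
    n_colors = 1
    for _ in range(steps):
        if rng.random() < p:
            # With probability p: insert a ball with a new color.
            urn.append(n_colors)
            n_colors += 1
        else:
            # Otherwise: copy the color of a randomly drawn ball.
            urn.append(rng.choice(urn))
    return Counter(urn)

counts = simon_model(steps=10_000, p=0.05)
print(len(counts), "colors;", counts.most_common(5))
```

The expected number of colors grows by p per step, which is the linear increase named on the slide.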
Basic Models (III)
 Information Cascades
 Users decide rationally between alternatives
• Example: Accept (A) / Reject (R)
 Each user gets private information
• When the correct decision is to accept, the user more likely
gets the information to accept (i.e. P(A) > 0.5)
 Each user sees the decision of the previous users
 Rational choice:
• Adopt the choice of the majority of previous users and
private information
The choice relies only on the decisions of previous users once the difference in votes between A and R increases beyond 2
All subsequent users then adopt the same choice → cascade
The cascaded decision is not necessarily the correct one!
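The decision rule can be illustrated with a deterministic sketch that takes the private signals as input. The assumptions here are not taken from the slide: a user follows the earlier decisions as soon as one option leads by `threshold` votes (default 2; set it to 3 to model "beyond 2"), and otherwise follows the private signal. The function name `run_cascade` is hypothetical.

```python
def run_cascade(signals, threshold=2):
    """Return each user's decision ("A" or "R") given the private signals."""
    decisions = []
    for signal in signals:
        # Lead of "accept" over "reject" among the earlier decisions.
        lead = decisions.count("A") - decisions.count("R")
        if lead >= threshold:
            decisions.append("A")      # cascade on accept
        elif lead <= -threshold:
            decisions.append("R")      # cascade on reject
        else:
            decisions.append(signal)   # private signal decides
    return decisions

# Two early "reject" signals start a cascade, although most
# private signals say "accept".
print("".join(run_cascade(list("RRAAA"))))  # RRRRR
```

The example shows how a cascade can lock in the wrong decision: the first two signals outweigh all later private information.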
Method of Model-based Research
 Reality: Unknown rules of interaction (micro-interactions) → Unknown model → Observed properties (macro-observations)
 Model: Assumed rules of interaction → Stochastic model → Simulated properties
 Compare: Observed properties with simulated properties
Use Case: Spreading of Memes in Twitter (I)
 Meme: Topic / idea that is discussed in Twitter
 Observables:
 Lifetime of tweets in Twitter (in hours)
 Number of people contributing to a meme (per day)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3315179/
Use Case: Spreading of Memes (II)
 Assumed rules of interaction:
 Each user can see memes posted by his friends
 Each user remembers his own previously tweeted memes
 When tweeting, a user either …
• … invents a new meme, or …
• … randomly selects a meme posted by his friends, or …
• … randomly picks up one of his previously tweeted memes
 Users only remember the last n tweets of their friends
and/or of their own
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3315179/
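The assumed rules of interaction can be turned into a toy simulation. This is a rough sketch, not the model from the linked article: the probabilities `p_new` and `p_friend`, the memory size, the uniform choice of the tweeting user, and the function name `simulate_memes` are all assumptions made for illustration.

```python
import random
from collections import deque

def simulate_memes(friends, steps, p_new=0.1, p_friend=0.6, memory=5, seed=0):
    """friends maps each user to the list of users who see their tweets.

    Returns the list of (user, meme_id) posts in posting order.
    """
    rng = random.Random(seed)
    users = sorted(friends)
    # Each user remembers only the last `memory` memes of friends and of their own.
    seen = {u: deque(maxlen=memory) for u in users}
    own = {u: deque(maxlen=memory) for u in users}
    posts, next_meme = [], 0
    for _ in range(steps):
        user = rng.choice(users)
        r = rng.random()
        if r < p_new or (not seen[user] and not own[user]):
            meme = next_meme                      # invent a new meme
            next_meme += 1
        elif (r < p_new + p_friend and seen[user]) or not own[user]:
            meme = rng.choice(list(seen[user]))   # pick a friend's meme
        else:
            meme = rng.choice(list(own[user]))    # repeat an own meme
        own[user].append(meme)
        for friend in friends[user]:
            seen[friend].append(meme)
        posts.append((user, meme))
    return posts

network = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(simulate_memes(network, steps=10))
```

Observables such as the number of users contributing to a meme can then be computed from the returned posts and compared with the empirical data.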
Use Case: Spreading of Memes (III)
 Comparing simulation and reality:
 Empirical observations are better reproduced when
assuming a social network between users
 Structure of the friendship network influences meme spreading
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3315179/
Details of Model-based Research
 How to represent observables?
 Distribution functions
 How to compare simulation and reality?
 Analytical evaluation
 Visual comparison
 Goodness-of-fit tests
 How to decide between competing models?
Use Case: Dynamics in Tagging Systems
 Do the users agree on how to describe a resource?
 How do users influence each other in tagging systems?
Folksonomies
 Vertexes: Users, tags, resources
 Hyperedges: Tag assignments (user × tag × resource)
 Postings:
 Tag assignments of a user to a single resource
 Can be ordered according to their time-stamp
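As a sketch of the data structure, tag assignments can be stored as (user, tag, resource, timestamp) tuples and grouped into postings. The tuple layout, the use of the earliest timestamp per posting, and the function name `group_into_postings` are assumptions made for this example.

```python
from collections import defaultdict

def group_into_postings(tag_assignments):
    """Group (user, tag, resource, timestamp) tuples into postings.

    A posting collects all tags one user assigned to one resource.
    Returns (user, resource, tags, timestamp) tuples ordered by time.
    """
    tags = defaultdict(list)
    times = {}
    for user, tag, resource, timestamp in tag_assignments:
        key = (user, resource)
        tags[key].append(tag)
        # Use the earliest timestamp of the posting (assumption).
        times[key] = min(times.get(key, timestamp), timestamp)
    postings = [(u, r, tags[(u, r)], times[(u, r)]) for (u, r) in tags]
    return sorted(postings, key=lambda posting: posting[3])

assignments = [
    ("klaasd", "rss", "r2", "13:26"),
    ("mackz", "ajax", "r1", "13:25"),
    ("klaasd", "ajax", "r2", "13:26"),
    ("mackz", "javascript", "r1", "13:25"),
]
for posting in group_into_postings(assignments):
    print(posting)
```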
Co-occurrence Streams
 Co-occurrence Streams:
 All tags co-occurring with a given tag in a posting
 Ordered by posting time
 Example tag assignments for ‘ajax’:
 {mackz, r1, {ajax, javascript}, 13:25}
 {klaasd, r2, {ajax, rss, web2.0}, 13:26}
 {mackz, r2, {ajax, php, javascript}, 13:27}
 Resulting co-occurrence stream:
javascript, rss, web2.0, php, javascript (in order of posting time)
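The construction of a co-occurrence stream can be sketched as follows, using the three example postings for ‘ajax’. The tuple layout (user, resource, tags, time) and the function name `cooccurrence_stream` are assumptions; tags are kept as lists so that the order within a posting is preserved.

```python
def cooccurrence_stream(postings, tag):
    """Return all tags co-occurring with `tag`, ordered by posting time."""
    stream = []
    for user, resource, tags, time in sorted(postings, key=lambda p: p[3]):
        if tag in tags:
            # Keep every other tag of the posting, in its original order.
            stream.extend(t for t in tags if t != tag)
    return stream

postings = [
    ("mackz", "r1", ["ajax", "javascript"], "13:25"),
    ("klaasd", "r2", ["ajax", "rss", "web2.0"], "13:26"),
    ("mackz", "r2", ["ajax", "php", "javascript"], "13:27"),
]
print(cooccurrence_stream(postings, "ajax"))
# ['javascript', 'rss', 'web2.0', 'php', 'javascript']
```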
Tag     |Y|          |U|        |T|        |R|
ajax    2,949,614    88,526     41,898     71,525
blog    6,098,471    158,578    186,043    557,017
xml     974,866      44,326     31,998     61,843
Co-occurrence Streams – Tag Frequencies
[Figure: Zipf plot of the tag frequencies]
Probability Distributions
 Measuring the probability of a certain event
 Examples:
 Rolling a die – How often do we get a 1, 2, 3, …?
 Questionnaires – How often do people check 1, 2, … on a scale from 1 to 10?
 Tagging – How often is the tag ‘ajax’ used?
 Tagging – How many of the used tags are used once, twice, …?
 Different types of measurement scales
Probability Distributions – Measurement Scales (I)
Nominal scale
Ordinal scale
Interval scale
Ratio scale
Source: http://de.wikipedia.org/wiki/Skalenniveau
Probability Distributions – Measurement Scales (II)
[Figure, nominal scale: values from 0 to 0.35 for the tags blog, health, food, nutrition, eating, cooking]
[Figure, ordinal scale / interval scale: "Probability of Tags with Frequency x" (0 to 0.6) over "Tag Frequency" (1 to 8)]
Probability Distributions – Representations (I)
 Probability Distribution Function (PDF):
 P(X = x): Probability of observing an event x
 Cumulative Distribution Function (CDF):
 P(X ≤ x): Probability of observing an event whose value is ≤ x. Requires at least an ordinal measurement scale.
 Example: Normal distribution
[Figure: PDF and CDF of the normal distribution]
Source: http://en.wikipedia.org/wiki/Normal_distribution
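For observed data, both representations can be estimated from relative frequencies. The following is a minimal sketch for discrete observations, such as how often each tag is used; the function name `empirical_pdf_cdf` and the sample values are assumptions made for the example.

```python
from collections import Counter

def empirical_pdf_cdf(observations):
    """Estimate P(X = x) and P(X <= x) from a list of observations."""
    n = len(observations)
    counts = Counter(observations)
    pdf, cdf = {}, {}
    cumulative = 0
    for value in sorted(counts):   # sorting needs at least an ordinal scale
        cumulative += counts[value]
        pdf[value] = counts[value] / n
        cdf[value] = cumulative / n
    return pdf, cdf

# Hypothetical usage counts of five tags: three tags used once,
# one used twice, one used three times.
pdf, cdf = empirical_pdf_cdf([1, 1, 2, 3, 1])
print(pdf)
print(cdf)
```

The CDF is built by accumulating the PDF over the sorted values, which is why it requires an ordering of the events.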
Probability Distributions – Representations (II)
 Zipf plot
 Representation for distributions with nominal scale
 Assign ranks to the different categories
• Rank 1: Most often occurring category
 x-axis: Categories ordered by their ranks
 y-axis: Probability of category with rank x
 Often used for representing word frequencies in texts
 Zipf’s law:
 Describes the relation between the rank k and the frequency f(k) of a word in natural language texts
$f(k; s) \propto k^{-s}, \quad s > 0$
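The data behind a Zipf plot can be computed by ranking categories by frequency. This is a small sketch; the function name `zipf_table` is hypothetical, and the sample data reuses the co-occurrence stream for ‘ajax’ from the earlier slide.

```python
from collections import Counter

def zipf_table(items):
    """Return (rank, category, probability) tuples, most frequent first."""
    counts = Counter(items)
    total = sum(counts.values())
    return [
        (rank, item, count / total)
        for rank, (item, count) in enumerate(counts.most_common(), start=1)
    ]

stream = ["javascript", "rss", "web2.0", "php", "javascript"]
for rank, tag, probability in zipf_table(stream):
    print(rank, tag, probability)
```

Plotting probability against rank on logarithmically scaled axes gives the Zipf plot. If the data follow Zipf's law, the points lie roughly on a straight line with slope -s.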
Co-occurrence Streams – Tag Frequencies
Tag frequencies approximately follow Zipf’s law (straight line in a Zipf plot with logarithmically scaled axes)
Comparing Model and Reality (I)
 Visual comparison:
 Plot the real observables and the simulated results
 The closer together the plots, the better the model
 Advantage: Easy to understand and to implement
 Disadvantage: Highly subjective (i.e. not a scientific
method)
Comparing Model and Reality (II)
 Analytical evaluation:
 Use mathematical methods for analyzing the model
 Prove that the simulation results have certain properties
 Example: Preferential attachment
• Frequency distribution of colors follows a power law
• Color frequencies tend to a random limit
 Advantages:
 Very deep understanding of the mechanisms
 Mathematical dependencies between model parameters and
properties of the simulation results
 Disadvantages:
 Analyzed models have to be “mathematically tractable”
 Does not show that simulated properties can also be observed in
reality
Comparing Model and Reality (III)
 Goodness-of-fit tests:
 First step:
• Define an objective measure of distance between the simulated and the observed property
Relative measure of goodness-of-fit
Applicable to any property
 Second step:
• Compute whether the simulated and the observed property are statistically indistinguishable
Absolute measure of goodness-of-fit
Only applicable to properties that can be represented as probability distributions
Kolmogorov-Smirnov Test (Example)
 Goodness-of-fit test for distributions with at least ordinal measurement scale
 Maximal distance between simulation and observation: $D = \max_x | S_1(x) - S_2(x) |$
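The distance D between two empirical cumulative distributions S1 and S2 can be computed directly from two samples. This sketch covers only the distance, that is, the relative measure from the first step; it does not compute a significance level. The function name `ks_distance` and the sample values are assumptions made for the example.

```python
from bisect import bisect_right

def ks_distance(sample1, sample2):
    """Maximal distance between the empirical CDFs of two samples."""
    s1, s2 = sorted(sample1), sorted(sample2)
    distance = 0.0
    # The empirical CDFs only change at observed values, so it is
    # enough to compare them at those points.
    for x in set(s1) | set(s2):
        cdf1 = bisect_right(s1, x) / len(s1)   # share of sample1 <= x
        cdf2 = bisect_right(s2, x) / len(s2)   # share of sample2 <= x
        distance = max(distance, abs(cdf1 - cdf2))
    return distance

observed = [1, 2, 3, 4]
simulated = [3, 4, 5, 6]
print(ks_distance(observed, simulated))  # 0.5
```

A distance of 0 means the two empirical distributions are identical, and 1 means the samples do not overlap at all.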
Friday!