Preferential Behavior in Online Groups

Lars Backstrom, Ravi Kumar, Cameron Marlow, Jasmine Novak, Andrew Tomkins
February 11, 2008
WSDM
Power Users
Executive Summary of Preferential Treatment
Long-term power users are:
1) 20 times more likely to receive a response upon joining a group
2) Twice as likely to receive a response upon becoming heavily engaged
3) 9 times more likely to have early responses come from other power users
Outline
• Introduction to online groups
• Experimental set-up
• Statistics of group “cores”
• Statistics of heavily engaged users
• Preferential treatment of engaged users
• Model: Predicting deep engagement
Online Groups
• A majority of internet users participate in some form of
“online group” related to hobbies, beliefs or offline
relationships (Pew 2001)
• Groups vary along a number of dimensions:
– Scale
– Online vs. offline relationships
– Broadcast, Q/A, and interaction
– etc.
• Examples
– Ithaca Rotary Club mailing list
– Palo Alto Parenting group
– Aerosmith fan club on MySpace
Yahoo Groups
• 100 million users, 6 million groups
• Any user can create a group; the creator becomes its moderator
– controls privacy settings, access, memberships, etc.
• Content includes information pages, multimedia content,
and message boards.
• The majority of content resides in message boards.
– Members may post a message on a new topic, or respond to a
message posted earlier
– Users may read content online, or receive by email
– ~6 million groups, ~6 billion messages
• We used data from one year: May 2005 - May 2006
Privacy and Size
• Analysis performed on several categories of groups
• Size
– Small: fewer than 20 unique posters
– Medium: 20-99 unique posters
– Large: 100 or more unique posters
• Privacy
– Public: open and listed or open and unlisted
– Semi-public: restricted and listed
– Private: closed and listed, closed and unlisted or restricted and
unlisted
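The size and privacy bins above reduce to two small lookup rules. The sketch below is illustrative only. The access labels "open", "restricted" and "closed" and the `listed` flag are assumed names for the group settings, not an actual Yahoo Groups API. Large is taken to start at 100 unique posters so that the three size bins cover every count.

```python
def size_class(unique_posters: int) -> str:
    """Bin a group by its number of unique posters during the year."""
    if unique_posters < 20:
        return "small"
    if unique_posters < 100:  # Medium: 20-99 unique posters
        return "medium"
    return "large"            # assumption: 100 or more unique posters


def privacy_class(access: str, listed: bool) -> str:
    """Map hypothetical access/listing settings to the three privacy classes."""
    if access == "open":
        return "public"       # open and listed, or open and unlisted
    if access == "restricted":
        return "semi-public" if listed else "private"
    if access == "closed":
        return "private"      # closed and listed, or closed and unlisted
    raise ValueError(f"unknown access setting: {access!r}")


print(size_class(45), privacy_class("restricted", True))  # medium semi-public
```

A membership then falls into one of nine categories, such as "semi-public medium". These are the same combinations that appear as series in the response-probability figures.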
Yahoo Groups
Engaged Users & Thriving Groups
• Users show various degrees of engagement, from lurkers to heavily engaged members
• Our focus: users who are heavily engaged in the group,
with a high level of posting activity
• What differentiates these engaged users? Are they treated
differently? Do they behave differently?
• We focus on “thriving” groups
Thriving Groups
Three requirements to be a “thriving” group:
1) Baseline Users
– At least 10 users must post during the year
2) Baseline traffic
– At least two messages in every 30-day window
3) Dense period
– A two-month period during which every 7-day interval has at least 10 posts
New corpus: 44,473 groups, 1M users
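The three thresholds translate directly into a filter over a group's posting history. This is a minimal sketch under stated assumptions, and the authors' exact windowing may differ:
- posts are given as hypothetical `(user, day)` pairs, with day indices 0 to 364;
- "two months" is taken as 60 days;
- windows slide one day at a time.

```python
def window_counts(daily, width):
    """Number of posts in each sliding window of `width` consecutive days."""
    total = sum(daily[:width])
    counts = [total]
    for start in range(1, len(daily) - width + 1):
        # Slide the window by one day: add the new last day, drop the old first day.
        total += daily[start + width - 1] - daily[start - 1]
        counts.append(total)
    return counts


def is_thriving(posts, num_days=365, dense_days=60):
    """posts: list of (user, day) pairs, with day in range(num_days)."""
    # 1) Baseline users: at least 10 distinct posters during the year.
    if len({user for user, _ in posts}) < 10:
        return False

    daily = [0] * num_days
    for _, day in posts:
        daily[day] += 1

    # 2) Baseline traffic: at least two messages in every 30-day window.
    if min(window_counts(daily, 30)) < 2:
        return False

    # 3) Dense period: `dense_days` consecutive days in which every 7-day window
    #    has at least 10 posts, i.e. (dense_days - 7 + 1) consecutive dense windows.
    needed = dense_days - 7 + 1
    run = 0
    for count in window_counts(daily, 7):
        run = run + 1 if count >= 10 else 0
        if run >= needed:
            return True
    return False
```

Using running window sums keeps each check linear in the number of days, so the filter stays cheap even when applied to millions of groups.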
k-Cores
• We define the k-core of a group at time t as follows:
• For a two-week window around t, a user is in the k-core if:
– the user has replied to at least k distinct users in the group, and
– the user has been replied to by at least k distinct users
(Figure: example of a 3-core user)
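Here is a minimal sketch of this membership test. It rests on four assumptions:
- replies arrive as hypothetical `(replier, original_poster, day)` triples;
- "a two-week window around t" means days in [t - 7, t + 7);
- both conditions must hold;
- "k distinct users" means at least k.

Self-replies are ignored. The sketch follows the slide's per-user definition, not the recursive k-core from graph theory.

```python
from collections import defaultdict


def k_core(replies, t, k, half_window=7):
    """Users in the k-core at day t (k >= 1).

    replies: iterable of (replier, original_poster, day) triples.
    Only replies with day in [t - half_window, t + half_window) are counted.
    """
    replied_to = defaultdict(set)  # user -> distinct users they replied to
    replied_by = defaultdict(set)  # user -> distinct users who replied to them
    for src, dst, day in replies:
        if src != dst and t - half_window <= day < t + half_window:
            replied_to[src].add(dst)
            replied_by[dst].add(src)
    # Only users who both sent and received replies can satisfy both conditions.
    candidates = set(replied_to) & set(replied_by)
    return {u for u in candidates
            if len(replied_to[u]) >= k and len(replied_by[u]) >= k}
```

Evaluating `k_core` once per day gives, for each user, the set of days on which they were in the core. Core size and time spent in the core can be measured from that set.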
Core Size
48% of group/time pairs have a 2-core of at least 6 people
Fraction of Posters in Core
(Figure: fraction of posters in the core, on a 0 to 1 scale, by group size (Small, Medium, Large) and privacy (Private, Semi-public, Public).)
Time Spent in Core
• Large public groups: 94% in the core for less than 2 weeks
• Small private groups: 48% in the core for 200+ days
• Small private groups: 20% in the core for less than 2 weeks
Half-life of Cores
Core Populations
• Light: briefly enters the conversation, i.e., does not enter the core (774k)
• Short core: enters the core for less than 50 days (134k)
• Long core: enters the core for 50 days or more (90k)
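The three populations are defined by thresholds on time spent in the core. The sketch below assumes that this time is the number of distinct days on which the user met the k-core condition, for example from evaluating the k-core test once per day. The slides do not say whether the 50 days must be contiguous, so contiguity is not required here. The function name is hypothetical.

```python
def core_population(core_days):
    """Classify one user-group membership as light, short-core, or long-core.

    core_days: iterable of day indices on which the user was in the group's
    core (assumed input); duplicate days are counted once.
    """
    days_in_core = len(set(core_days))
    if days_in_core == 0:
        return "light"       # never entered the core
    if days_in_core < 50:
        return "short-core"  # in the core for fewer than 50 days
    return "long-core"       # in the core for 50 days or more
```

The classification applies to a membership rather than to a user. The same person can be long-core in one group and light in another, and the next slides exploit exactly that.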
Long Core Users Across Groups
(6, 0.55): a user who was long-core in all of the first 6 groups joined has a 55% probability of being long-core in the 7th
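Each point on this curve is an empirical conditional probability. The sketch below shows one way to estimate it. It assumes each user's memberships are available as a hypothetical list of booleans in the order the groups were joined (True = long-core in that group). Users with too few memberships are skipped.

```python
def prob_long_core_next(histories, n):
    """Estimate P(long-core in group n+1 | long-core in each of the first n groups).

    histories: one list per user of booleans, in the order the user joined
    groups (True = that membership was long-core).
    Returns None when no user qualifies.
    """
    eligible = [h for h in histories if len(h) > n and all(h[:n])]
    if not eligible:
        return None
    return sum(h[n] for h in eligible) / len(eligible)
```

With n = 6, this is the quantity plotted at the (6, 0.55) point.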
Multiple Memberships
Preferential Treatment of Engaged Users
• Are engaged, or “long-core”, users treated differently within a group?
• Yes! We detail three key forms of preferential treatment given to heavily engaged users.
Response to Newcomer
(Figure: response to newcomers, on a 0 to 0.5 scale, by newcomer type (Light, Short-core, Long-core). There is one series for each privacy class (Private, Semi-public, Public) combined with each size class (Small, Medium, Large).)
Response to Core Members
(Figure: response to core members, on a 0 to 0.35 scale, for Long-core vs. Short-core users in Public, Semi-public, and Private groups.)
Response Probability by Newcomer Type
(Figure: response probability, on a 0 to 1 scale, by newcomer type: Light, Short-core, Long-core.)
Long-core Response Probability
(Figure: response probability for long-core users, on a 0.28 to 0.34 scale, at the time of joining the core vs. after 50+ days in the core, for Public, Semi-public, and Private groups.)
Summary of Preferential Treatment
Heavily engaged “long-core” users are:
1) 20 times more likely to receive a response upon joining a group
2) Twice as likely to receive a response upon becoming heavily engaged
3) 9 times more likely to have early responses come from other long-core users
Note: the probability of receiving a response increases until a user joins the core, then begins to decline.
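Each "N times more likely" statement is a ratio of two response rates. The sketch below shows that arithmetic on hypothetical `(newcomer_type, got_response)` records. The input shape and the function names are assumptions. The slides also do not state which group is the baseline for each ratio; light users are one natural choice.

```python
from collections import defaultdict


def response_rates(newcomers):
    """Fraction of newcomers of each type whose post received a response.

    newcomers: iterable of (newcomer_type, got_response) pairs,
    e.g. ("long-core", True).
    """
    seen = defaultdict(int)
    answered = defaultdict(int)
    for kind, got_response in newcomers:
        seen[kind] += 1
        answered[kind] += int(bool(got_response))
    return {kind: answered[kind] / seen[kind] for kind in seen}


def times_more_likely(rates, group, baseline):
    """Ratio of two response rates, as in "20 times more likely"."""
    return rates[group] / rates[baseline]
```

A ratio like this says nothing about why the rates differ. The first-post analysis that follows looks at one possible explanation.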
First Post Types
100 first posts of long-core users, by type:
• Friends (6%): the newcomer has some prior relationship with another group member
• Introduction (37%): the new poster introduces herself to the group
• No decision (57%): no information to determine a relationship
Pregnancy-and-pups: Coccidiosis
I am new to this board and I have enjoyed reading the posts. I
am hoping you can help me learn more about coccidiosis.
I have a litter of puppies whose stool is good. They have
been on Albon for 10 days. One of the puppies went home
and to the vet today and has coccidiosis. Why is that? What
else can I use to be sure the puppies are free of cocci?
I do appreciate all your input and all your time in helping me!
skatefans: Appropos of Barbara Cook
Hi! This is my first "Skatefans" post. (I've been reading -- don't
like the word "lurking" -- Skatefans since about the summer of
2000, but haven't had the time to post before.)
While everyone, of course, is entitled to their own likes and
dislikes, I'd just like to add some thoughts about "Fosse" from
someone who's been a very big fan of that program. I've only
been following figure skating since 1997 so my frame of
reference is obviously limited but, in terms of the exhibition
programs that I've seen duing this period, I think "Fosse" is
one of the "landmark" exhibition programs that I've seen
(although I can also see some of the problems with it that
people have been pointing out).
Modeling Long-Core Engagement
Factors that may drive long-core engagement:
• User factor: a user’s personality causes her to become
long-core in every group she joins
• Group factor: a group is so welcoming, or its topic so
engaging, that users are likely to become long-core
Model
(Figure: users linked to the groups they belong to. Example annotations: a user's chance of being long-core in a group: 0.7; a group's chance of a random member being a long-core user: 0.3.)
Model
• For each (u,g) membership pair, predict whether the pair is in H, the set of long-core memberships
• Pr[(u,g) in H] = 1 - (1-p(u))(1-p(g))
• The task is to choose the p(u) and p(g) that best reproduce H
• Evaluate quality by the likelihood of H under the model
• Consider three variants:
– Use only properties of users, p(g) = 0
– Use only properties of groups, p(u) = 0
– Allow both p(u) and p(g) to be arbitrary
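The formula treats a pair as long-core if either the user factor or the group factor "fires". This sketch computes the resulting log-likelihood of H and fits the two single-factor variants. When one factor is fixed at 0, the pair probability equals the other factor, so its maximum-likelihood estimate is simply the fraction of that user's (or group's) memberships that lie in H.

The function names and input shapes are illustrative assumptions. The combined variant needs numerical optimization of the same likelihood, which is not shown here. If the figure's example values are read as p(u) = 0.7 and p(g) = 0.3, the formula gives 1 - 0.3 × 0.7 = 0.79.

```python
import math
from collections import Counter


def pair_prob(p_u, p_g):
    """Pr[(u, g) in H] = 1 - (1 - p(u)) (1 - p(g))."""
    return 1.0 - (1.0 - p_u) * (1.0 - p_g)


def log_likelihood(memberships, H, p_user, p_group, eps=1e-9):
    """Log-likelihood of the observed long-core set H under the model.

    Entries missing from p_user / p_group default to 0, which covers the
    user-only (p(g) = 0) and group-only (p(u) = 0) variants.
    """
    ll = 0.0
    for u, g in memberships:
        q = pair_prob(p_user.get(u, 0.0), p_group.get(g, 0.0))
        q = min(max(q, eps), 1.0 - eps)  # clamp so log() stays finite
        ll += math.log(q) if (u, g) in H else math.log(1.0 - q)
    return ll


def fit_single_factor(memberships, H, side):
    """Maximum-likelihood fit with the other factor fixed at 0.

    side=0 fits p(u) (user-only); side=1 fits p(g) (group-only).
    """
    total, hits = Counter(), Counter()
    for pair in memberships:
        total[pair[side]] += 1
        hits[pair[side]] += int(pair in H)
    return {key: hits[key] / total[key] for key in total}


memberships = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
H = {("a", "x"), ("a", "y")}
p_user = fit_single_factor(memberships, H, side=0)   # {"a": 1.0, "b": 0.0}
p_group = fit_single_factor(memberships, H, side=1)  # {"x": 0.5, "y": 0.5}
print(log_likelihood(memberships, H, p_user, {}) >
      log_likelihood(memberships, H, {}, p_group))   # True
```

In this toy case, membership in H depends only on the user, so the user-only fit explains H almost perfectly and the group-only fit does not.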
Analytical Results
Model                    % of correct edges
User-only, p(g) = 0      94.9
Group-only, p(u) = 0     85.6
Combined                 95.1
Improvement Using Group Factor
Fin
• Social analysis of one of the world’s largest collections of
online communities
• Proposed a partitioning of the data to select for active
communities of engaged users
• Examined several levels of engagement: “light”, “short-core”, and “long-core”
• Identified several striking ways in which heavily engaged users receive preferential treatment from other members of the group
• Proposed a model to study factors contributing to long-term
engagement and showed that both user and group factors
play a role.
Fin++
Special thanks to the Groups team:
Di-fa Chang, Lee Clancy, David Kopp, Bobby Lee,
Maria Saltz and Gordon Strause.
Ravi Kumar, Jasmine Novak & Andrew Tomkins
{ravikuma,jnovak,atomkins}@yahoo-inc.com
Lars Backstrom
Cameron Marlow