laweb07 - Universidade Federal de Minas Gerais


A Geographical Characterization of
YouTube: a Latin American View
Fernando Duarte, Fabrício Benevenuto,
Virgílio Almeida, Jussara Almeida
Federal University of Minas Gerais – Brazil
Outline
• Motivation and Goals
• YouTube Features
• Crawler and Sampling
• Geographical Characterization
• Conclusions and Future Work
Motivation and Goals
• YouTube is a popular online social video sharing
service that generates high volumes of Internet
traffic
• YouTube popularity in Latin America (from
www.alexa.com)
– 6th in Argentina and Paraguay,
– 5th in Brazil, Mexico, Chile and Peru,
– 4th in Ecuador and Venezuela.
• Goal: characterize the influence of users'
geographical location on traffic and social
relationships.
– Focus on Latin America
YouTube Features
Users – videos interactions
• watch videos
• upload videos (unlimited)
• add videos as favorites
• post a comment to a video
• respond to a video with another video
• rate a video
Users – users interactions
• add users as friends
• subscribe to another user
Videos
• have a list of 20 related videos
• are distributed in 14 categories
Sampling Mechanism
• Sampling strategy:
– Collect information on popular videos and analyze the user
interactions around these videos.
• First crawler: collects metadata of videos
– Starts from the all-time most viewed video and collects the
related videos recursively, in snowball fashion
• The snowball follows a breadth-first scheme
• Second crawler: collects metadata of the users found by
the first crawler
– Users who uploaded videos, posted comments, or posted video
responses.
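The breadth-first snowball described above can be sketched as follows. This is a minimal illustration, not the authors' crawler: `get_related` is a hypothetical callable that returns the related-video ids of one video, and `TOY_GRAPH` is made-up data standing in for real YouTube metadata.

```python
from collections import deque


def snowball_sample(seed, get_related, max_tiers):
    """Breadth-first snowball sample starting from `seed`.

    `get_related` is a hypothetical callable mapping a video id to the
    list of its related video ids. Returns a dict mapping every
    collected video id to the tier (BFS depth) at which it was found.
    """
    tier_of = {seed: 0}
    queue = deque([seed])
    while queue:
        video = queue.popleft()
        tier = tier_of[video]
        if tier == max_tiers:
            continue  # videos in the last tier are kept but not expanded
        for related in get_related(video):
            if related not in tier_of:  # skip videos already collected
                tier_of[related] = tier + 1
                queue.append(related)
    return tier_of


# Toy related-video graph (made-up ids, for illustration only).
TOY_GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["d", "e"],
    "d": ["f"],
    "e": [],
    "f": ["g"],
    "g": [],
}


def toy_related(video_id):
    return TOY_GRAPH.get(video_id, [])


sample = snowball_sample("a", toy_related, max_tiers=2)
```

With `max_tiers=2`, the toy sample stops at videos two hops away from the seed, so `f` and `g` are never collected.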
Crawler Architecture
• Parallel crawler
– Server coordinates the snowball sampling
– Server avoids redundant data collection
– 7 Linux boxes
[Diagram: Server and Clients 1, 2, …, 7]
• Collected information on over 2 million videos, exhausting 6 tiers in 11
days (from Apr 3rd to 14th)
• 96 of the 100 all-time most popular videos are part of the sample
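One way such a coordinator could work is sketched below, under stated assumptions: threads stand in for the client machines, `fetch_related` is a hypothetical function a client would run for one video id, and the defaults of 7 clients and 6 tiers simply mirror the numbers on the slide. The slides do not describe the actual protocol between server and clients.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_in_tiers(seed, fetch_related, num_clients=7, max_tiers=6):
    """Coordinator loop that dispatches one tier at a time to workers.

    Returns a list of tiers, each a list of newly discovered video ids.
    The `seen` set plays the role of the server-side record that
    prevents the same video from being collected twice.
    """
    seen = {seed}
    frontier = [seed]
    tiers = [frontier]
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        for _ in range(max_tiers):
            # Each worker fetches the related list of one frontier video;
            # pool.map yields the results in the order of `frontier`.
            results = pool.map(fetch_related, frontier)
            next_frontier = []
            for related_ids in results:
                for video_id in related_ids:
                    if video_id not in seen:
                        seen.add(video_id)
                        next_frontier.append(video_id)
            if not next_frontier:
                break  # nothing new was discovered; the crawl is exhausted
            tiers.append(next_frontier)
            frontier = next_frontier
    return tiers


# Toy related-video graph (made-up ids, for illustration only).
RELATED = {"a": ["b", "c"], "b": ["a", "d"], "c": ["d", "e"], "d": ["f"]}


def fetch_toy(video_id):
    return RELATED.get(video_id, [])


tiers = crawl_in_tiers("a", fetch_toy, num_clients=3, max_tiers=2)
```

Keeping the deduplication on the coordinator side means the workers stay stateless, at the cost of a synchronization point at the end of every tier.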
Statistics of videos and users collected
• USA is responsible for 28% of videos and 38% of users
• 7% of users are from LA, responsible for 7% of uploads and 6% of views
• 13% of users have no country information (empty)
• # views > # comments > # video responses
Latin American Users
• Table is sorted by number of users
• Users from Brazil, Mexico, and Argentina have contributed more
videos, but in terms of uploads/user Peru leads the ranking
• In terms of traffic (watched videos), Brazil, Mexico, and the Virgin Islands
lead the ranking
• LA users have an average of 22 favorite videos and 2 friends
– Orkut and Myspace users have an average of 30 and 137 friends, respectively
• We conjecture that most users interact with friends on other
online social networks and use YouTube essentially to watch videos
Video Popularity
• The curve of the number of views does not descend linearly
• The top 10% most popular LA videos concentrate 76% of the views,
suggesting the use of caching
• LA videos are viewed and discussed less, generating less traffic than
other videos
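The concentration figure above is the share of all views captured by the most-viewed videos. A minimal sketch of that computation follows; the function name `top_share` and the list `toy_views` are illustrative assumptions, and the toy counts are chosen only to make the arithmetic easy to follow, not taken from the study's data.

```python
def top_share(view_counts, top_percent=10):
    """Fraction of all views received by the most-viewed `top_percent`% of videos."""
    if not view_counts:
        raise ValueError("view_counts must not be empty")
    total = sum(view_counts)
    if total == 0:
        raise ValueError("total number of views must be positive")
    ranked = sorted(view_counts, reverse=True)
    # Integer arithmetic avoids floating-point surprises; keep at least one video.
    k = max(1, len(ranked) * top_percent // 100)
    return sum(ranked[:k]) / total


# Made-up view counts for ten videos (illustration only).
toy_views = [760, 60, 40, 30, 30, 20, 20, 20, 10, 10]
share = top_share(toy_views)
```

A high share for a small `top_percent` is what makes caching attractive: storing a few videos close to the users would serve most requests.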
Video Duration
• About 80% of the videos are shorter than 5 minutes
• There is no difference across regions
Use of Social Features
• LA users interact less on YouTube than other users.
Use of Social Features
• Although LA users are less interactive overall, there are LA users with
2400 friends, users who uploaded 1400 videos, and users who sent more
than 1200 comments.
User Interactions
Latin American videos
• For each video, observe the percentage of its comments that come from
users in LA, the USA, and other regions.
• Plot the distribution of this percentage
Textual Interactions
[Figure panels: Latin American videos; USA videos]
• The probability that an LA video receives more than 60% of its comments
from LA users is 0.32 (from USA users, only 0.08)
• Videos have a higher probability of receiving comments from users in
the same region
• Potential use of CDNs (assuming that the number of views is also influenced
by geographical factors)
• Few LA users interact with videos from USA/others, but users from
USA/others do interact with LA users
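The probability quoted above can be read as an empirical fraction: among videos from one region, how many receive more than 60% of their comments from a given region. A minimal sketch under that reading follows; the function names and the toy data are assumptions for illustration, not the authors' code or results.

```python
from collections import Counter


def regional_comment_share(commenter_regions, region):
    """Fraction of one video's comments posted by users from `region`.

    Returns None for a video without comments, where the share is undefined.
    """
    if not commenter_regions:
        return None
    return Counter(commenter_regions)[region] / len(commenter_regions)


def prob_share_above(videos, region, threshold=0.60):
    """Empirical probability that a video receives more than `threshold`
    of its comments from `region`, over videos with at least one comment."""
    shares = [regional_comment_share(comments, region) for comments in videos]
    shares = [s for s in shares if s is not None]
    if not shares:
        raise ValueError("no video has any comments")
    return sum(s > threshold for s in shares) / len(shares)


# Made-up data: each inner list holds the regions of one LA video's commenters.
toy_la_videos = [
    ["LA", "LA", "LA", "USA"],          # 75% of comments from LA
    ["LA", "USA", "other", "other"],    # 25% from LA
    ["LA", "LA", "other"],              # about 67% from LA
    ["USA", "USA", "USA", "LA"],        # 75% from USA
]

p_la = prob_share_above(toy_la_videos, "LA")
p_usa = prob_share_above(toy_la_videos, "USA")
```

Evaluating `prob_share_above` over a range of thresholds would give the points of the distribution that the slide plots.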
Conclusions and Future Work
• We present a geographical characterization of YouTube,
highlighting a number of differences between users from Latin
America and users from other countries
• Main Findings
– Videos uploaded by LA users have different characteristics from
videos uploaded by users from other regions: they are viewed and
discussed less.
– The most popular videos concentrate most of the views, suggesting
the use of caching
– Interactions are strongly influenced by geographical location,
suggesting the use of CDNs to improve performance
• Future Work
– Analyze the impact of language on traffic and user behavior
– Explore social network characteristics of interactions between
users and videos across different regions
Questions?
[email protected]