Human Computation and Crowdsourcing
Uichin Lee
KSE652 Social Computing Systems
Design and Analysis
The Rise of Crowdsourcing
By Jeff Howe (Wired Magazine, 2006)
Remember outsourcing? Sending jobs
to India and China is so 2003.
The new pool of cheap labor:
everyday people using their spare
cycles to create content, solve
problems, even do corporate R&D.
The Rise of Crowdsourcing
• The Professional
• The Packager
• The Tinkerer
• The Masses
• And the Age of the Crowd
The Professional
• A story of “Claudia Menashe”
– A project director at the National
Health Museum in Washington,
DC
– Putting together a series of interactive
kiosks devoted to potential
pandemics like the avian flu
– An exhibition designer created a
plan for the kiosks; now she wants
to have images to accompany the
text.
• Hire a photographer?
• Pre-existing images: stock
photography
The Professional
• She ran across a stock photo collection by
Mark
– Claudia: “We don’t have much money…”
– Mark: “I’ll give you some discount: how about $100-$150 per
photograph? That’s about half of what a corporate client would
pay!”
– Claudia: “Great! I’ll buy 4 images!!”
The Professional
One dollar!! That’s a steal!!
iStockphoto: a marketplace for the work of amateur photographers (e.g., homemakers,
students, engineers, dancers); over 20,000 contributors, who charge about $1 to $5 per
basic image
The Packager
• Viral videos; how to repurpose content to make compelling
TV on a budget?
• Web Junk 20 at VH1: American television program in which
VH1 and iFilm collaborate to highlight the twenty funniest
and most interesting clips collected from the Internet that
week
• Michael Hirschorn (creator of Web Junk 20)
– “I knew we offered something YouTube couldn’t: television.
Everyone wants to be on TV”
• Next-generation TV: user-generated content
– As user-generated TV matures, the users will become more
proficient and the networks better at ferreting out the best of
the best
• UGC everywhere; say in education, e.g., Khan Academy
The Tinkerer:
The Future of Corporate R&D
• InnoCentive
– Launched in 2001 to connect with brainpower outside the company
– Companies pay solvers anywhere from $10,000 to $100,000 per solution
– Jill Panetta (CSO) says
• More than 30% of the problems are solved!
• The odds of a solver’s success increased in the fields in which they had no formal
expertise
• “The strength of weak ties”– Mark Granovetter
• Similar Services:
• Ed Melcarek
– On most Saturdays, Melcarek attacks problems that have stumped some of
the best corporate scientists at Fortune 100 companies
– “not bad for a few weeks’ work” (e.g., Colgate problem: $25,000)
• P&G’s R&D:
– “We have 9,000 people on our R&D staff and up to 1.5 million researchers
working through our external networks”
The Masses
• The Turk:
– The first machine capable of beating a human
at chess, built around the late 1760s by a
Hungarian nobleman named Wolfgang von
Kempelen
• Amazon’s Mechanical Turk
– Crowdsourcing for the masses (no specific
talents required)
– Web based marketplace that helps companies
to find people to perform “human intelligence
tasks” (HITs) computers are lousy at
– Examples: identifying items in a photo,
skimming real estate documents to find
identifying information, writing short product
descriptions
– HITs cost from a few cents to a few dollars or
more
“Human Intelligence inside”
Our focus: The Masses – Labor Marketplaces, Games, Ubiquitous
Sensing, Social Networking/Q&A
The Age of the Crowd
• Distributed computing projects: UC Berkeley’s SETI@home?
– Tapping into the unused processing power of millions of individual
computers
• “Distributed labor networks”
– Using the Internet (and Web 2.0) to exploit the spare processing
power of millions of human brains
• Successful examples?
– Open source software: a network of passionate, geeky volunteers
could write code just as well as highly paid developers at Microsoft or
Sun Microsystems
– Wikipedia: creating a sprawling and surprisingly comprehensive online
encyclopedia
– eBay, Facebook: can’t exist without the contributions of users
The Age of the Crowd
• The productive potential of millions of
plugged-in enthusiasts is attracting the
attention of old-line businesses too
• For the last decade or so, companies have
been looking overseas for cheap labor
• But now it doesn’t matter where the laborers
are, as long as they are connected to the
Internet
The Age of the Crowd
• Technological advances in everything (from
product design software to digital video cameras)
are breaking down the cost barriers that once
separated amateurs from professionals
• Crowds (e.g., hobbyists, part-timers, dabblers)
now suddenly have a market for their efforts
• Smart companies across industries tap the latent
talent of the crowd
“The labor isn’t always free, but it costs a lot less than paying
traditional employees. It’s not outsourcing: it’s crowdsourcing”
Human Computation: A Survey
and Taxonomy of a Growing Field
Alexander J. Quinn, Benjamin B. Bederson
CHI 2011
Human Computation
• Computer scientists (in the artificial intelligence field)
have been trying to emulate human-like abilities (e.g.,
language, visual processing, reasoning) using computers
• Alan Turing wrote in 1950:
“The idea behind digital computers may be explained by
saying that these machines are intended to carry out any
operations which could be done by a human computer.”
• L. von Ahn wrote a doctoral thesis about human
computation in 2005
• The field is now thriving: business, art, R&D, HCI,
databases, artificial intelligence, etc.
Definition of Human Computation
• Dates back to 1938 in philosophy and psychology
literature; 1960 in Computer Science literature
(by Turing)
• Modern usage inspired by von Ahn’s 2005
dissertation titled “Human Computation”
– “…a paradigm for utilizing human processing
power to solve problems that computers cannot
yet solve.”
Definition of Human Computation
• “…the idea of using human effort to perform tasks that computers cannot
yet perform, usually in an enjoyable manner.” (Law, von Ahn 2009)
• “…a new research area that studies the process of channeling the vast
internet population to perform tasks or provide data towards solving
difficult problems that no known efficient computer algorithms can yet
solve” (Chandrasekar, et al., 2010)
• “…a technique that makes use of human abilities for computation to solve
problems.” (Yuen, Chen, King, 2009)
• “…a technique to let humans solve tasks, which cannot be solved by
computers.” (Schall, Truong, Dustdar, 2008)
• “A computational process that involves humans in certain steps…” (Yang,
et al., 2008)
• “…systems of computers and large numbers of humans that work together
in order to solve problems that could not be solved by either computers or
humans alone” (Quinn, Bederson, 2009)
• “…a new area of research that studies how to build systems, such as
simple casual games, to collect annotations from human users.” (Law, et
al., 2009)
Related Ideas
• Crowdsourcing
• Social computing
• Data mining
• Collective intelligence
Crowdsourcing
• “Crowdsourcing is the act of taking a job traditionally performed by
a designated agent (usually an employee) and outsourcing it to an
undefined, generally large group of people in the form of an open
call.” (Jeff Howe)
• Human computation replaces computers with humans, whereas
crowdsourcing replaces traditional human workers with members
of the public
– HC: replacement of computers with humans
– CS: replacement of insourced workers with crowdsourced workers
• Some crowdsourcing tasks can be considered as human
computation tasks
– Hiring crowdsourced workers for translation jobs:
– Machine translation (fast, but low quality) vs. human translation (slow,
high quality)
Social Computing
• Definition from Wikipedia:
– “… supporting any sort of social behavior in or through
computational systems” (e.g., blogs, email, IM, SNS, wikis, social
bookmarking)
– “… supporting computations that are carried out by groups of
people” (e.g., collaborative filtering, online auctions, prediction
markets, reputation systems)
• Some other definitions:
– “… applications and services that facilitate collective action and
social interaction online with rich exchange of multimedia
information and evolution of aggregate knowledge…”
(Parameswaran, Whinston, 2007)
– “… the interplay between persons' social behaviors and their
interactions with computing technologies” (Dryer, Eisbach, Ark,
1999)
Data Mining
• Data mining is defined broadly as “the
application of specific algorithms for
extracting patterns from data.” (Fayyad,
Piatetsky-Shapiro, Smyth, 1996)
• While data mining deals with human-created
data, it does not involve human computation
– Google PageRank “only” uses human-created data
(links)
Collective Intelligence
• Overarching notion: large groups of loosely
organized people can accomplish great things by
working together
– Traditional study focused on “decision making
capabilities by a large group of people”
• Taxonomical “genome” of collective intelligence
– “… groups of individuals doing things collectively that
seem intelligent” (Malone, 2009)
• Collective intelligence generally encompasses
human computation and social computing
Relationship Diagram
[Figure: diagram relating Human Computation, Crowdsourcing, Social Computing, Collective Intelligence, and Data Mining]
Classifying Human Computation
• Motivation
– What motivates people to perform HC?
• Human skill
– What kinds of human skills do HC tasks require?
• Aggregation
– How to combine results of HC tasks?
• Quality control
– How to control quality of the results of HC tasks?
• Processing order of different roles
– Roles (requester, worker, computer)
• Task-request cardinality
– Requester vs. Worker cardinality
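The six dimensions above can be treated as fields of a record, one record per system. The following is a minimal sketch, assuming Python dataclasses as the representation; the class name `HCSystem`, the field names, and the helper `group_by` are illustrative and do not come from the survey. The values for the ESP Game and reCAPTCHA are the ones this deck gives in its tables; dimensions the deck does not state for reCAPTCHA are left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class HCSystem:
    """A human computation system described along the six dimensions (hypothetical schema)."""
    name: str
    motivation: Optional[str] = None
    human_skill: Optional[str] = None
    aggregation: Optional[str] = None
    quality_control: Optional[str] = None
    processing_order: Optional[Tuple[str, ...]] = None  # roles, in the order they act
    cardinality: Optional[str] = None


# Values taken from the tables in this deck.
ESP_GAME = HCSystem(
    name="ESP Game",
    motivation="enjoyment",
    human_skill="visual recognition",
    aggregation="collection",
    quality_control="output agreement",
    processing_order=("worker", "requester", "computer"),
    cardinality="many-to-many",
)

# Only the dimensions the deck states for reCAPTCHA are filled in.
RECAPTCHA = HCSystem(
    name="reCAPTCHA",
    motivation="implicit work",
    aggregation="collection",
    processing_order=("computer", "worker", "requester"),
)

SYSTEMS = [ESP_GAME, RECAPTCHA]


def group_by(systems: List[HCSystem], dimension: str) -> Dict[str, List[str]]:
    """Group system names by one dimension, skipping systems with no value for it."""
    groups: Dict[str, List[str]] = {}
    for system in systems:
        value = getattr(system, dimension)
        if value is not None:
            groups.setdefault(value, []).append(system.name)
    return groups
```

For example, `group_by(SYSTEMS, "aggregation")` puts both systems under "collection", which is how the survey uses the dimensions: to find systems that share a design choice.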
Motivation
• Pay (financial rewards): Mechanical Turk (online labor marketplace), ChaCha
(mobile Q&A), LiveOps (a distributed call center)
• Altruism (just helping other people for good): helpfindjim.com (Jim Gray),
Naver KiN, Yahoo! Answers
• Enjoyment (fun): Games With A Purpose (GWAP), http://www.gwap.com,
e.g., ESP Game, Tag-a-Tune
• Reputation (recognition): volunteer translators at childrenslibrary.org,
Naver KiN, Yahoo! Answers
• Implicit work: reCAPTCHA
Quality Control
• Output agreement: ESP Game (a game for labeling images); an answer is accepted
only if both players in a pair give the same answer
• Input agreement: Tag-a-Tune; two players listen to inputs (music) that may be
the same or different, describe what they hear, and try to decide whether they
are listening to the same music or different music
• Economic models: when money is a motivating factor, economic models can be
used to elicit quality answers (e.g., a game-theoretic model of the worker’s
rating to reduce the incentive to cheat)
• Defensive task design: design tasks so that it is difficult to cheat
(e.g., comprehension questions)
• Redundancy: each task is given to multiple people to separate the wheat from
the chaff
• Statistical filtering: filter or aggregate the data in a way that removes the
effects of irrelevant work
• Multilevel review: one set of workers does the work; a second set reviews the
results and rates the quality (e.g., Soylent: find-fix-verify)
• Automatic check: fold.it (protein folding game); answers are hard to find but
easy to check by computer
• Reputation system: workers are motivated to provide quality answers by a
reputation scoring system (Mechanical Turk, Naver KiN, etc.)
• Expert check: a trusted expert skims or cross-checks results for relevance and
apparent accuracy
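Two of the quality control mechanisms above, output agreement and redundancy, reduce to a few lines of logic. The sketch below is illustrative only: the function names, the case and whitespace normalization, and the `min_share` threshold are assumptions, not details taken from the ESP Game or any other system named on the slide.

```python
from collections import Counter
from typing import List, Optional


def output_agreement(label_a: str, label_b: str) -> Optional[str]:
    """Accept a label only if two players independently gave the same answer.

    Normalizing case and surrounding whitespace is an assumption of this sketch.
    """
    a = label_a.strip().lower()
    b = label_b.strip().lower()
    return a if a == b else None


def majority_vote(answers: List[str], min_share: float = 0.5) -> Optional[str]:
    """Redundancy: give one task to several workers and keep the dominant answer.

    Returns the most common answer if strictly more than `min_share` of the
    workers gave it, otherwise None (no answer is trusted).
    """
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count / len(answers) > min_share else None
```

With three workers answering "cat", "cat", and "dog", `majority_vote` keeps "cat"; with an even split it returns `None`, which a requester could treat as a signal to collect more answers.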
Aggregation
• Collection (to build a knowledge base): artificial intelligence research;
building a large database of common-sense facts (e.g., people can’t brush their
hair with a table). Examples: ESP Game, reCAPTCHA, FACTory, Verbosity, etc.
• Wisdom of crowds (statistical processing of data): the average guess of
ordinary people can be very close to the actual outcome; e.g., Ask500people,
News Futures, Iowa Electronic Markets
• Search: a large number of volunteers sift through photos or videos, searching
for some desired scientific phenomenon, person, or object; e.g.,
helpfindjim.com, Stardust@home project
• Iterative improvement: giving each worker the answers of previous workers to
elicit better answers; e.g., MonoTrans
• Active learning: classifier training; select the samples that could
potentially give the best training benefit and send them for manual annotation
• Genetic algorithm (search/optimization): Free Knowledge Exchange, PicBreeder
• None (if independent tasks are performed): VizWiz (a mobile app that lets a
blind user take a photo and ask a question)
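Two of the aggregation strategies above are easy to illustrate in code. This is a rough sketch under stated assumptions: `crowd_estimate` stands in for wisdom-of-crowds aggregation using a plain mean (or a median as an outlier-resistant variant, which the slide does not mention), and `most_uncertain` uses uncertainty sampling, one common way to pick samples for active learning; the slide does not say which selection rule the surveyed systems use. All names and the 0.5 decision boundary are hypothetical.

```python
from statistics import mean, median
from typing import Dict, List


def crowd_estimate(guesses: List[float], robust: bool = False) -> float:
    """Wisdom of crowds: combine many independent guesses into one estimate.

    Uses the mean by default; robust=True uses the median instead, which is
    less sensitive to a few wild guesses (an addition of this sketch).
    """
    if not guesses:
        raise ValueError("need at least one guess")
    return median(guesses) if robust else mean(guesses)


def most_uncertain(probabilities: Dict[str, float], k: int) -> List[str]:
    """Active learning via uncertainty sampling (assumed selection rule).

    `probabilities` maps a sample id to the classifier's predicted probability
    of the positive class. The k samples closest to 0.5 are returned, since
    those are the ones a human annotation is most likely to help with.
    """
    ranked = sorted(probabilities, key=lambda sample_id: abs(probabilities[sample_id] - 0.5))
    return ranked[:k]
```

In a labeling pipeline, the ids returned by `most_uncertain` would be the ones sent to human workers, and their answers would be fed back into classifier training.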
Human Skills, Processing Order,
Task-Request Cardinality
Human Skills
• Visual recognition: ESP Game
• Language understanding: Soylent
• Basic human communication: ChaCha
Processing Order
• Computer → Worker (→ Requester): reCAPTCHA
• Worker (player) → Requester → Computer (aggregation): ESP Game (image labeling)
• Computer → Worker → Requester → Computer: Cyc infers a large number of
common-sense facts; in FACTory, a GWAP, workers (players) solve problems; Cyc
then performs aggregation
• Requester → Worker: Mechanical Turk
Task-Request Cardinality
• One-to-one (one worker to one task): ChaCha
• Many-to-many (many workers to many tasks): ESP Game
• Many-to-one (many workers to one task): helpfindjim.com (Jim Gray)
• Few-to-one (few workers to one task): VizWiz
Summary
• Definition of human computation and
crowdsourcing
• Relationship with other related ideas
• Classifying human computation and
crowdsourcing systems
– Motivation, human skill, aggregation, quality
control, processing order, task-request cardinality
– Nature of collaboration, architecture, recruitment,
human skill