Crowdsourcing

Ling 240

What is crowdsourcing?
Crowdsourcing—definition
“the practice of obtaining information or services by soliciti
Examples:
• Wikipedia
• Google Translate
• FamilySearch Indexing
COCA's registers based on publication
Crowdsourcing
• What are the benefits of collecting data through crowdsou
• What are the limitations/weaknesses?
• What can be done to ensure that crowdsourcing workers a
Crowdsourcing in linguistics
• Wilhelm Kaeding (1897)
• Thousands of non-experts helped compile and analyze an 11 millio
• Oxford English Dictionary (1858 – 1928)
• Hundreds of non-expert readers submitted 6 million quotation slip
• Perceptual dialectology
• Dialect perceptions elicited from non-experts
Mechanical Turk (Amazon)
• Strengths
• Inexpensive
• Fast
• Quality control
• Access to thousands of people
• Growing body of research strongly supports the quality of
• E.g., Buhrmester et al., 2011; Kittur et al., 2008; Suri & Watts, 2011; Urbano et a
Case study: Register classification
• Traditional ‘user’-based approach
• ‘Expert’ classifies texts into registers by simply sampling from the
• Limitations
• ‘Publication type’ is not a meaningful criterion for web documents
• Experts can’t agree on register category for internet texts
Corpus
• Extracted from the Corpus of Global Web-based English (GloWbE),
• (Near) random sampling methods used to build the corpus
• Google searches of highly frequent English 3-grams (e.g., is not the, and from the) used
• 800-1000 links for each n-gram (i.e., 80-100 Google results pages)
• Davies randomly extracted c. 49,300 URLs from GloWbE
• Only web pages from USA, UK, Canada, Aus., and NZ
• Documents < 75 words were excluded
• Non-textual material was removed from all web pages (HTML scrubbing and boilerplate
• 1,445 URLs were excluded from subsequent analysis because they c
• Final corpus for the study: 48,555 web documents.
People asked to determine mode of passage, then participan
Crowdsourcing end-user data: Classification
• Developed a computer-adaptive survey for register classif
• Tested the tool through 10 rounds of piloting, resulting in
• Recruited 908 raters through Mechanical Turk
• 6 responses x 4 raters x 49,300 texts = 1.2 million individu
Agreement results for the general register classificat
(Fleiss’ Kappa = .47, moderate agreement)
Agreement pattern    Documents    Percent
4 agree              17,511       36.4%
3 agree              15,684       32.6%
2-2 split             5,682       11.8%
2-1-1 split           8,515       17.7%
No agreement            755        1.6%
• 69% of documents achieved majority agreement
• Additional 11.8% are potential 2-way hybrids
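The agreement patterns and the Fleiss' kappa statistic reported above can be illustrated with a minimal sketch. It assumes four raters per document, as in the study; the register labels and the toy ratings are made up for illustration and are not the study's data.

```python
from collections import Counter

def agreement_pattern(labels):
    """Name the agreement pattern for one document rated by four raters."""
    counts = tuple(sorted(Counter(labels).values(), reverse=True))
    patterns = {
        (4,): "4 agree",
        (3, 1): "3 agree",
        (2, 2): "2-2 split",
        (2, 1, 1): "2-1-1 split",
        (1, 1, 1, 1): "no agreement",
    }
    return patterns[counts]

def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of documents, each a list of rater labels.

    Every document must be rated by the same number of raters.
    """
    n_docs = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for doc in ratings for label in doc})
    # Rows are documents, columns are how many raters chose each category.
    table = [[Counter(doc)[c] for c in categories] for doc in ratings]
    # Observed agreement for each document, then averaged.
    p_i = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]
    p_bar = sum(p_i) / n_docs
    # Agreement expected by chance, from overall category proportions.
    p_j = [
        sum(row[j] for row in table) / (n_docs * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical ratings: one inner list per document, one label per rater.
toy_ratings = [
    ["narrative", "narrative", "narrative", "narrative"],
    ["narrative", "narrative", "narrative", "opinion"],
    ["opinion", "opinion", "informational", "informational"],
    ["opinion", "informational", "narrative", "narrative"],
]

for doc in toy_ratings:
    print(agreement_pattern(doc))
print(round(fleiss_kappa(toy_ratings), 3))
```

Documents labelled "4 agree" or "3 agree" correspond to the majority-agreement cases, and "2-2 split" documents are the candidate 2-way hybrids.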
Frequencies of general register categories
(i.e., documents where 3 or 4 raters were in agreement)
Systematic patterns of disagreement
• 28 different 2-2 combinations are possible in theory
• But, only 7 of those combinations occurred > 100 times in o
• Because these are widely attested user-based patterns, we
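A short sketch of where the figure of 28 comes from and how frequent 2-2 combinations could be counted. It assumes eight general register categories (the number implied by 28 possible pairs); the category names and the helper `frequent_hybrids` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb

# Assumption: eight general register categories (placeholder names).
categories = [f"register_{i}" for i in range(1, 9)]

# Every unordered pair of distinct categories is one possible 2-2 split.
possible_pairs = list(combinations(categories, 2))
print(len(possible_pairs), comb(8, 2))

def frequent_hybrids(two_two_splits, min_count=100):
    """Count each unordered pair and keep those occurring more than min_count times."""
    counts = Counter(frozenset(pair) for pair in two_two_splits)
    return {tuple(sorted(pair)): n for pair, n in counts.items() if n > min_count}
```

Using `frozenset` means that a split recorded as (A, B) and one recorded as (B, A) are counted as the same combination.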
Frequencies of 2-way hybrids that occu
Multi-Dimensional analysis
• Factor analysis to identify dimensions based on co-occurre
• Interpret dimensions functionally
• Calculate scores for each text on each dimension
Features used by Biber adopted:
Positive features:
Verbs: present tense verbs, mental verbs, do as pro-verb, be as main ver
Pronouns: 1st person pronouns, 2nd person pronouns, it, demonstrative
Adverbs: general emphatics, hedges, amplifiers
Dependent clauses: that complement clauses (with that deletion), caus
Other: contractions, analytic negation, discourse particles, sentence rel
==================================
Negative features:
Nouns, long words, prepositional phrases, attributive adjectives, lexical d
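A common way to calculate a dimension score in multi-dimensional analysis is to standardize each feature's rate across the corpus, then add the z-scores of the positive features and subtract the z-scores of the negative features. The sketch below assumes that procedure; the feature names, the rates, and the use of the population standard deviation are illustrative choices, not details taken from the slides.

```python
from statistics import mean, pstdev

def dimension_scores(texts, positive, negative):
    """Return one dimension score per text.

    texts: list of dicts mapping feature name -> normalized rate.
    positive / negative: feature names on each pole of the dimension.
    """
    features = positive + negative
    stats = {}
    for f in features:
        values = [t[f] for t in texts]
        stats[f] = (mean(values), pstdev(values))

    def z(text, f):
        mu, sd = stats[f]
        # A feature with no variation contributes nothing to the score.
        return (text[f] - mu) / sd if sd else 0.0

    return [
        sum(z(t, f) for f in positive) - sum(z(t, f) for f in negative)
        for t in texts
    ]

# Hypothetical rates per 1,000 words for three texts.
toy_texts = [
    {"contractions": 10, "nouns": 300},
    {"contractions": 20, "nouns": 250},
    {"contractions": 30, "nouns": 200},
]
scores = dimension_scores(toy_texts, positive=["contractions"], negative=["nouns"])
print(scores)
```

In this toy data, the text with many contractions and few nouns gets the highest score and the noun-heavy text gets the lowest.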
The results
• Linguistic (use-based) variation across user-based register
Web registers along Dimension 1
What have we learned?
• Non-expert users can reliably classify web documents
• At least 1 in 10 internet texts belongs to a hybrid register c
• Publication type ≠ register (at least for the web)
• E.g., blogs showed up in several register categories
• Triangulating end-user classifications with linguistic analys
research: Next steps
• Comprehensive linguistic description of the patterns of registe
• A new multi-dimensional analysis of web registers
• Detailed linguistic descriptions of ‘unique’ web registers
• Automatic prediction of register (‘AGI’)
• Automatically coded large corpus of web documents
• Extend descriptions to include ‘private’ web registers
Areas for future user-based research
• Register classification of printed texts
• Reader/listener perceptions
• Corpus annotation
• Word sense disambiguation
5. The future of crowdsourcing in user
• User-based analyses have always happened; now we can d
• Triangulating use-based linguistic data offers a more comp
• Linguists are often unable to fully analyze and interpret pa
• Harnessing the power of user-based data via crowdsourcin
Mechanical Turk
• The name comes from an 18th century machine that playe
• A person actually hid inside and played
Mechanical Turk
• Amazon's Mechanical Turk is a crowdsourcing tool.
• Researchers who need human evaluation can get data
• People who want to make some money help with the proj
– Image recognition
– Speech processing
– Subjective evaluation
– Giving opinions
– Tagging corpora
– Match picture with product
Mechanical Turk
• Example: word sense disambiguation in corpora
– What should head be tagged as? Noun or verb?
– What does head mean in a sentence?
• They charged the head of finances with the crime. (
• The beer was flat with no head. (froth)
• They were going head first (manner of movement)
• Computers can't do it well but people can
How does it work?
Couldn't people cheat?
• After reviewing results, the requester can reject a worker
• When rejected, they don't get paid
• Workers have approval rates
• Requesters can choose only workers with good rates
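The approval-rate idea can be sketched in a few lines. This is plain Python over made-up records, not the Mechanical Turk API; the `Worker` class, the field names, and the 0.95 threshold are all assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Worker:
    worker_id: str
    approved: int   # tasks the requester accepted
    rejected: int   # tasks the requester rejected (unpaid)

    @property
    def approval_rate(self):
        total = self.approved + self.rejected
        return self.approved / total if total else 0.0

def eligible_workers(workers, min_rate=0.95):
    """Keep only workers whose approval rate meets the requester's threshold."""
    return [w for w in workers if w.approval_rate >= min_rate]

# Hypothetical worker histories.
pool = [
    Worker("w1", approved=98, rejected=2),
    Worker("w2", approved=70, rejected=30),
    Worker("w3", approved=0, rejected=0),
]
print([w.worker_id for w in eligible_workers(pool)])
```

A worker with no history gets a rate of 0.0 here, so new workers are excluded; a requester could just as well decide to treat them differently.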
Advantages
• Thousands of potential workers available
• You can get results fast
• Demographic variety (not just undergrads)
• Cheap (average $1.40 per hour)
Disadvantages
• Cheating
– Some studies show it happens at the same rate as in the lab
– Ways to test
• “While exercising, how often have you had a fatal heart attack?”
• It requires money
• Can't do many types of experiments (e.g., reaction time, RT)
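A catch question like the heart-attack item can be checked automatically. The sketch below assumes survey responses stored as a dictionary per worker; the question id `"heart_attack"`, the answer options, and the function name are hypothetical.

```python
def failed_attention_check(responses, question_id, expected="never"):
    """Return ids of workers who gave anything but the only sensible answer."""
    return sorted(
        worker
        for worker, answers in responses.items()
        if answers.get(question_id, "").strip().lower() != expected
    )

# Hypothetical survey responses keyed by worker id.
survey = {
    "w1": {"heart_attack": "Never"},
    "w2": {"heart_attack": "Sometimes"},
    "w3": {},  # skipped the question
}
print(failed_attention_check(survey, "heart_attack"))
```

Workers flagged this way are candidates for rejection or for exclusion from the analysis; a skipped catch question is treated as a failure in this sketch.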
Go look at it
Mechanical Turk website