HOWLING WOLVES AND ROARING LIONS:
WHAT SPEAKERS THINK AND WHAT A CORPUS TELLS US
John Newman & Tamara Sorenson Duncan
Department of Linguistics
University of Alberta
CSDL-12 Conference, Santa Barbara, 4-6 November 2014
Alternative empirical approaches to
understanding linguistic phenomena
• Different kinds of data, different kinds of analysis
• e.g., spoken corpus data & collostructional analysis (attraction of a
word to a construction); sentence completion task & response time
to recognition of a word in a construction
• Different kinds of data, same kind of analysis
• e.g., spoken corpus data, elicitation data (sentence completion
task), behavioral profile analysis
• Same data, different analyses
• e.g., one data frame analyzed by two or more different statistical
techniques
Cf. Gilquin, Gaëtanelle & Stefan Th. Gries. 2009. Corpora and experimental methods: A state-of-the-art
review. Corpus Linguistics and Linguistic Theory 5(1): 1-26.
Possible outcomes of different approaches to
studying linguistic phenomena
• Convergence of outcomes
• Divergence of outcomes
• Not clear (some results converge, some diverge)
This study: syntactic subject preference of
ROAR and HOWL
• sentence elicitation task
• adult corpora & measure association strength of subject
nouns associated with verbs
• adult corpora & behavioural profile analysis of factors for
verbs
Elicitation Task
• “Isolated sentence production”: 31 research
participants saw a word appear on the screen and
were asked to provide a sentence using that word
(the target words included ROAR and HOWL).
• 10 target verbs x 3 = 30 target stimuli
• 20 distractors x 2 = 40 distractor stimuli
• Only instances where the target word was used as
a verb in an active construction were included in
the analysis = 594 sentences (range 36-81
sentences/word).
Isolated sentence production vs
connected discourse
• less use of passive
• I/we more likely as subjects of sentences
• Anaphora of the type Tom_1 noticed that he_1 ... more likely than
Tom noticed that Bob…
Roland, Douglas & Daniel Jurafsky. 2002. Verb sense and verb subcategorization
probabilities. In Suzanne Stevenson & Paola Merlo (eds.), The Lexical Basis of
Sentence Processing: Formal, Computational, and Experimental Issues, 325-346.
Amsterdam: John Benjamins.
Subjects of ROAR in Elicited Sentences
Subject      Count
LION         12
CROWD        4
HE           3
CAT          2
CHILD        2
I            2
MONSTER      2
SIMBA        2
(YOU)        1
ANIMAL       1
DOG          1
DRAGON       1
IT           1
JET          1
LADY         1
NATALIE      1
PRINCIPAL    1
WAVE         1
WE           1
YOU          1
Subjects of HOWL in Elicited Sentences
Subject      Count
WOLF         14
DOG          11
HE           6
I            6
COYOTE       4
WIND         4
WEREWOLF     3
(YOU)        2
YOU          2
ANIMAL       1
BEAR         1
MAN          1
PEOPLE       1
Non-subject references to [animal] nouns in Elicited
Sentences (not included in count of subject types)
• i became a wolf and howled at the moon.
AWOOOOOOOOOOOOOHH
• Can more animals roar than just a lion
• Natalie pretended to roar like an animal
• I'm a lion, hear me roar.
• The child likes to roar and pretend he's a lion.
Word associations with ROAR as prompt
(Edinburgh Association Thesaurus)
Response     No. of responses   Proportion of total responses
LION         48                 0.49
NOISE        5                  0.05
BULL         3                  0.03
CROWD        3                  0.03
LOUD         3                  0.03
SHOUT        3                  0.03
BELLOW       2                  0.02
CAR          2                  0.02
RAGE         2                  0.02
TIGER        2                  0.02
WATER        2                  0.02
Showing items for responses >1, total no. of responses = 98
Word associations with HOWL as prompt
(Edinburgh Association Thesaurus)
Response     No. of responses   Proportion of total responses
WOLF         18                 0.18
DOG          13                 0.13
CRY          11                 0.11
SHOUT        7                  0.07
YELL         7                  0.07
OWL          5                  0.05
SCREAM       5                  0.05
NOISE        3                  0.03
GROAN        2                  0.02
LAUGH        2                  0.02
LAUGHTER     2                  0.02
NIGHT        2                  0.02
SHRIEK       2                  0.02
Showing items for responses >1, total no. of responses = 100
Learning about roaring lions
• Amazon CHILDREN’S BOOKS category search on
“lion+roar”
Baby-2 years old:
Lion Cub Roars - (Baby Animals Book)
I Can Roar Like a Lion
3-5 years old:
When Lions Roar
Googly Eyes: Leo Lion's Noisy Roar!
Simon Says Roar like a Lion
The Happy Lion Roars
The Lion Who Couldn't Roar
Can You Roar Like a Lion? (Pull-N-Slide Books)
etc etc etc
Learning about roaring lions
Pop culture
Who (or what) roars and howls in COCA?
proxy subject of a ROAR verb
COCA subjects of ROAR by frequency
ALL           Freq   SPOKEN        Freq   WRITTEN       Freq
[CROWD]       183    [HURRICANE]   16     [CROWD]       177
[ENGINE]      158    [ECONOMY]     11     [ENGINE]      155
[CAR]         116    [KATRINA]     11     [CAR]         112
[FIRE]        84     [FIRE]        8      [FIRE]        76
[WIND]        71     [FLAME]       7      [WIND]        67
[LION]        51     [CROWD]       7      [TRUCK]       45
[TRUCK]       46     [MOUSE]       6      [LION]        45
[TRAIN]       43     [MARKET]      6      [AUDIENCE]    40
[AUDIENCE]    40     [LION]        6      [TRAIN]       39
[FLAME]       36     [WIND]        6      [PLANE]       33
[PLANE]       35     [WALL]        5      [HEAD]        33
[HEAD]        33     [TRAIN]       5
[HURRICANE]   33     [STOCK]       5
COCA subjects of ROAR by frequency
NEWSPAPER     Freq   FICTION       Freq   MAGAZINE      Freq   ACADEMIC  Freq
[CROWD]       47     [ENGINE]      122    [CROWD]       47     [CROWD]   8
[ENGINE]      13     [CAR]         98     [WIND]        23     [LION]    5
[AUDIENCE]    11     [CROWD]       75     [LION]        19
[FIRE]        10     [FIRE]        49     [ENGINE]      17
[PLANE]       10     [WIND]        40     [FIRE]        14
[HURRICANE]   9      [TRUCK]       33     [AUDIENCE]    13
[JET]         9      [HEAD]        27     [ECONOMY]     11
[MARKET]      9      [TRAIN]       25     [CAR]         8
[MOTORCYCLE]  7      [MAN]         21     [MARKET]      8
[STOCK]       7      [MOTORCYCLE]  21     [TORNADO]     7
[HELICOPTER]  7      [FLAME]       21     [TRAIN]       7
[FAN]         7                           [TRUCK]       7
Collostructional Analysis
• Use Coll.analysis 3.2a script by Stefan Gries
• Total no. of constructions in corpus = [vv*] in corpus
• Total no. of [SUBJ ROAR] constructions = [n*] in L3-L1 of
[vv*], grouped as lemmas (sg + pl)
• coll.strength: -log10 of the p-value (Fisher-Yates exact test,
one-tailed); the higher the value, the stronger the attraction
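The coll.strength measure can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions, not the Coll.analysis 3.2a script itself: the names `log_comb` and `coll_strength`, the parameter names, and the counts in the usage line are hypothetical. The sketch treats the co-occurrence of a noun with the [SUBJ ROAR] construction as hypergeometric and returns -log10 of the one-tailed p-value for attraction only (observed frequency or higher).

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def coll_strength(freq_in_cxn, word_freq, cxn_freq, corpus_size):
    """-log10 of the one-tailed Fisher exact p-value P(X >= freq_in_cxn).

    freq_in_cxn: occurrences of the word in the construction
    word_freq:   occurrences of the word in the corpus
    cxn_freq:    occurrences of the construction in the corpus
    corpus_size: total number of constructions in the corpus
    """
    other_words = corpus_size - word_freq
    lowest = max(freq_in_cxn, cxn_freq - other_words)
    highest = min(word_freq, cxn_freq)
    if lowest > highest:
        raise ValueError("observed frequency is impossible for these totals")
    log_total = log_comb(corpus_size, cxn_freq)
    # Hypergeometric log-probabilities for every outcome at least as extreme.
    log_terms = [
        log_comb(word_freq, k) + log_comb(other_words, cxn_freq - k) - log_total
        for k in range(lowest, highest + 1)
    ]
    # Sum in log space so that very small p-values do not underflow to zero.
    peak = max(log_terms)
    log_p = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return -log_p / math.log(10)


# Hypothetical counts, for illustration only (not taken from COCA).
example_score = coll_strength(50, 10_000, 3_000, 10_000_000)
```

Working in log space matters here because scores above 200, as reported for some nouns in this study, correspond to p-values far below what a floating-point number can represent directly.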
COCA subjects of ROAR by Coll.analysis
ALL           Coll. Score   SPOKEN        Coll. Score   WRITTEN       Coll. Score
[CROWD]       226.4         [HURRICANE]   25.2          [CROWD]       217.3
[ENGINE]      213.3         [KATRINA]     19.7          [ENGINE]      203.2
[LION]        63.8          [FLAME]       12.9          [CAR]         58.5
[CAR]         60.7          [LION]        11.3          [LION]        53.1
[WIND]        50.9          [MOUSE]       11.3          [WIND]        46.1
[MOTORCYCLE]  47.7          [ECONOMY]     8.2           [MOTORCYCLE]  43.6
[FIRE]        47.7          [CROWD]       6.9           [FIRE]        42.8
[FLAME]       39.9          [FIRE]        6             [TRUCK]       32.4
[TRUCK]       33.4          [STOCK]       4.7           [HELICOPTER]  29.7
[HURRICANE]   33.3          [MARKET]      4.2           [FLAME]       29.4
[HELICOPTER]  29.8          [WALL]        4.1           [BEAST]       28.2
[BEAST]       29.1                                      [AUDIENCE]    27.2
[AUDIENCE]    24.6                                      [PLANE]       21.7
[TRAIN]       24                                        [TRAIN]       20.7
[PLANE]       21.2                                      [KONG]        20.6
[KONG]        20.9                                      [THUNDER]     16.2
[JET]         20.4                                      [JET]         15.8
[KATRINA]     17.6                                      [HURRICANE]   15.4
[THUNDER]     16.7                                      [TORNADO]     14.7
[MOTOR]       15.7                                      [TIGER]       14.4
                                                        [MOTOR]       14.2
COCA subjects of ROAR by Coll.analysis
NEWSPAPER
Coll.
Score
FICTION
Coll.
Score
MAGAZINE
ACADEMIC
Coll.
Score
[CROWD]
67
[ENGINE]
180.1
[ENGINE]
16
[CROWD]
70.1
[CAR]
52.2
[MOTORCYCLE]
32.1
[KONG]
30.5
[AUDIENCE]
12.4
10.8
[HURRICANE]
[MOTORCYCLE]
[JET]
11.1
11
9
[LION]
65.8 [CROWD]
[LION]
27.8
[WIND]
19.1
[CROWD]
[ENGINE]
15
[PLANE]
8.9
[FIRE]
25.1
[TORNADO]
[AUDIENCE]
8.9
[HELICOPTER]
24.5
[FIRE]
8.2
[HELICOPTER]
8.5
[WIND]
22.7
[FLAME]
7.1
7
[TRUCK]
21.6
[ECONOMY]
6.9
[KATRINA]
6.9
[BEAST]
19.9
[TRUCK]
5.2
[LANE]
5.4
[LION]
18.8
[TRAIN]
3.9
[MOTOR]
18.5
[PLANE]
3.7
17.7
17
[STORM]
3.4
[MOUSE]
[FIRE]
5
[STORM]
4.2
[TRAIN]
3.4
[FLAME]
[TIGER]
[FAN]
3.2
3.2
[TRAIN]
14.5
[SEA]
2
[SOUND]
2.9
[TYRANNOSAUR]
12.1
[MARKET]
1.8
[SMITH]
2.6
[BUS]
11.6
[CAR]
1.8
[ECONOMY]
2.6
[THUNDER]
11.3
[RIVER]
1.3
[STOCK]
2.6
[AUDIENCE]
11.1
[STOCK]
1.2
[MARKET]
2.4
[BATMOBILE]
10.3
[AIR]
0.8
[FAN]
Coll.
Score
14.8
9.4
Reliance measure
• “Reliance” (Hans-Joerg Schmid)
• = “Relevance” score in COCA interface
• = “Faith(fulness)” in Coll.analysis output
• = (freq. of word in construction / freq. of word in corpus) × 100%
• We set min. freq = 5
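The Reliance formula can be sketched as follows. The function names, the noun labels, and the counts are hypothetical and serve only to illustrate the arithmetic. The sketch also assumes that the minimum frequency of 5 applies to the frequency of the word in the construction, which the slide does not state explicitly.

```python
def reliance(freq_in_cxn, word_freq):
    """Percentage of a word's corpus occurrences that fall in the construction."""
    return 100.0 * freq_in_cxn / word_freq


def rank_by_reliance(counts, min_freq=5):
    """Rank words by Reliance, dropping words below the minimum frequency.

    counts maps each lemma to (freq. in construction, freq. in corpus).
    Assumption: min_freq is applied to the frequency in the construction.
    """
    kept = {
        lemma: reliance(in_cxn, total)
        for lemma, (in_cxn, total) in counts.items()
        if in_cxn >= min_freq
    }
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)


# Hypothetical counts, for illustration only (not taken from COCA).
toy_counts = {
    "RARE_NOUN": (6, 150),         # rare overall, often subject of the verb
    "COMMON_NOUN": (180, 40_000),  # frequent subject, but frequent everywhere
    "SINGLETON": (1, 2),           # excluded by the minimum frequency
}
ranking = rank_by_reliance(toy_counts)
```

Because the denominator is the word's overall corpus frequency, a rare noun with only a handful of hits can outrank a far more frequent subject noun, which is why a frequency floor is useful with this measure.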
COCA subjects of ROAR by Reliance
ALL            Rel.   SPOKEN        Rel.   WRITTEN        Rel.
[TYRANNOSAUR]  4.08   [LION]        0.52   [TYRANNOSAUR]  4.26
[BATMOBILE]    2.18   [MOUSE]       0.51   [BATMOBILE]    2.24
[IVOR]         1.42   [FLAME]       0.49   [IVOR]         1.49
[JETLINER]     1.07   [KATRINA]     0.46   [JETLINER]     1.46
[STAG]         0.77   [HURRICANE]   0.29   [MOTORCYCLE]   0.82
[MOTORCYCLE]   0.77   [CROWD]       0.09   [STAG]         0.81
[ENGINE]       0.58   [STOCK]       0.05   [ENGINE]       0.62
[CHOPPER]      0.51   [ECONOMY]     0.05   [CHOPPER]      0.62
[LION]         0.45   [WALL]        0.04   [CROWD]        0.51
[CROWD]        0.45   [FIRE]        0.04   [HARLEY]       0.48
[BONFIRE]      0.43   [MARKET]      0.04   [LION]         0.44
[HARLEY]       0.42
[WILDFIRE]     0.35
[BEAST]        0.35
[FLAME]        0.32
COCA subjects of ROAR by Reliance
NEWSPAPER     Rel.   FICTION        Rel.   MAGAZINE      Rel.   ACADEMIC  Rel.
[MOTORCYCLE]  0.58   [TYRANNOSAUR]  4.88   [TORNADO]     0.69   [LION]    0.31
[CROWD]       0.49   [BATMOBILE]    2.48   [LION]        0.67   [CROWD]   0.31
[MOUSE]       0.37   [MOTORCYCLE]   1.91   [CROWD]       0.61
[KATRINA]     0.34   [KONG]         1.9    [FLAME]       0.29
[ENGINE]      0.29   [ENGINE]       1.82   [AUDIENCE]    0.19
[HURRICANE]   0.28   [IVOR]         1.73   [ENGINE]      0.17
[HELICOPTER]  0.25   [HARLEY]       1.04   [WIND]        0.15
[JET]         0.16   [HELICOPTER]   0.94   [TRUCK]       0.11
[PLANE]       0.12   [TIGER]        0.89   [ECONOMY]     0.08
[LANE]        0.11   [HEATER]       0.88   [STORM]       0.08
Factors and Levels
Cases       SUBJECT      OBJECT       TENSE
example1    HUMAN        HUMAN        PAST
example2    NON_HUMAN    INANIMATE    PAST
example3    HUMAN        HUMAN        PAST
example4    HUMAN        HUMAN        PRES
example5    INANIMATE    HUMAN        PAST
example6    HUMAN        INANIMATE    PAST
example7    HUMAN        INANIMATE    PAST
example8    HUMAN        INANIMATE    FUTURE
example9    HUMAN        INANIMATE    PAST
example10   NON_HUMAN    INANIMATE    PAST
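A table of annotated cases like this one is usually turned into a behavioural profile by computing, for each factor, the relative frequency of each of its levels. The sketch below does this for the ten example cases. The slides do not spell out the computation, so treating the profile as within-factor proportions is an assumption, and the function name `behavioural_profile` is hypothetical.

```python
from collections import Counter

FACTORS = ("SUBJECT", "OBJECT", "TENSE")

# The ten example cases, in the order SUBJECT, OBJECT, TENSE.
CASES = [
    ("HUMAN", "HUMAN", "PAST"),
    ("NON_HUMAN", "INANIMATE", "PAST"),
    ("HUMAN", "HUMAN", "PAST"),
    ("HUMAN", "HUMAN", "PRES"),
    ("INANIMATE", "HUMAN", "PAST"),
    ("HUMAN", "INANIMATE", "PAST"),
    ("HUMAN", "INANIMATE", "PAST"),
    ("HUMAN", "INANIMATE", "FUTURE"),
    ("HUMAN", "INANIMATE", "PAST"),
    ("NON_HUMAN", "INANIMATE", "PAST"),
]


def behavioural_profile(cases, factors=FACTORS):
    """Return {factor: {level: relative frequency}} for a list of coded cases."""
    profile = {}
    for index, factor in enumerate(factors):
        counts = Counter(case[index] for case in cases)
        total = sum(counts.values())
        profile[factor] = {level: n / total for level, n in counts.items()}
    return profile


profile = behavioural_profile(CASES)
```

In a study of several verbs, one such profile would be computed per verb from its own sample of coded hits, and the profiles would then be compared across verbs.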
HOWL and ROAR in 200 samples of
COCA WRITTEN
        animal   human   inanimate   unknown   Total
HOWL    34       117     48          1         200
ROAR    7        86      106         1         200
All [animal] subject examples in 200 sampled ROAR hits
• "AAARRRGGGHHH!" roared the angry Bunyip. (2010 FIC Faces)
• Sea-lions roared on the Lobos Rocks off shore (1999 MAG Smithsonian)
• pigeons disappeared, many people thought they'd simply roared away
somewhere else (1990 MAG Wilderness)
• Lead Actor # Michael Caine, WWII: When Lions Roared (1994 NEWS USAToday)
• that great human cat who neither roars nor growls (1997 ACAD ScandinavStud)
• almost time for the stags to begin roaring (2011 FIC Bk:ColdVengeance)
• Ahead, uphill, he hears the tiger roar (2001 FIC VirginiaQRev)
Some [inanimate] subject examples in 200 sampled ROAR hits
• The car veered suddenly, the engine roaring.
• the radio and air conditioner roaring full-blast.
• Because Germany and the other north European economies are
roaring ahead, flushing tax money into government coffers
• pulls him off the roadway as the truck roars by.
• when the train roars across above, bodies spilled and still, barely
stirring.
• Although frequently interrupted by the sounds of airplanes roaring
overhead, Smaltz patiently tried to explain.
• Huddled inside our trusty geodome with the wind roaring so loudly we
could barely hear one another speak
[animal] subjects in sampled data
• 7/200 [animal] hits for ROAR
• 34/200 [animal] hits for HOWL
• [animal] is rather unlikely to play a very significant role in
any multifactorial analysis of ROAR
• When compared with verbs which have a very high proportion of
[animal] subjects, [animal] may be significant for ROAR in a
negative way
• When compared with verbs which have no (or almost no) [animal]
subjects, [animal] probably won’t reach significance as a level with
ROAR
Cluster analysis of “yell” verbs based on
sampled 200 hits for each verb
inanimate *** (+)
human *** (-)
animal ns.
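A cluster analysis of this kind typically starts from pairwise distances between the verbs' profiles. The sketch below computes that first step for the subject-type counts of HOWL and ROAR in the 200 sampled COCA WRITTEN hits reported earlier. The slides do not state which distance or linkage method was used, so the Euclidean distance and all function names here are assumptions. The other “yell” verbs are omitted because their counts are not given.

```python
import math

LEVELS = ("animal", "human", "inanimate", "unknown")

# Subject-type counts per verb in 200 sampled COCA WRITTEN hits.
SUBJECT_COUNTS = {
    "HOWL": {"animal": 34, "human": 117, "inanimate": 48, "unknown": 1},
    "ROAR": {"animal": 7, "human": 86, "inanimate": 106, "unknown": 1},
}


def to_profile(counts, levels=LEVELS):
    """Convert raw counts to proportions, in a fixed order of levels."""
    total = sum(counts[level] for level in levels)
    return [counts[level] / total for level in levels]


def euclidean(p, q):
    """Euclidean distance between two equally long profile vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance_matrix(count_table):
    """Pairwise distances between the profiles of all verbs in the table."""
    profiles = {verb: to_profile(counts) for verb, counts in count_table.items()}
    verbs = sorted(profiles)
    return {(v, w): euclidean(profiles[v], profiles[w]) for v in verbs for w in verbs}


distances = distance_matrix(SUBJECT_COUNTS)
```

With more verbs in `SUBJECT_COUNTS`, the resulting matrix could be passed to an agglomerative clustering routine. With only two verbs it simply quantifies how far apart their subject-type distributions are.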
Evaluating results of three approaches
• Elicitation
• LION-ROAR, WOLF/DOG-HOWL association is dominant
• [animal]-ROAR/HOWL association is strong
• Grounded in special roles for animals in learning English?
• Corpus: subject words of ROAR
• LION-ROAR association is present but not dominant, by frequency
• Frequency results vary considerably between sub-corpora
• [animal]-ROAR association is strong, by Reliance measure
• Grounded in the whim (?) of assembling texts for a corpus and the choice of
association measure
• Corpus: factor analysis of ROAR, HOWL etc.
• [animal]-ROAR/HOWL association not significant
• LION would not be identified as associated with ROAR
• Grounded in whim (?) of assembling texts for a corpus, focus on broad
features rather than specific words, choice of other verbs to compare with
Conclusion
• Elicitation of sentences and corpus-based approaches are
grounded in quite different realities – one shouldn’t expect
all results to “converge”.
• Both convergent and divergent results can lead to a better
understanding of different data and different methods (cf.
Kepser & Reis 2005, Arppe & Järvikivi 2007).
• When working with corpora, specific words and
semantic/syntactic etc. features are of interest.
• Behavioral Profile analysis should not deter us from also
carrying out a Collostructional Analysis.