Examining Diachronic Shifts with
Phrasal Verbs: Data from a
New 37 Million Word Corpus of English
Mark Davies
Brigham Young University
http://davies-linguistics.byu.edu
Outline

Phrasal verbs
break down, break it down, break the door down

Intro: Corpus of Historical English
37m words: view.byu.edu/che (m42)

Data: PDE registers (BNC/VIEW)
100m words: view.byu.edu

Data: Corpus of Historical English
Overall shifts with three constructions
By lexical verb (break) and particle (down)
Comparison to Modern English data
Possible motivations for shifts (??)
Corpus of Historical English: view.byu.edu/che (password: m42)
Corpus of Historical English: Size and distribution
Century       # words
1000s         207,594
1100s          53,359
1200s         313,717
1300s       1,077,330
1400s       1,388,448
1500s       3,182,053
1600s       5,127,445
1700s       3,688,076
1800s      10,422,785
1900s      11,063,571
TOTAL      36,524,378
Simple frequency of word or phrase over time: turn on
Spelling changes: vn* (chart)
Spelling changes (table): rank-frequency listing by 1500s
Morphology: *ly (+1900s -1800s -1700s)
Morphology/lexicon: word roots: *light* (+1900s -1600s)
Lexicon: New words in 1900s: * (+1900s -1800s)
Lexicon: Last occurrence in 1800s: * (+1800s -1900s)
Semantics (collocates): hard * (+1900s -1500s)
Semantics (collocates): * meat (+1900s -1500s)
Semantics (wide-range collocates): market [5L/5R] (+1900s -1600s)
More information….
Modern English (100m words; BNC; view.byu.edu): [vv*] [avp]
Within genre/register: [vv*] [avp] in SPOKEN
[vv*] [avp] in Spoken-Speech-Unscripted
BNC/VIEW: +FICTION -ACADEMIC
BNC/VIEW: +ACADEMIC -FICTION
BNC/VIEW: [vvi] it [avp] (4-6 main registers)
BNC/VIEW: But probably just function of frequency of pronouns overall (cf. see it)
[davies:avp] [up/down] (uppe, doune, vp, etc)
[davies:it] (it, itte, yt, hyt, etc)
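The two user-defined lists above group historical spellings under one search term. A minimal sketch of that idea follows, assuming a simple lookup table; the names VARIANTS and normalize are hypothetical, only spellings that appear on these slides are included, and the actual lists behind [davies:avp] and [davies:it] are presumably longer.

```python
# Hypothetical variant lists: only spellings shown on the slides are included.
VARIANTS = {
    "up": {"up", "uppe", "vp"},
    "down": {"down", "doune", "downe"},
    "it": {"it", "itte", "yt", "hyt"},
}

# Invert to a lookup table: spelling -> normalized form.
NORMALIZED = {
    spelling: modern
    for modern, spellings in VARIANTS.items()
    for spelling in spellings
}


def normalize(token: str) -> str:
    """Return the modern form for a known variant, else the lowercased token."""
    lowered = token.lower()
    return NORMALIZED.get(lowered, lowered)
```

With this table, normalize("vp") and normalize("uppe") both map to "up", so a single query term can cover every listed spelling.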
GENERAL: to * [davies:avp] (to give up, to break down, to watch out)
to * [davies:avp]: 1700s
to * [davies:avp]: 1900s
to * [davies:avp] +1800s -1900s
to * [davies:avp] +1900s -1800s
Contrasting verbs, 1800s / 1900s
(1800s = 10,423,000 words; 1900s = 11,064,000 words)

out
+1800s -1900s: 191 types, 260 tokens
launch out 5; us out 4; carve out 4; quarry out 4; camp out 4; ooze out 4; search out 4; flat out 3; thrust out 3; cast out 3; spring out 3; say out 3; follow out 3; see out 3; sue out 3; ravel out 3
+1900s -1800s: 224 types, 387 tokens
figure out 15; try out 15; rule out 14; check out 12; opt out 12; phase out 9; stay out 8; muck out 5; round out 5; hell out 5; screen out 4; pig out 4; tune out 4; iron out 4; bail out 4; key out 3; flare out 3; push out 3; suss out 3; hammer out 3; live out 3; lose out 3; sniff out 3; prove out 3; cancel out 3; pound out 3; eat out 3

up
+1800s -1900s: 202 types, 268 tokens
rip up 6; fire up 5; haul up 4; bear up 4; peck up 3; cast up 3; yield up 3; bubble up 3; row up 3; heave up 3; heap up 3; breed up 3; brace up 3; reckon up 3; fasten up 3; block up 3; grub up 3
+1900s -1800s: 272 types, 462 tokens
speed up 24; end up 12; live up 12; sign up 11; check up 8; shack up 6; hurry up 6; join up 6; tidy up 6; tank up 5; step up 5; shape up 5; match up 4; act up 4; burn up 4; ease up 4; rest up 4; tee up 4; meet up 3; picking up 3; own up 3; jump up 3; face up 3; soak up 3; pep up 3; scare up 3; loosen up 3; firm up 3

down
+1800s -1900s: 107 types, 134 tokens
fine down 3; bow down 3; pay down 3; sink down 3; fasten down 3; score down 3; slip down 3; plough down 3
+1900s -1800s: 107 types, 152 tokens
pin down 8; crack down 7; track down 7; wind down 5; bed down 4; calm down 4; clamp down 4; touch down 3; talk down 3

TOTAL
+1800s -1900s: 500 types, 662 tokens
+1900s -1800s: 603 types, 1001 tokens
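The totals above can be checked by summing the per-particle counts, and the two centuries compared more fairly by normalizing tokens to the century sizes given on the slide. A minimal sketch, assuming the rounded sizes 10,423,000 and 11,064,000 words; the names COUNTS, SIZE and summarize are hypothetical, and the per-million rates are a derived illustration, not figures from the slides.

```python
# (types, tokens) per particle, copied from the slide.
COUNTS = {
    "1800s": {"out": (191, 260), "up": (202, 268), "down": (107, 134)},
    "1900s": {"out": (224, 387), "up": (272, 462), "down": (107, 152)},
}

# Century sizes in words, as rounded on the slide.
SIZE = {"1800s": 10_423_000, "1900s": 11_064_000}


def summarize(century: str) -> tuple[int, int, float]:
    """Return (total types, total tokens, tokens per million words)."""
    types = sum(t for t, _ in COUNTS[century].values())
    tokens = sum(k for _, k in COUNTS[century].values())
    per_million = tokens / SIZE[century] * 1_000_000
    return types, tokens, round(per_million, 1)


print(summarize("1800s"))  # (500, 662, 63.5)
print(summarize("1900s"))  # (603, 1001, 90.5)
```

Because the two centuries are close in size, normalization changes the picture only slightly here, but it matters for the earlier, much smaller centuries.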
GENERAL: to * [davies:avp] (to give up, to break down, to watch out)
By particle: to * [davies:down]
down
CENTURY   PER MILLION   FREQUENCY
1000s             0.0           0
1100s             0.0           0
1200s             3.2           1
1300s             9.3          10
1400s            18.7          26
1500s            45.9         146
1600s            53.0         272
1700s            52.9         195
1800s            52.5         547
1900s            53.7         594
up
CENTURY   PER MILLION   FREQUENCY
1000s            19.3           4
1100s             0.0           0
1200s            35.1          11
1300s            38.1          41
1400s            48.3          67
1500s           140.5         447
1600s           167.7         860
1700s           167.8         619
1800s           152.4        1588
1900s           175.4        1940
out
CENTURY   PER MILLION   FREQUENCY
1000s             4.8           1
1100s             0.0           0
1200s            38.3          12
1300s            64.0          69
1400s            74.2         103
1500s           158.7         505
1600s           156.6         803
1700s           154.3         569
1800s           122.0        1272
1900s           149.4        1653
in
CENTURY   PER MILLION   FREQUENCY
1000s            14.5           3
1100s            93.7           5
1200s           121.1          38
1300s           422.3         455
1400s           473.9         658
1500s           461.7        1469
1600s           422.0        2164
1700s           449.3        1657
1800s           392.8        4094
1900s           407.9        4513
to be in
to him in
to seyn in
to lyue in
to God in
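The PER MILLION figures for the particles above are raw frequencies divided by the size of each century's subcorpus. A minimal sketch reproducing the row for down from the corpus sizes and raw frequencies given earlier; the names SIZES, DOWN_FREQ and per_million are hypothetical.

```python
# Words per century, from the size and distribution table.
SIZES = {
    "1000s": 207_594, "1100s": 53_359, "1200s": 313_717,
    "1300s": 1_077_330, "1400s": 1_388_448, "1500s": 3_182_053,
    "1600s": 5_127_445, "1700s": 3_688_076, "1800s": 10_422_785,
    "1900s": 11_063_571,
}

# Raw frequency of "to * [davies:down]" per century, from the slide.
DOWN_FREQ = {
    "1000s": 0, "1100s": 0, "1200s": 1, "1300s": 10, "1400s": 26,
    "1500s": 146, "1600s": 272, "1700s": 195, "1800s": 547, "1900s": 594,
}


def per_million(frequency: int, corpus_size: int) -> float:
    """Normalize a raw frequency to occurrences per million words."""
    return frequency / corpus_size * 1_000_000


rates = {
    century: round(per_million(DOWN_FREQ[century], SIZES[century]), 1)
    for century in SIZES
}

for century, rate in rates.items():
    print(f"{century}  {rate:6.1f}  {DOWN_FREQ[century]:5d}")
```

The 1700s show why normalization is needed: the raw frequency drops from 272 to 195, but the per-million rate is nearly unchanged (53.0 vs. 52.9) because that century's subcorpus is smaller.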
Separable: (V) [PRON] [AVP] (turn it down)
Separable: * [PRON] [AVP] in 1600s
1600s: separable (n >= 3)
make it up 10; fight it out 7; set it downe 7; put them out 7; of them out 6;
take it up 6; cast him out 6; draw it out 6; put it out 6; put him out 5; find it
out 5; point me out 5; stand it out 5; set it out 4; make it out 4; set thee vp
4; take it out 4; took him up 4; do it out 4; ride it out 4; set me up 4; get it
out 4; lift it up 3; set him out 3; fetch it out 3; put it up 3; found him out 3;
doe it out 3; draw it up 3; let it out 3; bring him vp 3; lay it up 3; sooth him
up 3; try it out 3; turn it up 3; shut me out 3; brought her up 3; blow them
down 3; set them down 3; pass us up 3; stir them up 3; found her out 3; let
her downe 3; turn'd me out 3; let them out 3; wind it up 3; set them out 3;
put me downe 3; fills it vp 3; set it down 3; play it out 3; score it up 3; laid
me down 3; set them downe 3; bring us out 3; take them up 3; took me up
3; set them up 3; stirre them up 3; take them out 3
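A query of the form * [PRON] [AVP] can be approximated by scanning three-word windows. A minimal sketch, assuming whitespace-tokenized text; the names PRONOUNS, PARTICLES and separable_matches are hypothetical, the word lists contain only forms seen in the 1600s results above, and this is not the corpus interface's actual implementation.

```python
from collections import Counter

# Hypothetical word lists: only forms that appear in the 1600s results.
PRONOUNS = {"it", "him", "her", "them", "me", "us", "thee"}
PARTICLES = {"up", "vp", "out", "down", "downe"}


def separable_matches(tokens):
    """Yield (word, pronoun, particle) for every * [PRON] [AVP] trigram.

    The first slot is a wildcard, so it is not checked for being a verb.
    """
    lowered = [t.lower() for t in tokens]
    for first, second, third in zip(lowered, lowered[1:], lowered[2:]):
        if second in PRONOUNS and third in PARTICLES:
            yield first, second, third


sample = "make it up then set thee vp and put them out of them out".split()
counts = Counter(separable_matches(sample))
for (word, pronoun, particle), n in counts.most_common():
    print(word, pronoun, particle, n)
```

Because the first slot is unrestricted, non-verbs can match too, which may be why a string like "of them out" shows up in the 1600s list.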
(V) [PRON] [AVP] (make it up) vs. simple (V) [PRON] (make it  )
Overall shifts with three main constructions
[Chart: three constructions by century, 1200s to 1900s; vertical axis from 0.8 to 2.8]
to take down
take NP down (take the net down)
take PRON down (take it down)
New corpora
• Historical: 37 million words
• Modern: 100 million words (BNC)
• Many different types of queries
– Words, phrases, substrings
– Frequency (sort/limit by century/register)
– Collocates (sort/limit by century/register)
– Charts (overall): century/register