TISC - Lexical Priming


Forecasting the beginnings
of newspaper texts
Some corpus & experimental findings
Michael Hoey, Matthew Brook O’Donnell,
Michaela Mahlberg and Mike Scott
BAAL 11-13 September 2008, Swansea University
The Lexical Priming claim
Whenever we encounter a word (or syllable or combination of words), we note subconsciously
• the words it occurs with (its collocations),
• the meanings with which it is associated (its semantic associations),
• the pragmatics it is associated with (its pragmatic associations).
"word" collocates with "against" and "a"
"a word against" has a semantic association with sending & receiving communication (e.g. "hear a word against")
send/receive "a word against" has a pragmatic association with denial (e.g. "wouldn’t hear a word against")
denial + send/receive "a word against" has a pragmatic association with hypotheticality (e.g. "wasn’t prepared to say a word against")
The Lexical Priming claim
Whenever we encounter a word (or syllable or combination of words), we also note subconsciously
• the grammatical patterns it is associated with (its colligations),
• the genre and/or style and/or social situation it is used in,
• whether it is used in a context we are likely to want to emulate or not.
denial + send/receive "a word against" colligates with modal verbs (e.g. "wouldn’t hear a word against")
denial + send/receive "a word against" also colligates with human subjects and human prepositional objects
The Lexical Priming claim
All the features we notice prime us so that
when we come to use the word
ourselves, we are likely (in speech,
particularly) to use it in the same lexical
context, with the same grammar, in the
same semantic context, as part of the
same genre/style, in the same kind of
social and physical context, with a similar
pragmatics and in similar textual ways.
The Lexical Priming claim
Our ability to do this is what it means to know a word.
• We are ALL learners, since we never stop being primed.
• The only difference between the native speaker and the non-native speaker is the way that they are typically primed.
• Creativity is the result of overriding some of one’s primings.

A footnote
Whenever we encounter a word (or syllable or combination of words), we note subconsciously …
• the words it occurs with (its collocations),
• the grammatical patterns it occurs in (its colligations),
• the meanings with which it is associated (its semantic associations),
The Lexical Priming claim
Whenever we encounter a word (or syllable or combination of words), we also note subconsciously
• the positions in a text that it occurs in, e.g. does it like to begin sentences? Does it like to start paragraphs? (its textual colligations),
• the genre and/or style and/or social situation it is used in
Research Question
• Do certain words and groups of words exhibit preferences for particular textual positions, such as the beginnings of texts and paragraphs? ("Once upon a time" is the canonical example.)
• If they do, how can these items be discovered in a corpus?
AHRC Textual Priming Project

Using a corpus of Home News articles from the Guardian/Observer newspaper, 1998-2004
◦ Approx. 54 million words
◦ 113,288 articles

Each sentence in the body of each article is classified according to its position
◦ TISC (Text-Initial Sentence Corpus) – first sentence of first paragraph
◦ PISC (Paragraph-Initial Sentence Corpus) – first sentence of any subsequent paragraph
◦ NISC (Non-Initial Sentence Corpus) – any non-initial sentence
Thanks to AHRC
Method: Sentence classification
[TISC sentence] More wet weather was predicted across Britain today as experts warned many areas were already saturated with rain.
…
[PISC sentence] On Wednesday and Thursday a brief respite should see most of the country becoming fine, with heavy rain only expected across parts of Northern Ireland. [NISC sentence] But by Friday, much of England and Wales will again be hit by storms and further downpours.
…
[PISC sentence] So far, Britain's recent storms have already claimed the lives of six people. [NISC sentence] Yesterday, insurers said the cost of the cleanup could run into tens of millions of pounds.
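The positional rule illustrated on this slide can be sketched in a few lines of Python. This is only an illustration, not the project's actual pipeline: it assumes each article has already been split into paragraphs and sentences, and the function name `classify_sentences` is hypothetical.

```python
def classify_sentences(paragraphs):
    """Label each sentence of one article as TISC, PISC or NISC.

    `paragraphs` is a list of paragraphs, each a list of sentence strings
    (sentence splitting is assumed to have been done already).
    """
    labelled = []
    for p_index, paragraph in enumerate(paragraphs):
        for s_index, sentence in enumerate(paragraph):
            if s_index > 0:
                label = "NISC"   # any non-initial sentence
            elif p_index == 0:
                label = "TISC"   # first sentence of first paragraph
            else:
                label = "PISC"   # first sentence of any subsequent paragraph
            labelled.append((label, sentence))
    return labelled


# Abbreviated versions of the example sentences from the slide.
article = [
    ["More wet weather was predicted across Britain today"],
    ["On Wednesday and Thursday a brief respite should see",
     "But by Friday, much of England and Wales will again"],
    ["So far, Britain's recent storms have already claimed",
     "Yesterday, insurers said the cost of the cleanup"],
]
for label, sentence in classify_sentences(article):
    print(label, "|", sentence)
```

Collecting all sentences with the same label across every article would then give the three positional subcorpora.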
Summary of positional subcorpora
Guardian Home News 1998-2004
                                          TISC          PISC          NISC
tokens                               3,122,037    12,521,902    19,338,590
types                                   58,432       127,038       141,793
type/token ratio (TTR)*                  53.43         98.57        136.39
sentences                              113,288       607,125     1,064,493
mean sentence length (in words)             28            21            18
std.dev.                                 11.11          9.68          9.88
* the figures shown equal tokens divided by types
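The derived rows of this summary can be checked from the raw counts. The sketch below uses only the figures on the slide; it suggests that the ratio row is tokens divided by types and that the mean is tokens divided by sentences (the helper names are hypothetical).

```python
# Counts copied from the summary slide.
subcorpora = {
    "TISC": {"tokens": 3_122_037, "types": 58_432, "sentences": 113_288},
    "PISC": {"tokens": 12_521_902, "types": 127_038, "sentences": 607_125},
    "NISC": {"tokens": 19_338_590, "types": 141_793, "sentences": 1_064_493},
}


def tokens_per_type(counts):
    """Ratio reported in the TTR row of the slide."""
    return counts["tokens"] / counts["types"]


def mean_sentence_length(counts):
    """Average number of words per sentence."""
    return counts["tokens"] / counts["sentences"]


for name, counts in subcorpora.items():
    print(name,
          round(tokens_per_type(counts), 2),
          round(mean_sentence_length(counts)))
```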
Method: Intra-textual Key Word Analysis
• Compare the frequency of words and
clusters in one section of text with their
frequency in another
• For example, fresh occurs significantly more
frequently in text-initial sentences (TISC)
than in non-initial sentences (NISC)
• fresh is a text-initial key word
• It also exhibits distinctive patterns in TISC
contexts in terms of collocates:
• fresh {row, controversy, embarrassment}
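A key word comparison of this kind needs a significance statistic. The slides do not say which statistic or cut-off the project used, so the sketch below assumes the log-likelihood (G2) measure that is common in corpus linguistics, with a threshold of 15.13 (p < 0.0001 at one degree of freedom). The counts for "fresh" are invented for illustration; only the corpus sizes come from the summary slide.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood keyness (G2) for one word in corpus A versus corpus B."""
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def is_key(freq_a, size_a, freq_b, size_b, threshold=15.13):
    """True if the word is significantly over-represented in corpus A."""
    over_represented = freq_a / size_a > freq_b / size_b
    return over_represented and log_likelihood(freq_a, size_a, freq_b, size_b) >= threshold


# Hypothetical frequencies of "fresh" in TISC and NISC (not the project's data).
TISC_TOKENS, NISC_TOKENS = 3_122_037, 19_338_590
print(is_key(900, TISC_TOKENS, 1_500, NISC_TOKENS))
```

Swapping the two corpora asks the opposite question, whether the word is key in non-initial sentences relative to text-initial ones.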
Method: Comparative KW lists

Take the pair-wise comparisons for TISC, PISC and NISC and create Key Word and Key Cluster lists:
TISC_NISC    PISC_NISC    NISC_TISC
TISC_PISC    PISC_TISC    NISC_PISC
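The six comparisons are simply the ordered pairs of the three subcorpora. A small sketch, assuming the list names read as target_reference (the first corpus is the one being tested for key items against the second):

```python
from itertools import permutations

SUBCORPORA = ["TISC", "PISC", "NISC"]

# Each ordered pair (target, reference) yields one key word / key cluster list.
comparisons = [
    f"{target}_{reference}"
    for target, reference in permutations(SUBCORPORA, 2)
]
print(comparisons)
```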
Method: Key Word/Cluster Matrix

Each word/cluster scored according to whether (Y) or not (N) it is found on each of the six lists:
                    TISC_NISC  TISC_PISC  PISC_NISC  PISC_TISC  NISC_TISC  NISC_PISC
yesterday               Y          Y          Y          N          N          N
said                    N          N          Y          Y          Y          N
also                    N          N          N          Y          Y          N
recall                  N          N          N          N          Y          N
it was announced        Y          Y          N          N          N          N
revealed that           Y          N          Y          N          N          N
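The scoring step can be sketched as a membership test against each of the six lists, read off in a fixed column order. The key lists below are toy sets containing only the six example items from the slide, and `pattern_for` is a hypothetical helper name.

```python
# Column order used on the slide.
LIST_ORDER = ["TISC_NISC", "TISC_PISC", "PISC_NISC",
              "PISC_TISC", "NISC_TISC", "NISC_PISC"]


def pattern_for(item, key_lists):
    """Return the six-letter Y/N pattern for one word or cluster."""
    return "".join("Y" if item in key_lists[name] else "N" for name in LIST_ORDER)


# Toy key lists holding only the example items shown on the slide.
key_lists = {
    "TISC_NISC": {"yesterday", "it was announced", "revealed that"},
    "TISC_PISC": {"yesterday", "it was announced"},
    "PISC_NISC": {"yesterday", "said", "revealed that"},
    "PISC_TISC": {"said", "also"},
    "NISC_TISC": {"said", "also", "recall"},
    "NISC_PISC": set(),
}

for item in ["yesterday", "said", "also", "recall", "it was announced", "revealed that"]:
    print(f"{item:18}", pattern_for(item, key_lists))
```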
Categories from patterns
From our corpus there are 18 resulting patterns, covering:
◦ 4,467 words
◦ 50,861 clusters
Here we focus on four patterns:
1. Text-Initial (YYNNNN & YNNNNN)
2. Paragraph-Initial (NNYYNN & NNYNNN)
3. TI and PI (YNYNNN)
4. Non-initial (NNNNYY)
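Grouping items into these categories is then a lookup from pattern to category name. A minimal sketch using only the patterns listed on this slide; the function name `category_of` and the example items are illustrative, and patterns outside the four categories return `None`.

```python
from collections import Counter

# Patterns as listed on the slide.
CATEGORIES = {
    "Text-Initial": {"YYNNNN", "YNNNNN"},
    "Paragraph-Initial": {"NNYYNN", "NNYNNN"},
    "TI and PI": {"YNYNNN"},
    "Non-initial": {"NNNNYY"},
}


def category_of(pattern):
    """Map a six-letter Y/N pattern to one of the four categories."""
    for name, patterns in CATEGORIES.items():
        if pattern in patterns:
            return name
    return None  # one of the other patterns, not discussed here


# Patterns for three of the example items from the matrix slide.
patterns = {
    "it was announced": "YYNNNN",
    "revealed that": "YNYNNN",
    "yesterday": "YYYNNN",
}
counts = Counter(category_of(p) for p in patterns.values())
print(counts)
```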
Category 1: Text-initial
(YYNNNN, YNNNNN & YYNNNY)
                      TISC     PISC     NISC
ONE OF BRITAIN’S     132.0     13.4      9.0
A REPORT BY THE       16.0      6.8      3.1
ARE TO BE            271.9     23.8     24.7
THAT COULD           106.7     41.4     61.7
AFTER BEING          334.4     67.8     54.5
normalized to occurrences per million words
• 1,600 (36%) of our key words and 29,303 (58%) of our key clusters belong to this category
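Because the three subcorpora differ greatly in size, raw counts are not comparable until they are normalized. A short sketch of the conversion in both directions; the raw count of 100 is hypothetical, while the token total and the 132.0 rate are taken from the slides.

```python
TISC_TOKENS = 3_122_037  # token count of the text-initial subcorpus


def per_million(raw_count, corpus_tokens):
    """Convert a raw frequency into occurrences per million words."""
    return raw_count / corpus_tokens * 1_000_000


def raw_from_rate(rate_per_million, corpus_tokens):
    """Approximate raw frequency implied by a per-million rate."""
    return rate_per_million * corpus_tokens / 1_000_000


# A hypothetical cluster occurring 100 times in TISC.
print(round(per_million(100, TISC_TOKENS), 2))

# The rate of 132.0 per million reported for ONE OF BRITAIN'S in TISC
# corresponds to roughly this many occurrences.
print(round(raw_from_rate(132.0, TISC_TOKENS), 1))
```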
Category 2: Paragraph-initial
(NNYYNN, NNYNNN & NNNYNN)
                              TISC     PISC     NISC
THE FINDINGS                  13.1     43.5     16.7
CAME AS                        5.8     47.9      9.8
IS THE LATEST                 10.6     17.2      3.5
GENERAL SECRETARY OF THE      15.4     58.5     11.6
CONFIRMED THAT                43.2     66.5     32.3
normalized to occurrences per million words
• 732 (16%) of our key words and 5,755 (11%) of our key clusters belong to this category
Category 3: Text- & Paragraph-initial
(YNYNNN & YNYYNN)
                         TISC     PISC     NISC
THE CONTROVERSY          28.5     17.2      6.6
HEAD OF THE              93.5     89.4     46.8
DECISION TO             151.8    130.8     73.9
SAID YESTERDAY THAT      80.1     67.5     26.6
ISSUED A                 51.9     35.9     20.3
normalized to occurrences per million words
• 253 (6%) of our key words and 913 (2%) of our key clusters belong to this category
Category 4: Non-initial
(NNNNYY, NNNYYY & NYNNYY)
                  TISC      PISC      NISC
HAVE TO          174.2     352.2     616.1
WHILE            530.1     589.3     701.0
BUT             1262.0    4164.3    6068.6
BE ABLE TO        81.0     108.9     165.5
GOING TO          78.2     268.8     494.9
normalized to occurrences per million words
• 486 (11%) of our key words and 3,105 (6%) of our key clusters belong to this category
Method: Sentence classification
[TISC sentence] More wet weather was predicted across Britain today as experts warned many areas were already saturated with rain.
TISC: 79 per million sentences; PISC: 5 per million sentences; NISC: 22 per million sentences
TISC: 117 per 100,000 sentences; PISC: 25 per 100,000 sentences; NISC: 12 per 100,000 sentences
TISC: 48 per thousand sentences; PISC: 7 per thousand sentences; NISC: 6 per thousand sentences
TISC: 17 per thousand sentences; PISC: 4 per thousand sentences; NISC: 3 per thousand sentences
Theoretical Implications
• Confirmation of a prediction made by lexical priming theory
• Knowing a word includes knowing where it will be used in a text
• Clusters are more important than single words in textual positioning (cf. Wray)
Applied Linguistic Implications
Translation
Academic writing
Authentic data
Death (or redefinition) of the topic
sentence
Applied Linguistic Implications
• Learning a word or phrase includes learning its characteristic textual positioning, or else a learner’s text will read awkwardly
• Fabricated texts are unlikely to preserve the natural textual colligations of the language if the intention of these texts is to illustrate other features
• Textual colligation is where discourse analysis and dictionaries meet.
