Dimensions of variation in Hollywood: the language of
comedy and drama
Marcia Veirano Pinto (apoio CNPQ)
Why study the language of Hollywood movies?
•
Hollywood movies are present
in our everyday lives and are
widely used in EFL
classrooms, but research on
their language is scarce;
•
Movie talk normally seeks to
emulate spontaneous
conversation in a variety of
social activities. Thus,
studying it may help us further
understand how social
activities and conversational
functions motivate the choice
of lexical and grammatical
features.
Objective
•
To find the dimensions of
variation in Hollywood
comedy and drama across
time.
Hypotheses
•
Movies are classified as comedy, drama,
etc. which may predict their linguistic
realization;
•
The language of movies seems to vary
over time, and so there may be variation
in the way movies are scripted over time;
•
Therefore, two variables that might
account for this variation are:
- movie genre (genre)
- time (decade)
The corpus
•
Comprises the transcriptions of 32 movies
released between 1930 and 2009;
•
Features 16 comedies and 16 dramas - 2
comedies and 2 dramas per decade;
•
Has a total of 359,498 tokens and 8,506
types;
•
Each movie may have as few as 5,314
tokens or as many as 16,545 tokens;
•
On average, each movie has about 11,234
tokens.
Selection Criteria
•
The films were chosen:
1.
with the help of the film guide “1001 filmes para ver
antes de morrer” (2008), in which sixty-seven renowned
film critics selected the best films of each decade;
2.
as far as possible by year of release, so as to
correspond to three different points in each decade:
beginning, middle and end. This care was taken to
avoid bunching them up at a single point in each
decade;
3.
based on the availability of:
•
the movie on DVD with subtitles to be ripped (in Brazil only;
importing DVDs would be costly and, more importantly,
time-consuming);
•
the movie and its transcription on the web.
Transcriptions
•
The time taken to transcribe each movie was
reduced by automatically
extracting its subtitles from the DVD with
the software DVDFab 6.0;
•
This step was necessary to guarantee that the
texts in the corpus were true to the actual
dialogues exchanged by the actors;
•
Ripped subtitles were preferred to
transcriptions available on the web because
they tend to be truer to what is actually heard,
thus reducing the time taken to transcribe the
movies.
Movies in the corpus
Name of the film | Genre | US release | Number of tokens
Duck soup | comedy | 1933 | 7,514
Mr. Deeds goes to town | comedy | 1936 | 16,079
Mr. Smith goes to Washington | drama | 1939 | 11,861
Only angels have wings | drama | 1939 | 13,630
The Philadelphia story | comedy | 1940 | 13,512
The lady Eve | comedy | 1941 | 11,191
Citizen Kane | drama | 1941 | 11,500
It’s a wonderful life | drama | 1946 | 16,545
How to marry a millionaire | comedy | 1953 | 12,750
Some like it hot | comedy | 1959 | 10,573
Sunset boulevard | drama | 1950 | 11,998
Twelve angry men | drama | 1957 | 14,083
The apartment | comedy | 1960 | 13,687
The producers | comedy | 1968 | 7,938
The hustler | drama | 1961 | 9,011
Cool hand Luke | drama | 1967 | 8,072
M*A*S*H* | comedy | 1972 | 12,853
Manhattan | comedy | 1979 | 14,233
One flew over the cuckoo’s nest | drama | 1975 | 10,332
Kramer vs. Kramer | drama | 1979 | 11,367
Ghostbusters | comedy | 1984 | 8,895
Good morning Vietnam | comedy | 1987 | 15,717
Children of a lesser God | drama | 1986 | 5,314
Rain man | drama | 1988 | 15,846
Groundhog day | comedy | 1993 | 9,290
There’s something about Mary | comedy | 1998 | 11,789
Philadelphia | drama | 1993 | 10,372
American Beauty | drama | 1999 | 8,208
Meet the parents | comedy | 2000 | 12,597
Little Miss Sunshine | comedy | 2006 | 7,603
Lost in translation | drama | 2003 | 5,867
Crash | drama | 2004 | 9,271
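The corpus totals reported earlier (359,498 tokens, about 11,234 per movie, between 5,314 and 16,545 tokens each) can be checked against the per-movie counts. The short Python sketch below is only an illustration; the dictionary name TOKENS is an assumption, and the counts are taken from the table of movies in the corpus.

```python
# Token counts per movie, in the order of the corpus table.
TOKENS = {
    "Duck soup": 7514, "Mr. Deeds goes to town": 16079,
    "Mr. Smith goes to Washington": 11861, "Only angels have wings": 13630,
    "The Philadelphia story": 13512, "The lady Eve": 11191,
    "Citizen Kane": 11500, "It's a wonderful life": 16545,
    "How to marry a millionaire": 12750, "Some like it hot": 10573,
    "Sunset boulevard": 11998, "Twelve angry men": 14083,
    "The apartment": 13687, "The producers": 7938,
    "The hustler": 9011, "Cool hand Luke": 8072,
    "M*A*S*H*": 12853, "Manhattan": 14233,
    "One flew over the cuckoo's nest": 10332, "Kramer vs. Kramer": 11367,
    "Ghostbusters": 8895, "Good morning Vietnam": 15717,
    "Children of a lesser God": 5314, "Rain man": 15846,
    "Groundhog day": 9290, "There's something about Mary": 11789,
    "Philadelphia": 10372, "American Beauty": 8208,
    "Meet the parents": 12597, "Little Miss Sunshine": 7603,
    "Lost in translation": 5867, "Crash": 9271,
}

total = sum(TOKENS.values())             # corpus size in tokens
mean_tokens = total / len(TOKENS)        # average tokens per movie
shortest = min(TOKENS, key=TOKENS.get)   # movie with the fewest tokens
longest = max(TOKENS, key=TOKENS.get)    # movie with the most tokens

print(f"{len(TOKENS)} movies, {total:,} tokens, mean {mean_tokens:,.0f}")
print(f"shortest: {shortest} ({TOKENS[shortest]:,})")
print(f"longest: {longest} ({TOKENS[longest]:,})")
```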
Why map movies onto Biber’s 1988 dimension 1?
•
It is likely to shed light on a very
controversial issue: are texts that were written
to be spoken good examples of spontaneous
oral conversation?
•
Dimension 1 comprises an array of
linguistic features that are related to the
difference between spoken and written texts,
and may thus reflect issues underlying
variation across movies over time and across
genres.
Biber’s Dimension 1: involved vs. informational production
Positive features (involved production):
Private verbs
THAT deletion
Contractions
Present tense verbs
Second person pronouns
DO as pro-verb
Analytic negation
Demonstrative pronouns
General emphatics
First person pronouns
Pronoun IT
BE as main verb
Causative subordination
Discourse particles
Indefinite pronouns
General hedges
Amplifiers
Sentence relatives
WH questions
Possibility modals
Non-phrasal coordination
WH clauses
Final prepositions
Negative features (informational production):
Nouns
Word length
Prepositions
Type/token ratio
Attributive adjectives
(Place adverbials)
(Agentless passives)
(Past participial WHIZ deletions)
(Present participial WHIZ deletions)
Methodology: applying the multidimensional model
1.
Collecting a corpus representing Hollywood
drama and comedy;
2.
Deciding, based on the literature, on the linguistic
features that are relevant to the research questions;
3.
Transforming the linguistic features into
quantifiable variables;
4.
Semi-automatic tagging of the corpus with a
shell script (Berber Sardinha, 2009) containing
the tags for the features on Biber’s (1988)
Dimension 1;
5.
Manual checking of the tagging;
6.
Normalizing the frequencies per 1,000 words to
allow comparison of texts of different lengths;
[Figure: spreadsheet excerpt with variable values normalized per 1,000 words]
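Step 6 can be sketched in a few lines of Python. This is only an illustration of the per-1,000-words calculation; the function name per_thousand and the raw feature counts are hypothetical, and only the token count for Duck soup comes from the corpus table.

```python
def per_thousand(count: int, n_tokens: int) -> float:
    """Return the frequency of a feature per 1,000 words of text."""
    if n_tokens <= 0:
        raise ValueError("the text must contain at least one token")
    return count * 1000 / n_tokens


# Hypothetical raw counts of two features in one movie (illustration only).
raw_counts = {"contractions": 412, "nouns": 1503}
n_tokens = 7514  # Duck soup, from the corpus table

# Normalized rates, comparable across movies of different lengths.
rates = {feature: per_thousand(count, n_tokens)
         for feature, count in raw_counts.items()}

for feature, rate in rates.items():
    print(f"{feature}: {rate:.1f} per 1,000 words")
```

Normalizing first matters because the movies range from 5,314 to 16,545 tokens, so raw counts alone would favor the longer scripts.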
Methodology: applying the multidimensional model
7.
Computing standardized scores by subtracting Biber’s 1988 variable
means on Dimension 1 from the corresponding variable values in the Movie Corpus
and dividing the result by Biber’s 1988 standard deviations.
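A minimal Python sketch of step 7 follows. The reference means and standard deviations below are placeholders chosen for easy arithmetic, not Biber’s (1988) published values, and the names REFERENCE, standardize and movie_rates are assumptions made for the example.

```python
# Hypothetical reference statistics per 1,000 words: (mean, standard deviation).
# Placeholders for illustration only, NOT Biber's (1988) published values.
REFERENCE = {
    "contractions": (14.0, 18.0),
    "nouns": (180.0, 35.0),
}


def standardize(rate: float, ref_mean: float, ref_sd: float) -> float:
    """Express a normalized rate in standard deviations from the reference mean."""
    if ref_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (rate - ref_mean) / ref_sd


# Hypothetical normalized rates for one movie (per 1,000 words).
movie_rates = {"contractions": 50.0, "nouns": 145.0}

z_scores = {feature: standardize(rate, *REFERENCE[feature])
            for feature, rate in movie_rates.items()}

for feature, z in z_scores.items():
    print(f"{feature}: z = {z:+.2f}")
```

Using the reference corpus statistics rather than the movie corpus’s own means is what allows the movies to be placed on the same scale as Biber’s registers.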
Biber’s tag set
tokens
emphatics
second person pronouns
types
final prepositions
Type/token ratio
first person pronouns
sentence relatives (tagged manually)
word length
hedges
amplifiers
indefinite pronouns
analytic negation
non phrasal coordination
attributive adjectives
nouns
be main verb
possibility modals
contractions
prepositions
demonstrative pronouns
present tense
direct wh-questions
private verbs
discourse particles
pronoun it
do pro_verb
public verbs
Methodology: applying the multidimensional model
8. Assigning a score to each movie in the corpus with the equation:
(sum of the standardized values of the positive variables) - (sum of the standardized values of the negative variables);
9. Plotting the movie scores along the scale that represents Dimension 1;
10. Checking for a correlation between the variables genre and
decade with the software SPSS.
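The equation in step 8 can be illustrated with a short Python sketch. Only a handful of Dimension 1 features are used to keep the example small, the standardized values are hypothetical, and the names POSITIVE, NEGATIVE and dimension_score are assumptions made for the example.

```python
# A small subset of Dimension 1 features, for illustration only.
POSITIVE = ("private_verbs", "contractions", "second_person_pronouns")
NEGATIVE = ("nouns", "prepositions")


def dimension_score(z_scores: dict[str, float]) -> float:
    """Sum of positive-feature z-scores minus sum of negative-feature z-scores."""
    positive_sum = sum(z_scores[feature] for feature in POSITIVE)
    negative_sum = sum(z_scores[feature] for feature in NEGATIVE)
    return positive_sum - negative_sum


# Hypothetical standardized values for one movie.
movie_z = {
    "private_verbs": 1.5,
    "contractions": 2.0,
    "second_person_pronouns": 1.0,
    "nouns": -0.5,
    "prepositions": -1.0,
}

score = dimension_score(movie_z)
print(f"Dimension 1 score: {score:.2f}")
```

A movie with many involved features and few informational ones therefore gets a high positive score, which places it towards the involved end of the scale.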
Descriptive statistics
Biber’s Dimension 1
INVOLVED PRODUCTION
Telephone conversations (37.2)
Face-to-face conversations (35.3)
Personal letters (19.5) / Spontaneous speeches (18.2) / Interviews (17.1)
Romantic fiction (4.3)
Prepared speeches (2.2)
Adventure fiction (0.0)
Mystery fiction (-0.2) / General fiction (-0.8)
Professional letters (-3.9) / Broadcasts (-4.3)
Science fiction (-6.1) / Religion (-7) / Humor (-7.8)
Popular lore (-9.3)
Editorials / Hobbies (-10.1)
Biographies (-12.4) / Press reviews (-13.9)
Academic prose (-14.9)
Press reportage (-15.1)
Official documents (-18.1)
INFORMATIONAL PRODUCTION
Plotting movies on Dimension 1
[Figure: Biber’s Dimension 1 scale with the movie corpus added. Drama (20.2), Movies (20.2) and Comedy (20.15) fall next to Personal letters (19.5), Spontaneous speeches (18.2) and Interviews (17.1), well below Face-to-face conversations (35.3) and Telephone conversations (37.2) and well above Romantic fiction (4.3).]
Dimension 1: 1930
Comedy
Mr. Deeds goes to town (24.37)
Duck Soup (11.59)
Drama
Mr. Smith goes to Washington (17.93)
Only angels have wings (17.95)
Position on the scale (involved to informational): Movies (20.2) / Drama (17.9) / Comedy (17.9)
Dimension 1: 1940
Comedy
The lady Eve (23.81)
The Philadelphia story (17.52)
Drama
Citizen Kane (16.15)
It’s a wonderful life (19.76)
Position on the scale (involved to informational): Comedy (20.67) / Movies (19.31) / Drama (17.96)
Dimension 1: 1950
Comedy
How to marry a millionaire (16.39)
Some like it hot (24.21)
Drama
Sunset Boulevard (18.72)
Twelve angry men (29.69)
Position on the scale (involved to informational): Drama (24.21) / Movies (22.25) / Comedy (20.30)
Dimension 1: 1960
Comedy
The apartment (18.9)
The producers (15.8)
Drama
Cool hand Luke (16.2)
The hustler (21.9)
Position on the scale (involved to informational): Drama (19.08) / Movies (18.22) / Comedy (17.36)
Dimension 1: 1970
Comedy
Manhattan (31.38)
M*A*S*H* (16.0)
Drama
Kramer vs. Kramer (21.37)
One flew over the cuckoo’s nest (22.19)
Position on the scale (involved to informational): Comedy (23.69) / Movies (22.74) / Drama (21.78)
Dimension 1: 1980
Comedy
Ghostbusters (20.16)
Good morning Vietnam (16.00)
Drama
Children of a lesser God (21.76)
Rain man (19.55)
Position on the scale (involved to informational): Drama (20.66) / Movies (19.37) / Comedy (18.08)
Dimension 1: 1990
Comedy
Groundhog day (22.12)
There’s something about Mary (21.16)
Drama
American beauty (24.98)
Philadelphia (14.18)
Position on the scale (involved to informational): Comedy (21.64) / Movies (20.61) / Drama (19.58)
Dimension 1: 2000
Comedy
Little Miss Sunshine (23.94)
Meet the parents (19.17)
Drama
Crash (21.03)
Lost in translation (20.47)
Position on the scale (involved to informational): Comedy (21.55) / Movies (21.15) / Drama (20.75)
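The genre and overall positions for each decade can be recomputed from the individual movie scores, assuming they are simple arithmetic means. The Python sketch below does this; the names SCORES and decade_summary are assumptions made for the example, and the scores are copied from the decade slides.

```python
from statistics import mean

# Dimension 1 scores per movie, copied from the decade slides.
SCORES = {
    1930: {"comedy": [24.37, 11.59], "drama": [17.93, 17.95]},
    1940: {"comedy": [23.81, 17.52], "drama": [16.15, 19.76]},
    1950: {"comedy": [16.39, 24.21], "drama": [18.72, 29.69]},
    1960: {"comedy": [18.9, 15.8], "drama": [16.2, 21.9]},
    1970: {"comedy": [31.38, 16.0], "drama": [21.37, 22.19]},
    1980: {"comedy": [20.16, 16.00], "drama": [21.76, 19.55]},
    1990: {"comedy": [22.12, 21.16], "drama": [24.98, 14.18]},
    2000: {"comedy": [23.94, 19.17], "drama": [21.03, 20.47]},
}


def decade_summary(decade: int) -> dict[str, float]:
    """Mean score per genre and for all four movies of a decade."""
    by_genre = SCORES[decade]
    all_scores = by_genre["comedy"] + by_genre["drama"]
    return {
        "comedy": mean(by_genre["comedy"]),
        "drama": mean(by_genre["drama"]),
        "movies": mean(all_scores),
    }


for decade in sorted(SCORES):
    s = decade_summary(decade)
    print(f"{decade}: comedy {s['comedy']:.2f}, "
          f"drama {s['drama']:.2f}, movies {s['movies']:.2f}")
```

Where the movie scores are given to one decimal place only, as on the 1960 slide, the recomputed means can differ slightly from the reported ones.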
Dimension 1 and comedy over the years
INVOLVED PRODUCTION
Comedy_1970 (23.69)
Comedy_1990 (21.64) / Comedy_2000 (21.55)
Comedy_1940 (20.67) / Comedy_1950 (20.30) / Movies (20.2)
Comedy_1980 (18.08)
Comedy_1930 (17.9) / Comedy_1960 (17.36)
INFORMATIONAL PRODUCTION
Dimension 1 and drama over the years
INVOLVED PRODUCTION
Drama_1950 (24.21)
Drama_1970 (21.78)
Drama_2000 (20.75) / Drama_1980 (20.66) / Movies (20.2)
Drama_1990 (19.58) / Drama_1960 (19.08)
Drama_1940 (17.96) / Drama_1930 (17.9)
INFORMATIONAL PRODUCTION
Comedy and Drama over time
Movies from the most involved (top) to the least involved (bottom) on Dimension 1:
Manhattan (comedy/1979)
Twelve angry men (drama/1957)
Mr. Deeds goes to town (comedy/1936) / American beauty (drama/1999) / Some like it hot (comedy/1959)
The lady Eve (comedy/1941) / Little Miss Sunshine (comedy/2006)
One flew over the cuckoo’s nest (drama/1975) / Groundhog day (comedy/1993)
The hustler (drama/1961) / Kramer vs. Kramer (drama/1979) / Children of a lesser God (drama/1986) / There’s something about Mary (comedy/1998) / Crash (drama/2004)
Ghostbusters (comedy/1984) / Lost in translation (drama/2003)
Rain man (drama/1988) / It’s a wonderful life (drama/1946) / Meet the parents (comedy/2000)
Sunset boulevard (drama/1950) / The apartment (comedy/1960)
Mr. Smith goes to Washington (drama/1939) / Only angels have wings (drama/1939) / The Philadelphia story (comedy/1940)
Citizen Kane (drama/1941) / How to marry a millionaire (comedy/1953) / Cool hand Luke (drama/1967) / M*A*S*H (comedy/1972) / Good morning Vietnam (comedy/1987)
The producers (comedy/1968)
Philadelphia (drama/1993)
Duck soup (comedy/1933)
Dimension 1: summary of results
Twelve angry men is a good representative of
the involved end of Dimension 1
because it displays a large
number of features that are typical
of speech, such as personal
pronouns, contractions,
interjections and questions, among
others.
Twelve angry men excerpt
Ok, fellows. Can we hold it down a minute?
Sure.
Fellows, say, we'd like to get started. Gentleman at the
window.
We'd like to get started.
Oh, I'm sorry.
Pretty tough to figure, isn't it? Kid kills his father, bang,
just like that?
Oh listen, if you had your eyes open you'd see that
happens all the time.
They let those kids run wild up there. Well, maybe it
serves him right, you know what I mean?
Is everyone here?
The old man is inside.
Oh, would you knock on the door for him?
Yeah.
You a Yankee fan?
No
Dimension 1: summary of results
On the other hand, Duck soup is
the movie with the lowest level of
involvement because it displays
fewer of the features associated with
involved production than the other movies.
Duck soup excerpt
How do you do?
Miss Marcal.
We've met.
Well, I hope His Excellency gets here soon.
His Excellency makes it a point always to be
on time. As long as I've known him, he's never
been late to an appointment
His Excellency is due to take his
station beginning his new administration
he'll make his appearance when
the clock on the wall strikes ten.
Dendrogram
•
This shows the result of a cluster analysis, a statistical procedure that
groups cases with similar data; in this study, movies that have
similar scores on Dimension 1. It also shows how the large groups break
down into smaller groups, and the other way round: the organization is
hierarchical.
•
The dendrogram shows two basic groups: one comprising movies 18 to
28, at the top, and the other movies 5 to 17, at the bottom, not in this exact order.
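The idea of hierarchical clustering can be illustrated with a minimal Python sketch. It is not the procedure run for this study: it clusters a handful of movies on their Dimension 1 score alone, uses average linkage as an assumed merging rule, and stops at two clusters. The function name agglomerate and the choice of sample movies are assumptions made for the example; the scores come from the decade slides.

```python
def agglomerate(scores: dict[str, float], n_clusters: int = 2) -> list[list[str]]:
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[name] for name in scores]  # start with one cluster per movie

    def distance(a: list[str], b: list[str]) -> float:
        # Average linkage: mean absolute score difference over all cross pairs.
        total = sum(abs(scores[x] - scores[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: distance(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


# Dimension 1 scores for a small sample of movies from the corpus.
sample = {
    "Duck soup": 11.59,
    "Philadelphia": 14.18,
    "The producers": 15.8,
    "Some like it hot": 24.21,
    "Twelve angry men": 29.69,
    "Manhattan": 31.38,
}

for cluster in agglomerate(sample):
    print(cluster)
```

Recording the order in which clusters are merged, instead of stopping at two, is what produces the tree structure a dendrogram displays.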
Cluster 1
movie | genre | decade | cluster
duck | comedy | 30 | 1
deeds | comedy | 30 | 1
smith | drama | 30 | 1
phil_story | comedy | 40 | 1
kane | drama | 40 | 1
millionaire | comedy | 50 | 1
producers | comedy | 60 | 1
luke | drama | 60 | 1
mash | comedy | 70 | 1
vietnam | comedy | 80 | 1
philadelphia | drama | 90 | 1
Cluster 1 shows some effect of time, with older movies bunching up together here.
Cluster 2
movie | genre | decade | cluster
angels | drama | 30 | 2
lady | comedy | 40 | 2
life | drama | 40 | 2
hot | comedy | 50 | 2
sunset | drama | 50 | 2
twelve | drama | 50 | 2
apartment | comedy | 60 | 2
hustler | drama | 60 | 2
manhattan | comedy | 70 | 2
kramer | drama | 70 | 2
cuckoo | drama | 70 | 2
ghostbusters | comedy | 80 | 2
god | drama | 80 | 2
rainman | drama | 80 | 2
hog | comedy | 90 | 2
mary | comedy | 90 | 2
beauty | drama | 90 | 2
sunshine | comedy | 00 | 2
parents | comedy | 00 | 2
crash | drama | 00 | 2
lost | drama | 00 | 2
Cluster 2, on the other hand, contains more recent movies: all of the movies from the 2000s
and most of those from the 1970s.
CONCLUSION
•
The existing classification of movies
does not help explain their
language variation;
•
Hollywood comedy and drama seem to
be a register in their own right, that is,
genre does not seem to lead to any
language variation;
•
There is variation across individual
movies, but the variables genre and
decade do not explain it; it may instead be related to
two broad time groups, before and after
the 1970s, regardless of genre.
CONCLUSION
•
There seems to be a grammar of movies that is
similar to that of spontaneous conversation;
•
This means that movies can be explored in the EFL
classroom as a relatively good alternative to
spontaneous conversation, with some advantages:
- Teachers have easy access to them;
- They contextualize several linguistic features well,
which is bound to help students learn not only the
structure of grammatical features, but also their
pragmatic and discursive uses;
- Their plots and stories are likely to make them
interesting for students.
All these, coupled with the fact that their language
tends to be less broken than spontaneous
conversation, may make movies preferable to spontaneous
conversation in the classroom.



