Marlowe or Shakespeare?
Determining the Authorship of a
Mysterious Play
Chapter 9, Exercise 4
Bill Camarinos
Andy Gibbons
1
Background
• Virtually every year for the past one
hundred years, a play or other work of
literature ostensibly written by William
Shakespeare or Christopher Marlowe has been
found somewhere in the United Kingdom.
• Specialists in Elizabethan literature
typically conclude that these “finds” are
frauds.
2
Shakespeare and Marlowe
• Both were born in 1564.
• Shakespeare died in 1616.
• Marlowe was supposedly killed in a tavern
brawl in 1593, but many suspect that this
death was staged. There is enough doubt
that Marlowe’s window in Westminster
Abbey’s Poets’ Corner gives the dates of his
life as “1564-1593?”.
3
Shakespeare Authorship
Controversy
• Some maintain that someone other than Shakespeare was the true author of the
Shakespearean canon.
• Among the candidates:
– Edward de Vere, 17th Earl of Oxford
– Francis Bacon
– Queen Elizabeth I
– Christopher Marlowe
• Every year there is a court-type competition in Washington among leading
attorneys to prove who is the author. Supreme Court Justices sit as judges.
• Separately, there is a prize, the Hoffman prize, that will be given to whoever
can convince the world that Christopher Marlowe wrote the works attributed to
Shakespeare.
4
Our Assumptions
• Shakespeare, not Marlowe or anyone else,
wrote the Shakespearean canon.
• The mystery play which has been found was
definitely written by either Marlowe or
Shakespeare. It is not another of the frauds
that keep turning up.
5
What we have to work with
• An electronic version of a play of unknown
authorship
• Electronic versions of all known works of
William Shakespeare
• Electronic versions of all known works of
Christopher Marlowe
6
How we propose to proceed
• Investigate how quantitative techniques and
computers have been used to solve authorship
attribution problems in the past.
• Determine which techniques have the greatest
probabilities of success.
• Design a process for applying the selected
techniques using what we have at our disposal.
• Determine the true authorship.
7
Early and Simple Quantitative
Approaches
• Word length: compare the frequency
distribution of word lengths in works by the
authors in question.
• Average number of syllables per word.
• Sentence length.
• Percentages of different parts of speech.
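
A minimal sketch of the first and third tests above, assuming plain-text input. The function names, the tokenizing regular expressions and the sample sentence are illustrative choices, not part of the original study; sentence splitting on ". ! ?" is a rough approximation.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters, optionally with internal apostrophes.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def word_length_distribution(text):
    """Return {word_length: relative_frequency}, counting letters only."""
    words = WORD_RE.findall(text)
    counts = Counter(len(w.replace("'", "")) for w in words)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}

def mean_sentence_length(text):
    """Average number of words per sentence (sentences end at . ! or ?)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(WORD_RE.findall(s)) for s in sentences]
    return sum(lengths) / len(lengths)

sample = "To be, or not to be. That is the question."
distribution = word_length_distribution(sample)
average = mean_sentence_length(sample)
```

Running both functions on each author's known works and on the mystery play gives profiles that can be compared side by side.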
8
What is the result of applying the
simple tests?
• Many are better at identifying types of writing
(e.g. narrative vs. drama) than they are at
distinguishing one author from another.
• The word-length test was actually applied to
Shakespeare and Marlowe and the result was
“Christopher Marlowe agrees with Shakespeare
about as well as Shakespeare agrees with himself.”
9
Other Methods
• Function-word approach. Focus on the
frequency with which different articles,
conjunctions and prepositions (“context-free
words”) are used. Frequencies often vary
significantly from one author to another.
• Measure “pace” - the rate of introduction of
new vocabulary into the texts.
• Focus on words used only once or twice.
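
The three measures above can be sketched as follows. The function-word list is a small illustrative sample, and "pace" is approximated here by the ratio of distinct words to total words, which is an assumption rather than the definition used in the studies cited.

```python
import re
from collections import Counter

# Assumption: a short, illustrative list of "context-free" words.
FUNCTION_WORDS = ("the", "a", "and", "but", "of", "in", "to", "with")

def tokenize(text):
    """Lower-case the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())

def function_word_rates(tokens, per=1000):
    """Occurrences of each function word per `per` tokens."""
    counts = Counter(tokens)
    return {w: per * counts[w] / len(tokens) for w in FUNCTION_WORDS}

def pace(tokens):
    """Crude proxy for pace: distinct words divided by total words."""
    return len(set(tokens)) / len(tokens)

def rare_word_count(tokens, max_uses=1):
    """Number of distinct words used at most `max_uses` times."""
    return sum(1 for c in Counter(tokens).values() if c <= max_uses)

tokens = tokenize("The cat and the dog and the bird")
rates = function_word_rates(tokens)
```

Calling `rare_word_count(tokens, max_uses=2)` covers the "once or twice" variant of the last test.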
10
Other Methods (Continued)
• Cumulative Sum Charts (cusums or qsums): compare two features using a chart
– one of which is sentence length
– the other of which is something like the number
of two or three letter words in each sentence
– similar chart patterns suggest uniform
authorship.
– chart patterns for a different author will diverge.
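
A rough sketch of how the two series behind a cusum chart could be computed, assuming the usual definition of a cumulative sum as the running total of each value's deviation from the mean. The helper names and sample text are hypothetical, and plotting is left out.

```python
import re

def cusum(values):
    """Running total of (value - mean) over the sequence."""
    mean = sum(values) / len(values)
    running, series = 0.0, []
    for v in values:
        running += v - mean
        series.append(running)
    return series

def sentence_features(text):
    """Per sentence: word count, and count of two- or three-letter words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths, short_counts = [], []
    for s in sentences:
        words = re.findall(r"[A-Za-z]+", s)
        lengths.append(len(words))
        short_counts.append(sum(1 for w in words if len(w) in (2, 3)))
    return lengths, short_counts

lengths, short_counts = sentence_features("I am here. You are not here now. Go.")
length_chart = cusum(lengths)       # first line on the chart
habit_chart = cusum(short_counts)   # second line on the chart
```

The two resulting series would be drawn on the same chart, and their shapes compared by eye or with a distance measure of the analyst's choosing.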
11
Other Methods (Continued)
• Use of Neural Networks
– Neural Networks have powerful pattern-recognition
capabilities
– Network is “trained” or calibrated using data from a
known author (such as the known works of
Shakespeare or Marlowe)
– The network can then classify doubtful text (such as the
mystery play) based on what it has “learned.”
– Two researchers reported success using neural networks
to compare Shakespeare and Marlowe.
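
To illustrate the train-then-classify idea, here is a sketch of a single-unit perceptron, the simplest possible neural network and far smaller than anything a real study would use. The feature vectors are invented toy numbers standing in for two function-word rates; they are not measurements from either author.

```python
def score(weights, bias, features):
    """Weighted sum of the features plus the bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def train_perceptron(samples, labels, epochs=20, rate=0.1):
    """Classic perceptron rule: nudge the weights after each wrong guess."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            guess = 1 if score(weights, bias, x) > 0 else 0
            error = y - guess
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
            bias += rate * error
    return weights, bias

def classify(weights, bias, features):
    """Return 1 for 'author A' and 0 for 'author B'."""
    return 1 if score(weights, bias, features) > 0 else 0

# Toy training data (made up): label 1 = author A, label 0 = author B.
samples = [[3.0, 1.0], [2.5, 0.5], [1.0, 3.0], [0.5, 2.5]]
labels = [1, 1, 0, 0]
weights, bias = train_perceptron(samples, labels)
doubtful = classify(weights, bias, [2.8, 0.9])
```

In practice the inputs would be many stylistic features measured on blocks of known text, and the doubtful text would be classified block by block.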
12
What Previous Authorship
Attribution Studies Have Shown
• The simplest tests (e.g. word length analysis) don’t work.
• Some only slightly more complex tests (e.g. function-word analysis)
have had some success.
• Combinations of tests, even if some are quite simple, have a high
probability of success.
• Success in attribution is much more likely when only two candidate
authors are present.
• Success becomes even more likely if there is a large body of known
material available (and we have all the known works of both
Shakespeare and Marlowe).
• With leading-edge techniques that you really don’t understand: don’t
try this at home.
13
Methods We Considered
• Even though they show a lot of promise, we ruled out
neural networks since we have no experience at all in using
them.
• We also considered data mining.
– Data are stored in a data warehouse.
– Query and reporting tools, multidimensional analysis tools, and
intelligent agents are used to analyze the data.
– For example, intelligent agents could be fitted with an algorithm
designed to find patterns.
– We decided that data mining was overkill for the problem at hand.
14
Method We Selected
• Use a readily available relational data base,
Oracle, as our analysis and reporting tool.
• Relational data bases organize data into tables
which are related to one another using key fields.
– Some of the tables we would create:
  • Words used by Shakespeare.
  • Troublesome words used in the plays.
  • Weird words used by Shakespeare.
  • Examples of each author’s use of verse and meter.
15
Method We Selected (Continued)
• Structured Query Language (SQL) or the associated Query
by Example (QBE) would be used to query the data.
• We would define how many points of similarity in use of
language, verse, etc. would be needed to establish
authorship. For example, samples of Shakespeare’s and
Marlowe’s use of iambic pentameter in their known works
would be compared to that in the mystery play.
• Oracle’s Report Generator would be used to create a report
showing how the mystery play compares with the known
texts based on the criteria we established.
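
A minimal sketch of the relational approach, using Python's built-in sqlite3 module as a stand-in for Oracle. The table names, column names and word counts are hypothetical placeholders chosen for illustration, not the schema or data of the actual project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Two tables related through the key field work_id (hypothetical schema).
conn.executescript("""
CREATE TABLE works (
    work_id INTEGER PRIMARY KEY,
    title   TEXT,
    author  TEXT              -- NULL for the mystery play
);
CREATE TABLE word_counts (
    work_id INTEGER REFERENCES works(work_id),
    word    TEXT,
    n       INTEGER
);
""")

# Placeholder rows: the counts are invented for the example.
conn.executemany("INSERT INTO works VALUES (?, ?, ?)", [
    (1, "Known play A", "Shakespeare"),
    (2, "Known play B", "Marlowe"),
    (3, "Mystery play", None),
])
conn.executemany("INSERT INTO word_counts VALUES (?, ?, ?)", [
    (1, "the", 40), (1, "and", 30),
    (2, "the", 55), (2, "and", 20),
    (3, "the", 42), (3, "and", 29),
])

# SQL query joining the tables on the key field.
rows = conn.execute("""
    SELECT w.title, c.word, c.n
    FROM works AS w
    JOIN word_counts AS c ON c.work_id = w.work_id
    WHERE c.word = 'the'
    ORDER BY w.work_id
""").fetchall()
```

A report would then line up such query results for the mystery play against those for each author and count the points of similarity.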
16
Conclusion
• Our task has been a fascinating, and fun, one.
• Our survey of the work previously done showed that computers and
linguistics have come a long way and that computers can be used to help
solve the type of authorship attribution questions that scholars have
debated for years.
• We believe that using a powerful relational data base to perform the
kinds of tests that have proven most successful in previous studies
would convince the quantitatively oriented community of the
authorship of the mystery play.
• We would seek validation of our results from an Elizabethan scholar
who specializes in the works of Shakespeare and Marlowe. This
would give credence to our results among those who are dubious of
quantitative approaches.
17