Digital methods for Literary Criticism


Digital methods for Literary Criticism
Fabio Ciotti
University of Rome Tor Vergata
Methodological Intersections…
• This lecture aims at presenting a critical overview of the principal
methods adopted in Computational Literary analysis
• We are really talking about Methodological Intersections:
• Literary Criticism and History of Literature
• Theory of literature
• Computational Linguistics
• Statistics and Probability studies
• Computer science
• Machine learning
• Data mining
• …
The wider context: Digital literary studies
• A recent coinage, but an increasingly successful one
• the application of computational methods
and the use of digital tools to study literary
texts and related phenomena
• In fact it is one of the fundamental assets of DH since its very origin
• Digital Scholarly Editing and Digital Philology
• Text Encoding and digital annotation
• Text analysis and Computational criticism
• Quantitative sociology of literature
• Hypertext and new media studies
• Electronic Literature
• …
What we will talk about…
• Methodological issues in computational literary criticism
• Traditional quantitative approaches
• Distant reading approaches
• A critical stance towards distant reading approaches
• Annotation and ontological modeling in literary analysis
Methodological issues in computational
literary criticism
• The role of modeling and the methodological foundation of DH
• Distant reading vs close reading
• The exploratory approach and distant reading
Modeling and DH methodology
• Since its very origin, when it was still called Humanities Computing, the DH domain has been characterized by the strong relevance of methodological issues
• “At its core, then, digital humanities is more akin to a common
methodological outlook than an investment in any one specific set of texts or
even technologies.”
• [Matthew G. Kirschenbaum, What Is Digital Humanities and What’s It Doing in English Departments?]
• The central terms in this theoretical and methodological debate have
been model and modeling…
• … again quite difficult to define!
Modeling
• The most thorough treatment of the concept is due to Willard McCarty
• By "modeling" I mean the heuristic process of constructing and manipulating models;
a "model" I take to be either a representation of something for purposes of study,
or a design for realizing something new. These two senses follow Clifford Geertz's
analytic distinction between a denotative "model of" such as a grammar describing
the features of a language, and an exemplary "model for" such as an architectural
plan
• This distinction is not completely defined, and it “also reaches its vanishing
point in the convergent purposes of modelling; the model of exists to tell us
what we do not know, the model for to give us what we do not yet have.
Models realize”
• W. McCarty, Modeling: A Study in Words and Meanings
Modeling
• We use the term "model" in the following sense: To an observer B, an
object A* is a model of an object A to the extent that B can use A* to
answer questions that interest him about A. The model relation is
inherently ternary. Any attempt to suppress the role of the intentions of
the investigator B leads to circular definitions or to ambiguities about
"essential features" and the like. It is understood that B's use of a model
entails the use of encodings for input and output, both for A and for A*. If
A is the world, questions for A are experiments. A* is a good model of A, in
B's view, to the extent that A*'s answers agree with those of A's, on the
whole, with respect to the questions important to B
• Marvin L. Minsky, Matter, Mind and Models
• The model must be determined, isomorphic to the domain, and at the same time dependent on the perspective of the community that has responsibility for it
Formal Modeling
• I prefer to adopt a notion of modeling strongly connected with that of
formalization
• Formalization is to be understood as a set of semiotic and representational methods that generates a representation of a phenomenon/object (or a set of them) that is accessible and algorithmically computable (at least partially)
• Formal models
• Mathematical (physics theories)
• Logical (axiomatizations)
• Statistical
• Computational (data structures, programs, simulations…)
Modeling functional taxonomy
• Representational/descriptive modeling
• This type of modeling is aimed at summarizing or representing the domain and its
data structure in a formal compact manner.
• Unlike explanatory modeling, in descriptive modeling the reliance on an underlying causal theory is absent or incorporated in a less formal way, although we can say that modeling always encompasses a theory of the domain to be modelled
• Example: text encoding
• Explanatory modeling
• Explaining is providing a causal explanation and explanatory modeling is the use of
formal models for testing causal explanations
• Predictive modeling
• the process of applying a formal model or computational algorithm to data for the
purpose of predicting new or future observations
• Cf. Galit Shmueli, To Explain or to Predict?
Distant reading
• Understanding literary phenomena
analyzing (computationally)
massive amounts of textual data
• The groundbreaking steps in this
direction are due to the Stanford
Literary Lab, founded and directed
by Franco Moretti and Matthew Jockers
• Moretti himself has attempted to
give a literary theoretical rationale
to these experimentations,
introducing the notion of “distant
reading”
Distant reading
• The basic idea is that there are
synchronic or diachronic literary and
cultural facts and phenomena that are
undetectable to the usual deep
reading and local interpretation
methods and that require the
scrutiny of hundreds or thousands of
texts and documents (and millions of
lexical tokens)
• In this way we can gain access to
otherwise unknowable information
that plays a significant explanatory
role in understanding literary and
cultural facts and history/evolution
• … the trouble with close reading (in all of its
incarnations, from the new criticism to
deconstruction) is that it necessarily depends
on an extremely small canon. This may have
become an unconscious and invisible premise
by now, but it is an iron one nonetheless: you
invest so much in individual texts only if you
think that very few of them really matter. [...]
At bottom, it’s a theological exercise— very
solemn treatment of very few texts taken very
seriously— whereas what we really need is a
little pact with the devil: we know how to
read texts, now let’s learn how not to read
them. Distant reading: where distance, let me
repeat it, is a condition of knowledge: it
allows you to focus on units that are much
smaller or much larger than the text: devices,
themes, tropes— or genres and systems. And
if, between the very small and the very large,
the text itself disappears, well, it is one of
those cases when one can justifiably say, Less
is more.
Close versus Distant Reading
• Close reading is the act of analyzing one work (or a small set of works) based upon deep reading and interpretation of local features and aspects of its formal structure or content
• For example, one could analyze Goethe's Faust based on the usage of metaphors, the meanings of each word, and comparisons drawn with preceding (or following) works based upon the Faust myth
• Another example would be to compare and contrast two characters from two different works based upon the content of their interactions with other characters or situations (e.g. Ulysses/Bloom)
• The notion of close reading is attributed to the theories of Richards and of the American New Critics, but we can say that it is in general the main methodological approach of most 20th century literary scholarship, from Russian Formalism to Structuralism, Semiotics and even Poststructuralism (with some exceptions in the sociology of literature and in British Cultural Studies)
Close versus Distant Reading
• Distant reading focuses on analyzing big or huge sets of works, usually
adopting quantitative methods to examine a determined set of
quantifiable textual features, to investigate and explain literary and
cultural macro-phenomena like
• the evolution of genres
• the affirmation of a style and its reception
• the presence of recurrent content/theme in a given time span of literary
history
• the notion of influence and intertextuality
• the sociological facts and aspects of literature
• Moretti's central idea is that this quantitative-formalist approach is the only way to study literature without restricting the attention to the Canon of the “great works”
Distant reading: methods and tools
• Data mining/machine learning heuristics and social network analysis are the preferred methods for distant reading, in that they make it possible to search for implicit recurring patterns and regular schemes inside wide amounts of unstructured or poorly structured data, usually not visible to the naked eye
• topic modeling: the search for patterns of lexical tokens co-occurring with a noticeable frequency inside a text or a corpus
• text clustering: the use of statistical clustering algorithms applied to specific textual features to automatically classify texts into significant categories
• sentiment analysis: giving a quantitative value to the emotional valence of
sentences and text by means of an emotional metric attributed to a set of
lexical items
• network analysis, a set of methods and strategies for studying the relationships
inside groups of n-individuals, based on graph theory
The exploratory approach and distant reading
• Doing research without prior formal modeling and theorizing is a tacit
assumption of many recent works based on Data Analytics and Distant
Reading
• The general underlying idea is the application of data mining and
machine learning heuristics to search for implicit recurring patterns
and regular schemes inside wide amounts of unstructured (or poorly
structured) data, usually not visible to the naked eye
• In most of these applications, the phase of technical analysis
precedes theoretical modeling in the research process
• Problem => Data => Analysis => (Model) => Explanation
Big Data and distant reading
• The problem with this exploratory way of doing research with
humanities objects is that
• at the level of data building a lot of interpretation and theory is involved. So
we have a double modeling phase, the former occurring before (and
governing how) we build the data set, and the latter occurring before (and
governing) the analysis:
• Problem => Data Model => Data => Model => Analysis => Explanation
• humanities objects are intentional objects and it is very difficult to find
anything relevant without a previous hypothetical model of what we are
looking for
• starting from (presumed) raw data we can draw many different conclusions
without having any acceptable criteria for deciding which is the best one, the
one that best explains our phenomenal data
Before we start: the data
• To use Big Data methods you need big textual data collections
• Existing scholarly collections (TCP, DTA, WWP, OTA, BibIt…)
• High quality (on average), small to medium-sized data sets, usually in XML format
• Existing non scholarly collections (Gutenberg, Liber Liber..)
• Medium to low quality, medium to big data sets, usually in text only format (Unicode
hopefully)
• Huge collections (Hathi Trust, Internet archive, Google Books)
• Low quality (uncleaned OCR), huge data sets, text format
• Do it yourself: scanning and OCR
• Commercial or open source OCR works well with modern print books
• You have to do a lot of cleaning and correction (some can be automated using reg-ex:
Cleaning OCR’d text with Regular Expressions)
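As an illustration of the kind of regex cleanup mentioned above, here is a minimal Python sketch; the patterns for hyphenation, page numbers and a running header are invented examples, not a general-purpose recipe.

```python
import re

def clean_ocr_text(raw: str) -> str:
    """Illustrative regex cleanup for OCR'd text; the patterns are examples, not a general recipe."""
    text = raw
    # Rejoin words hyphenated across line breaks: "sen-\ntence" -> "sentence"
    text = re.sub(r"(\w+)-\s*\n\s*(\w+)", r"\1\2\n", text)
    # Drop lines containing only a page number
    text = re.sub(r"^\s*\d{1,4}\s*$", "", text, flags=re.MULTILINE)
    # Drop a (hypothetical) running header repeated on every page
    text = re.sub(r"^\s*THE AUTHOR - TITLE OF THE NOVEL\s*$", "", text, flags=re.MULTILINE)
    # Normalize runs of spaces and blank lines
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text

sample = "A sen-\ntence broken by OCR.\n\n42\n\nTHE AUTHOR - TITLE OF THE NOVEL\nNext page text."
print(clean_ocr_text(sample))
```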
Cleaning the data…
• In most case the data set must undergo a cleaning preprocessing
phase to
• reduce the dimensionality (complexity) of the data set
• eliminate data that could invalidate the analysis: running headers, page
numbers, grammatical morphemes (???!)
• Filtering: remove words that bear little or no content information, like
articles, conjunctions, prepositions, etc.
• Lemmatization: methods that try to map verb forms to the infinitive and nouns to the singular form
• Stemming: methods that try to reduce word forms to a common base form or stem (see the sketch below)
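A minimal sketch of these preprocessing steps, assuming NLTK (the slides do not prescribe any particular library; the stopword list and the English models are NLTK's own):

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer

# One-time download of the required NLTK resources
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

def preprocess(text: str) -> list[str]:
    # Crude tokenization into lowercase word forms
    tokens = re.findall(r"[a-z]+", text.lower())
    # Filtering: remove stop words (articles, conjunctions, prepositions, ...)
    stop = set(stopwords.words("english"))
    tokens = [t for t in tokens if t not in stop]
    # Lemmatization: map inflected forms to a dictionary form
    lemmatizer = WordNetLemmatizer()
    print("lemmas:", [lemmatizer.lemmatize(t) for t in tokens])
    # Stemming: crudely cut word forms back to a stem (compare with the lemmas)
    stemmer = PorterStemmer()
    print("stems: ", [stemmer.stem(t) for t in tokens])
    return tokens

preprocess("The roses were blooming and the gardeners watered them daily.")
```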
Traditional quantitative text analysis
• The atomic datum is usually the word or orthographic form (intended as
the maximal sequence of coded characters that is meaningful in a textual
sequence)
• There is some evidence that letter sequences and information about parts of speech
sometimes work better than words for authorship attribution, but words have the
advantage of being meaningful in themselves and in their significance to larger issues
like theme, characterization, plot, gender, race, and ideology (Hoover)
• Full text search (or contextual full text search)
• Basic statistical analysis:
• Frequency list
• Collocates
• Text comparison
• Concordances (KWIC lists)
Frequency list
• A sorted list of words (word types) together with their frequency,
where frequency here usually means the number of occurrences in a
given corpus or document
• Sort order can be alphabetic (ascending or descending), by frequency,
z-score or tf-idf score
• The position in the list occupied by a single word is called rank
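A frequency list can be built in a few lines of plain Python; the input file novel.txt is a hypothetical plain-text document.

```python
import re
from collections import Counter

# "novel.txt" is a hypothetical plain-text file
text = open("novel.txt", encoding="utf-8").read().lower()
tokens = re.findall(r"[^\W\d_]+", text)        # crude tokenization into word forms

freq = Counter(tokens)

# Frequency-sorted list: rank, word type, number of occurrences
for rank, (word, count) in enumerate(freq.most_common(20), start=1):
    print(rank, word, count)

# The same data in alphabetical order
for word in sorted(freq)[:20]:
    print(word, freq[word])
```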
Collocates
• Collocates: the list of words that occur most frequently near a given word within a given context
• http://wordhoard.northwestern.edu/userman/analysis-collocates.html
• Useful to study linguistic phenomena like grammatical concordance
• It can give insight into thematic aspects or semantic clusters that characterize a text
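A minimal sketch of collocate extraction with a fixed context window; the window size and the toy sentence are arbitrary, and real studies usually weight raw co-occurrence counts with an association measure such as log-likelihood or mutual information.

```python
from collections import Counter

def collocates(tokens, node, window=5, top=15):
    """Count the word types occurring within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = max(0, i - window)
            counts.update(tokens[left:i] + tokens[i + 1:i + 1 + window])
    return counts.most_common(top)

tokens = "the pale rose in the garden and the red rose on the wall".split()
print(collocates(tokens, "rose", window=3))
```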
Text comparison
• Instead of using features to sort documents into categories, you start with
two categories of documents and contrast them to identify distinctive
features [Ted Underwood]
• Knowing that individual word forms in one text occur more or less often
than in another text may help characterize some generic differences
between those texts
• log-likelihood ratio (introduced in computational linguistics by Dunning) is a
common algorithm to assess the size and significance of the difference of a
word's frequency of use in the two texts.
• The log-likelihood ratio measures the discrepancy of the observed word frequencies
from the values which we would expect to see if the word frequencies (by
percentage) were the same in the two texts.
• The larger the discrepancy, the larger the value of the statistic, and the more statistically significant the difference between the word frequencies in the texts.
• Simply put, the log-likelihood value tells us how much more likely it is that the
frequencies are different than that they are the same
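A sketch of the log-likelihood computation for a single word in two texts, following the expected-frequency formulation summarized above; the token counts in the example are invented.

```python
import math

def log_likelihood(count_a: int, total_a: int, count_b: int, total_b: int) -> float:
    """Dunning's log-likelihood (G2) for one word observed in two corpora.

    count_a / count_b: occurrences of the word in corpus A and corpus B
    total_a / total_b: total number of tokens in corpus A and corpus B
    """
    # Expected counts if the word had the same relative frequency in both corpora
    expected_a = total_a * (count_a + count_b) / (total_a + total_b)
    expected_b = total_b * (count_a + count_b) / (total_a + total_b)
    ll = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:                      # 0 * ln(0) is treated as 0
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Invented counts: 120 occurrences in a 50,000-token text vs 40 in a 60,000-token text
print(round(log_likelihood(120, 50_000, 40, 60_000), 2))
```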
Concordance
• A concordance is the list of the words (types) used in a text or a
corpus, listing every instance of each word with its immediate
linguistic context
• Historically concordances have taken two forms (but computational systems have opted for the first one)
• Kwic (Key Word In Context)
• Kwoc (Key Word Out of Context)
• Concordance is a bridge between qualitative and quantitative analysis of a text since
• it gives access to the actual segment of text containing the word
• its output remains a linguistic unit amenable to human interpretation
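A minimal KWIC concordancer can be sketched as follows (fixed-width character context; real tools align on tokens and offer sorting of the left and right contexts):

```python
import re

def kwic(text: str, keyword: str, width: int = 30) -> None:
    """Print every occurrence of `keyword` with its immediate left and right context."""
    for match in re.finditer(rf"\b{re.escape(keyword)}\b", text, flags=re.IGNORECASE):
        start, end = match.start(), match.end()
        left = text[max(0, start - width):start].rjust(width)
        right = text[end:end + width].ljust(width)
        print(f"{left} [{text[start:end]}] {right}")

sample = ("A rose is a rose is a rose. The rose of yesterday is but a memory; "
          "the rose of tomorrow a promise.")
kwic(sample, "rose", width=25)
```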
Tools for text analysis
• Local standalone tools
• AntConc
• Concordance
• MonoConc
• WordSmith
• TXM
• Web applications and services
• WordHoard
• PhiloLogic
• Voyant
• TextGrid
Text mining
• Text mining refers generally to the process of extracting interesting
and non-trivial patterns or knowledge from unstructured text
documents
• The notion of unstructured must be taken with caution, since no digital information set can really be unstructured. It is better to speak of the level or degree of structure of the data
• Overall methods
• Supervised methods: text classification
• Unsupervised methods: clustering and topic modelling
Supervised: classification
• The categories are known a-priori
• A program can learn to correctly distinguish texts by a given author, or
learn (with a bit more difficulty) to distinguish poetry from prose, tragedies
from history plays, or “gothic novels” from “sensation novels”
• The researcher provides examples of different categories (training set), but
doesn’t have to specify how to make the distinction: algorithms can learn
to recognize a combination of features that is the “fingerprint” of a given
category
• After training, the algorithm can be applied to a wider, non-categorized data set
• Many kinds: Naïve Bayes, decision trees / random forests, support vector
machines, neural networks, etc.
• No "best" one: performance is domain- and dataset-specific
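A minimal sketch of supervised text classification, assuming scikit-learn and a toy training set of four labelled snippets; a real authorship experiment would of course need far more training data and proper evaluation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy training set: short snippets labelled by author
train_texts = [
    "It is a truth universally acknowledged that a single man must be in want of a wife.",
    "The family of Dashwood had long been settled in Sussex.",
    "It was the best of times, it was the worst of times.",
    "Marley was dead: to begin with. There is no doubt whatever about that.",
]
train_labels = ["Austen", "Austen", "Dickens", "Dickens"]

# Bag-of-words features + Naive Bayes; any of the other algorithms could be plugged in here
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# After training, the model is applied to new, uncategorized text
print(model.predict(["It was the season of Light, it was the season of Darkness."]))
```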
Unsupervised: clustering and topic modelling
• a program can subdivide a group of documents using general
measures of similarity instead of predetermined categories. This may
reveal patterns you don’t expect
• Two kinds of unsupervised learning
• Single membership clustering: each document is assigned to one category ->
clustering
• Mixed membership clustering: a document may be assigned to multiple
categories, each with a different proportion -> topic modeling
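A sketch of single-membership clustering with k-means over tf-idf vectors, assuming scikit-learn and four invented mini-documents:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Invented mini-documents standing in for a corpus
docs = [
    "whale ship sea captain harpoon voyage",
    "ball gown marriage estate drawing room",
    "sailor storm ocean mast deck island",
    "courtship letter inheritance parish dance",
]

vectors = TfidfVectorizer().fit_transform(docs)           # each document becomes a term vector
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)  # single-membership clustering
labels = kmeans.fit_predict(vectors)

for doc, label in zip(docs, labels):
    print(label, doc)
```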
The (meta)data model: text as
multidimensional vector or “bag of words”
• Here are two simple text documents:
• (1) Rose is a rose is a rose is rose.
• (2) A rose is a rose is a rose is an onion.
• Based on these two text documents, a list is constructed as:
• [ "rose", "is", "a", "an", "onion" ], which has 5 distinct words.
• Using the indexes of the list, each document is represented by a 5-entry vector:
• (1) [4, 3, 2, 0, 0]
• (2) [3, 3, 3, 1, 1]
• Each entry of the vectors refers to the count of the corresponding entry in the list
• For example, in the first vector (which represents document 1), the first two entries are "4, 3".
The first entry corresponds to the word "rose", which is the first word in the list, and its value
is "4" because "rose" appears in the first document 4 times. Similarly, the second entry
corresponds to the word "is", which is the second word in the list, and its value is "3" because
it appears in the first document 3 times.
• This vector representation does not preserve the order of the words in the
original sentences
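The vectors above can be reproduced in a few lines of plain Python:

```python
import re
from collections import Counter

docs = [
    "Rose is a rose is a rose is rose.",
    "A rose is a rose is a rose is an onion.",
]

vocab = ["rose", "is", "a", "an", "onion"]    # the word list built from the two documents

for doc in docs:
    counts = Counter(re.findall(r"\w+", doc.lower()))
    print([counts[w] for w in vocab])
# [4, 3, 2, 0, 0]
# [3, 3, 3, 1, 1]
```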
Topic modelling
• The hype of the moment!!!
• Topic models are algorithms for discovering the main lexical clusters
(themes??) that characterize a large collection of documents
• Topic models can organize the collection according to the discovered
topics
• Topic modeling algorithms can be adapted to many kinds of data.
Among other applications, they have been used to find patterns in
genetic data, images, social networks…
• Each document is modeled as a mixture of categories or topics
• A document is a probability distribution over topics
• A topic is a probability distribution over words
Various algorithms for topic modelling
• Latent Semantic Analysis
• Based on a matrix of TF-IDF word scores and a linear algebra calculation
• Basically, the more often words are used together within a document, the
more related they are to one another
• Latent Dirichlet Allocation (LDA)
• Based on a Bayesian probabilistic approach
• most used now!
LDA rationales
• A very simplistic
generative model for
text:
• a document is a bag
of topics
• a topic is a bag of words
• The LDA Buffet by Matt Jockers
LDA rationales
• If I can generate a document using this model, I can also reverse the process and
infer, given any new document and a topic model I’ve already generated, what
the topics are that the new document draws from
• But what if we start from a bunch of texts with no previously defined topics? Here is the trick:
• Step 1 You tell the algorithm how many topics you think there are. You can either use an
informed estimate (e.g. results from a previous analysis), or simply trial-and-error. In trying
different estimates, you may pick the one that generates topics to your desired level of
interpretability, or the one yielding the highest statistical certainty (i.e. log likelihood)
• Step 2 The algorithm will assign every word to a temporary topic. Topic assignments are
temporary as they will be updated in Step 3. Temporary topics are assigned to each word in a
semi-random manner (according to a Dirichlet distribution, to be exact). This also means that if a word appears twice, each occurrence may be assigned to a different topic.
• Step 3 (iterative) The algorithm will check and update topic assignments, looping through
each word in every document. For each word, its topic assignment is updated based on two
criteria:
• How prevalent is that word across topics?
• How prevalent are topics in the document?
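A minimal sketch of this workflow with Gensim (one of the tools listed later); the toy corpus and the choice of two topics are invented, and a real collection would need many more documents and careful preprocessing.

```python
from gensim import corpora
from gensim.models import LdaModel

# Invented toy corpus: each document is already tokenized and cleaned
docs = [
    ["whale", "sea", "ship", "captain", "voyage"],
    ["marriage", "estate", "sister", "ball", "letter"],
    ["ship", "storm", "sailor", "sea", "island"],
    ["letter", "courtship", "marriage", "dance", "sister"],
]

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]         # bag-of-words representation

# Step 1: we tell the algorithm how many topics we think there are
lda = LdaModel(corpus, num_topics=2, id2word=dictionary, passes=50, random_state=1)

# Each topic is a probability distribution over words...
for topic_id, words in lda.print_topics(num_words=5):
    print(topic_id, words)

# ...and each document is a probability distribution over topics
print(lda.get_document_topics(corpus[0]))
```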
What’s in a topic?
• But what is a topic discovered by LDA topic modeling?
• This is something that the researcher must decide by means of… interpretation!
• A theme (in the literary meaning)?
• A discourse (Underwood)?
• A sparse semantic cluster?
• Are apparently inconsistent topics interesting, or are they a demonstration of the method's failure to give insights for literary explanations?
• http://www.lisarhody.com/some-assembly-required/
• An open debate…
Sentiment analysis
• Giving a quantitative value to the emotional valence of sentences and text
by means of an emotional metric attributed to a set of lexemes
• Matt Jockers' application to plot analysis: the syuzhet controversy
• “In the field of natural language processing there is an area of research known as
sentiment analysis or, sometimes, opinion mining. And when our colleagues engage
in this kind of work, they very often focus their study on a highly stylized genre of
non-fiction: the review, specifically movie reviews and product reviews. The idea
behind this work is to develop computational methods for detecting what we,
literary folk, might call mood, or tone, or sentiment, or perhaps even refer to as
affect. The psychologists prefer the word valence, and valence seems most
appropriate to this research of mine because the psychologists also like to measure
degrees of positive and negative valence”.
• “I discovered that fluctuations in sentiment can serve as a rather natural proxy for
fluctuations in plot movement”
• http://www.matthewjockers.net/2015/02/02/syuzhet/
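Jockers' syuzhet is an R package; the sketch below only illustrates the underlying idea of lexicon-based valence scoring in Python, with an invented mini-lexicon and invented sentences (real work uses resources such as AFINN, Bing, or the NRC lexicon).

```python
# Tiny invented valence lexicon; real work uses resources such as AFINN, Bing, or the NRC lexicon
valence = {"love": 3, "happy": 2, "bright": 1, "dark": -1, "grief": -2, "dead": -3}

def sentence_valence(sentence: str) -> int:
    """Sum the valence scores of the lexicon words found in the sentence."""
    return sum(valence.get(word.strip(".,;!?").lower(), 0) for word in sentence.split())

# A made-up sequence of sentences standing in for the narrative time of a novel
story = [
    "It was a bright and happy morning.",
    "She was in love.",
    "Then came the dark news.",
    "Grief filled the house, for the old man was dead.",
]

trajectory = [sentence_valence(s) for s in story]
print(trajectory)   # [3, 3, -1, -5] -- the ups and downs serve as a proxy for plot movement
```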
Network analysis
• Network analysis, a set of methods and strategies for studying
the relationships inside groups of n-individuals, based on graph
theory.
• each individual constitutes a node and each relation an edge or arc
connecting two nodes
• the resulting network is a formal and highly abstract model of the group
internal relational structure
• some mathematical properties of the network can be computed and used as proxies for qualitative aspects of the domain
• network analysis is very appealing since it can be easily turned
into very attractive and (often) explanatory graphic visualizations
• https://dhs.stanford.edu/algorithmic-literacy/topic-networks-in-proust/
Network analysis
A network is made of vertices and edges; a plot,
of characters and actions: characters will be the
vertices of the network, interactions the edges,
and here is what the Hamlet network looks like:
….
… once you make a network of a play, you stop
working on the play proper, and work on a
model instead. You reduce the text to characters
and interactions, abstract them from everything
else, and this process of reduction and
abstraction makes the model obviously much
less than the original object […] but also, in
another sense, much more than it, because a
model allows you to see the underlying
structures of a complex object.
[Franco Moretti, Distant Reading]
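A sketch of such a character network with the networkx library; the interaction pairs and weights are invented placeholders, not the actual Hamlet data.

```python
import networkx as nx

# Invented character pairs with interaction counts; a real study would extract
# speaker interactions from the play itself
interactions = [
    ("Hamlet", "Horatio", 12),
    ("Hamlet", "Claudius", 7),
    ("Hamlet", "Gertrude", 6),
    ("Hamlet", "Ophelia", 5),
    ("Claudius", "Gertrude", 8),
    ("Claudius", "Laertes", 4),
    ("Ophelia", "Laertes", 3),
]

G = nx.Graph()
G.add_weighted_edges_from(interactions)

# Network measures often used as proxies for a character's structural importance
print("degree centrality:", nx.degree_centrality(G))
print("betweenness:      ", nx.betweenness_centrality(G))
```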
Tools for distant reading
• Mallet / Topic Modeling Tool (a Java GUI for Mallet)
• R language with R packages (from Matt Jockers' website)
• Stanford Topic Modeling Toolbox
• Gensim (Python library)
A critical stance towards Distant reading
• Some critical and methodological reflections on the weakness of
massive quantitative methods
• Proxy fallacy
• a measurement is used as a proxy for something else, but the effectiveness of that proxy is assumed rather than actually explored or tested in any way
• There are clear layers between the written word and its intended meaning,
and those layers often depend on context and prior knowledge. Further,
regardless of the intended meaning of the author, how her words are
interpreted in the larger world can vary wildly
• [Scott Weingart, The Myth of Text Analytics and Unobtrusive Measurement]
A critical stance towards Distant reading
• Data mining algorithms in general are independent from the context (they can be applied indifferently to stock exchange transactions or to very large textual corpora). They identify similarities and recurring patterns independently from the semantics of the data. Humanities and literary data are heavily contextualized
• Text mining methods are agnostic toward the granularity of the data to which they are applied. Texts are only sequences of n-grams, and the probabilistic rules adopted to calculate the relevance of a given set of n-grams are completely independent from the fact that the units of analysis are individual coded characters or linguistic tokens of greater extension
• If a very large textual set is composed of documents spread over a long period of time, diachronic variation in the form and usage of the language (both on the syntactic and the semantic level) can invalidate purely quantitative and statistical measures
A critical stance towards Distant reading
• Data in literary studies do not precede formal modeling; on the contrary,
they are the product of modeling. It is very dubious to assume innocently a
data set as the starting point of a meaningful analysis
• Meaning in literary texts is multi-layered, and some layers do not have a direct lexicalization, or they have a very complex and dispersed one (think of aspects of a narrative text at different abstraction levels like anaphora, themes, plot and fabula, actants). Purely quantitative analysis applies only to the textual “degré zéro” on which the secondary modeling systems of literature build their significance
• Texts are essentially intentional objects: the meaning of a word, the usage of a metaphor, the choice of a metric or rhythmic solution in a poetic text are determined by the attribution of sense and meaning by the author and by the reader. Intentional phenomena do not follow regular patterns and are hardly (if ever) detectable by statistical methods
The intensional nature of literary phenomena
• One of the underlying assumptions of the distant reading approach is quite analogous to the reductionist stance in cognitive sciences
• Interesting literary phenomena can be reduced without residue to material linguistic phenomena, which in turn are completely accessible to purely quantitative and statistical/probabilistic methods
• We can say that a purely quantitative approach to literary objects is
eliminativist towards the intentional concepts of critical discourse
• Interpretation is based on the production and application of a set of
intentional notions and terms to explain what the text means and how
• Semiotic and structuralist criticism has tried to explain or to reformulate them as more formal and abstract concepts that preserve the intentional nature of text and interpretation
Semantic technologies and ontologies
• The semantics-oriented approach is based on the modeling of complex human interpretations and annotations of the data through formal languages: creating and processing Rich Data
• Based on the concepts, frameworks and languages of Semantic Web
and Linked Data [Tim Berners-Lee]
• The convergence between semiotic/structuralist theories and
methods and contemporary ontologies and linked data oriented
practices represents a big chance for the future development of
Digital Literary Criticism
• Building Rich Data for humanities research can enhance the efficacy
of text mining technologies
Formal ontologies
• In the context of computer and information sciences, an ontology
defines a set of representational primitives with which to model a
domain of knowledge or discourse [Gruber]
• A formal ontology is a formalized account of a conceptual description
of (portion of) the world
• The relevance of formal ontologies for the digital processing of literary and cultural objects is both theoretical and operational
Why ontologies matter for (digital) humanists
• Creating formal models based on explicit conceptualization and logical foundations ensures that all the discourses are firmly grounded in a common “setting” of the domain. We all (try to) speak of the same thing.
• Formal ontologies permit the application of computational inference and reasoning methods to express explanations and make predictions. Their grounding in description logic has made possible the development of efficient automatic reasoners and inference engines.
• Semantic Web modeling provides methods to compare and eventually merge different ontologies; the Open World Assumption ensures the functionality of the model even if it is incomplete or conceived as a work in progress
Why ontologies matter for (digital) humanists
• In Humanities and Literary Studies conceptual formalization must face
the problem of the
• indeterminacy of theories
• vagueness of terms
• intrinsic ambiguity of the domain
• To use computing we need to reduce the implicit and formalize it, with the awareness that formal modeling is part of the hermeneutic process
• Making ontologies and linking them to digital cultural artifacts builds knowledge:
• it asks for making explicit the tacit knowledge, which is a major part of
Humanities work
• it asks for finding the data level correlatives to the abstract and theoretical
notions that populate theories once they are formalized as ontologies
• An ontology, in the end, is an account of what the community knows
as much as of how it knows what it knows, to recall Willard McCarty
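As a minimal illustration of what linking ontologies to digital cultural artifacts can look like in practice, here is a sketch using rdflib; the namespace, classes and properties are invented for the example and do not correspond to any published ontology.

```python
from rdflib import Graph, Namespace, Literal, RDF, RDFS

# Invented namespace and vocabulary; a real project would reuse or publish a shared domain ontology
LIT = Namespace("http://example.org/literary-ontology#")

g = Graph()
g.bind("lit", LIT)

# A fictional character, the work it belongs to, and an annotated passage
g.add((LIT.Bloom, RDF.type, LIT.FictionalCharacter))
g.add((LIT.Bloom, RDFS.label, Literal("Leopold Bloom")))
g.add((LIT.Ulysses, RDF.type, LIT.LiteraryWork))
g.add((LIT.Bloom, LIT.appearsIn, LIT.Ulysses))
g.add((LIT.passage_1, RDF.type, LIT.TextPassage))
g.add((LIT.passage_1, LIT.mentionsCharacter, LIT.Bloom))

# A SPARQL query over the annotations: which passages mention fictional characters?
query = """
PREFIX lit: <http://example.org/literary-ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?passage ?name WHERE {
    ?passage lit:mentionsCharacter ?char .
    ?char a lit:FictionalCharacter ; rdfs:label ?name .
}
"""
for row in g.query(query):
    print(row.passage, row.name)
```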
Annotation and ontological modeling:
examples
• Free form annotation
• http://www.annotating-literature.org/
• Catma
• Ontology driven geo-annotation
• GEOLAT
• Ontology for narrative texts
• Zöllner-Weber's Noctua literaria ontology, http://www.figurenontologie.de
• An ontology for narrative characters (Ciotti)
Toward an Hermeneutic Machine
• A digital environment and infrastructure incorporating semantic
methods and practices of digital interaction and cooperation
already available and tested in the Digital Humanities
community
• Networked infrastructure of resources, tools and services
• Multiple ontological modeling can be connected with the same
(passage of) text thus uncovering its complexity
• Such stratified texts can be re-used in different contexts of use, from “professional scholars” to culturally curious users who are attracted by the potential text mash-ups
Toward an Hermeneutic Machine
• Main components:
• high-quality documents archives belonging to different linguistic traditions /
culture in standard encoding formats
• a set of methods and computational tools for distributed and cooperative
annotation of digital resources
• a set of domain specific shared ontologies organized in a multilayer design to
model particular aspects of the intra-, extra- and inter- textual structure:
• real places and spaces, chronologically adapted
• real persons (including authors)
• works and literary history categories
• historical events
• fictional places and worlds
• fictional characters and entities
• themes and motifs
• rhetorical figures
• genres and stylistic features
• tools to visualize and process semantic levels of digital information and share
knowledge as linked data
Toward an infrastructure for a Literary
Semantic Web
• Building such an infrastructure is a demanding task, but many of the building blocks are already there
• The history and evolution of the Web has shown that it is possible to
build complex systems through an incremental and cooperative process
• The infrastructure we are envisioning is cooperative but it cannot be
based on a crowdsourcing approach: we can rather call it a “competent
and motivated community” driven project
• The representation of beliefs and interpretations made by a scholar depends on assumptions held in common with a particular interpretive community that shares methodologies, disciplinary practices and criteria of rational acceptability; the community of experts licenses the correct interpretations and, by way of ontological modeling, shapes the frames in which interpretations occur
Readings…
• … other than the list of references I have already suggested
• Ted Underwood blog
• Matt Jockers Blog
• Scott Weingart Blog
• The Programming Historian
• Andrew Piper Blog
• … all the references you can find from these! Go on, explore!
Thank you!!!
[email protected]
https://www.facebook.com/Ciotti.Fabio
http://www.aiucd.it