Leonid Iomdin
Institute for Information Transmission Problems,
Russian Academy of Sciences
Program Overview: p. 1
 1. Basic Principles of The Meaning-Text theory by Igor
Mel’čuk. Language as a Universal Translator of Senses
to Texts and Texts to Senses. Text analysis and text
generation. The theory of integral linguistic
description by Juri Apresjan. The grammar and the
dictionary of language.
 2. Two syntactic levels of sentence representation:
surface syntax and deep syntax.
December 21, 2009. Lectures 13-14
Program Overview: p. 2
 3. The dependency tree structure as a syntactic
representation of the sentence. Dependency tree vs.
Constituent tree: advantages and drawbacks of both
types of representation. Limits of the dependency tree.
The hypothesis of two syntactic starts.
 4. The notions of syntactic relation. Major classes of
syntactic relations: actant, attributive, coordinative and
auxiliary relation classes.
 5. The notion of syntactic feature. Syntactic features vs.
Semantic features.
Program Overview: p. 3
 6. Actants and valencies. Active, passive and distant
valencies. The government pattern of a dictionary entry.
An overview of actant syntactic relations. The predicative
relation. The agentive relation. Completive relations.
 7. An overview of attributive syntactic relations.
Grammatical Agreement. Numerals and Quantitative
Constructions. The system of Quantification Syntax of
Russian.
 8. Grammatical coordination as a type of grammatical
subordination. An overview of coordinative syntactic
relations.
Program Overview: p. 4
 9. Auxiliary syntactic relations. Analytical grammatical
forms as an object of syntax.
 10. Microsyntax of Language. Minor Type
Sentences. Syntactic Idioms.
 11. Lexical Functions in the Dictionary and the
Grammar.
 12. Syntactic description and syntactic rules.
Dependency Syntax in NLP. Dependency Syntax in
Machine Translation. Syntactically Tagged Corpus of
Texts.
Lexical Functions
• Substitute LF
  ◦ synonyms, antonyms, converse terms, derivatives
• Collocate LF
  ◦ MAGN = ‘a high degree of what is denoted by X’
  ◦ OPER/FUNC
  ◦ ...
Lexical Functions: Magn
• MAGN (disease) = grave
• MAGN (fog) = heavy
• MAGN (control) = strict
• MAGN (болезнь ‘disease’) = тяжелый ‘heavy’
• MAGN (туман ‘fog’) = густой ‘dense’
• MAGN (контроль ‘control’) = строгий ‘strict’
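One way to picture Magn is as a lookup table keyed by the argument lexeme. The sketch below is only an illustration: the dictionary `MAGN` and the helper `magn()` are hypothetical names, the entries are the ones listed on the slide, and nothing here is meant to reflect how ETAP-3 actually stores lexical functions.

```python
from typing import Optional

# Toy lexicon: values of the lexical function MAGN, copied from the slide.
MAGN = {
    "en": {"disease": "grave", "fog": "heavy", "control": "strict"},
    "ru": {"болезнь": "тяжелый", "туман": "густой", "контроль": "строгий"},
}

def magn(lang: str, keyword: str) -> Optional[str]:
    """Return the intensifier MAGN assigns to `keyword`, or None if unrecorded."""
    return MAGN.get(lang, {}).get(keyword)

print(magn("en", "fog"))    # value stored for English "fog"
print(magn("ru", "туман"))  # value stored for Russian "туман"
```

The point of the table is that the intensifier is looked up per keyword rather than translated word for word: the English and Russian values for the same concept need not be translations of each other.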
Lexical Functions: Oper / Func Family
INVITATION
1. the minister: issues
2. the ambassador: receives
Examples of LF Oper
• Oper1 (invitation) = issue
• Oper2 (invitation) = receive
• Oper1 (defeat) = suffer
• Oper2 (resistance) = encounter
• Oper2 (respect) = enjoy
Examples of LF Func
 Func1 (fear) = possess
 Func2 (decision) = concern
 Func1 (responsibility) = rest (with)
 Func2 (vengeance) = fall (upon)
General Properties
of Lexical Functions
 Universality
 Intralinguistic idiomaticity
 grave disease, heavy fog
 *heavy disease, *grave fog.
 Cross-linguistic idiomaticity
 Rus. tjazhelaja bolezn’ ‘heavy disease’
 Rus. gustoj tuman ‘dense fog’
General Properties
of Lexical Functions (cont.)
 Paraphrasing Potential:
 He respects [X] his teachers
 He has [OPER1 (S0 (X))] respect [S0 (X)] for
his teachers
 He treats [LABOR12 (S0 (X))] his teachers
with respect
 His teachers enjoy [OPER2 (S0 (X))] his
respect
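The paraphrases above can be read as templates filled from a dictionary entry. The following is a minimal sketch under strong assumptions: the entry `LEXICON["respect"]`, the function `paraphrases()`, the hard-coded verb forms, and the prepositions "for" and "with" are all illustrative choices, not part of any real system's lexicon format.

```python
# Toy dictionary entry for the verb "respect": LF values taken from the slide.
# Inflected forms are hard-coded to keep the sketch free of morphology.
LEXICON = {
    "respect": {
        "S0": "respect",      # nominalization of X
        "OPER1": "has",       # Oper1(S0(X))
        "OPER2": "enjoy",     # Oper2(S0(X))
        "LABOR12": "treats",  # Labor12(S0(X))
    },
}

def paraphrases(verb: str, subj: str, obj: str, poss: str) -> list:
    """Build three LF-based paraphrases of '<subj> <verb>s <obj>'."""
    entry = LEXICON[verb]
    noun = entry["S0"]
    obj_cap = obj[0].upper() + obj[1:]  # object becomes sentence-initial
    return [
        f"{subj} {entry['OPER1']} {noun} for {obj}",
        f"{subj} {entry['LABOR12']} {obj} with {noun}",
        f"{obj_cap} {entry['OPER2']} {poss} {noun}",
    ]

for sentence in paraphrases("respect", "He", "his teachers", "his"):
    print(sentence)
```

Each template moves the actants into different syntactic positions, which is what the subscripts of Oper and Labor encode.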
LF in Practical Applications
 Syntactic and Lexical Ambiguity
Resolution in Parsers
 Idiomatic Translation of a Large Class
of Set Expressions in Machine
Translation
 Sentence Paraphrasing
Lexical Ambiguity Resolution
 to draw a distinction - provodit' razlichie
 Both verbs are extremely ambiguous:
 draw - more than 50 meanings
 provodit’ - more than 10 meanings
Syntactic Ambiguity Resolution
 support of the army
 'support by the army'
 'support (given) to the army'
 The president had [Y=OPER2(X)] the
support [X] of the army
Syntactic Ambiguity Resolution
 The fear [X] of his wife possessed
[Y = FUNC1 (X)] Peter
 The fears of his wife infected Peter.
Idiomatic translation: LF Temp
English          Russian
March: in        mart: v2
Tuesday: on      vtornik: v1
dawn: at         rassvet: na2
moment: at       moment: v1
Easter: at       pasxa: na1
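The correspondences above suggest translating a temporal preposition through the lexical function rather than word for word. The sketch below assumes a toy setup: `TEMP_EN` and `TEMP_RU` copy the slide's values (the Russian labels v1, v2, na1, na2 are kept verbatim), while the bilingual noun table `EN_TO_RU` and the function `translate_temp()` are hypothetical additions for illustration.

```python
# Values of the LF TEMP as listed on the slide (Russian labels kept verbatim).
TEMP_EN = {"March": "in", "Tuesday": "on", "dawn": "at", "moment": "at", "Easter": "at"}
TEMP_RU = {"mart": "v2", "vtornik": "v1", "rassvet": "na2", "moment": "v1", "pasxa": "na1"}

# Hypothetical bilingual noun dictionary, added only for this illustration.
EN_TO_RU = {"March": "mart", "Tuesday": "vtornik", "dawn": "rassvet",
            "moment": "moment", "Easter": "pasxa"}

def translate_temp(prep: str, noun: str):
    """Translate an English temporal phrase via TEMP instead of word for word."""
    if TEMP_EN.get(noun) != prep:
        return None  # not a TEMP collocation known to this toy lexicon
    ru_noun = EN_TO_RU[noun]
    # The Russian preposition is chosen by the Russian noun, not by the English one.
    return TEMP_RU[ru_noun], ru_noun

print(translate_temp("at", "dawn"))
print(translate_temp("at", "moment"))
```

The same English preposition "at" ends up as different Russian prepositions depending on the noun, which is exactly the idiomaticity a word-for-word dictionary would miss.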
Sentence Paraphrasing
 X = CONV12 (X)
This group consists of 20 persons –
Twenty persons comprise this group;
 X + Y = ANTI1(X) + ANTI2(Y)
He began to observe the rules –
He stopped violating the rules
 X = LABOR12 + S0(X)
He respects his parents –
He treats his parents with respect
ETAP-3 Options
1. Machine Translation
2. Deeply Annotated Text Corpus of Russian
(SynTagRus)
3. Translation System Based on UNL (Universal
Networking Language) Interlingua
4. Synonymous and Quasi-Synonymous
Paraphrasing of Utterances
5. Computer-Aided Language Learning Tool
6. New Developments: Semantics and Ontologies
SynTagRus
Currently the treebank contains over 42,000 sentences (over 600,000 words) belonging to texts of a variety of genres (contemporary fiction, popular science, newspaper and journal articles dated between 1960 and 2009, texts of online news, etc.) and is steadily growing.
It is an integral but fully autonomous part of the Russian
National Corpus developed in a nationwide research
project. It can be freely consulted on the Web
(www.ruscorpora.ru).
SynTagRus
Since Russian is a language with relatively free
word order, SYNTAGRUS adopted a
dependency-based annotation scheme, in a
way parallel to the Prague Dependency
Treebank (see e.g. Hajič et al. 2000).
SynTagRus
SynTagRus
What we have just seen is a screenshot of the
dependency tree for the sentence
(1) Наибольшее возмущение участников митинга
вызвал продолжающийся рост цен на бензин,
устанавливаемых нефтяными компаниями ‘It was the
continuing growth of petrol prices set by oil companies
that caused the greatest indignation of the participants
of the meeting’.
SynTagRus
Here, nodes represent words (lemmas)
assigned morphological and part-of-speech
tags, whilst arcs are labeled with names of
syntactic links. The tagging uses about 75
syntactic links, half of them proposed in Igor
Mel’čuk’s Meaning ⇔ Text Theory (Mel’čuk
1988).
SynTagRus
Normally, one token corresponds to one node
in the dependency tree. There are however a
noticeable number of exceptions.
The main types of exceptions include:
SynTagRus
1) composite words like пятидесятиэтажный ‘fifty-storeyed’, where one token corresponds to two or more nodes;
2) so-called phantom nodes, which represent hard cases of ellipsis and do not correspond to any particular token in the sentence (cf. Я купил рубашку, а он галстук, lit. ‘I bought a shirt and he a tie’, which is expanded into Я купил рубашку, а он купил[PHANTOM] галстук ‘I bought a shirt and he bought[PHANTOM] a tie’);
3) multiword expressions like по крайней мере ‘at least’, where several tokens correspond to one node.
SynTagRus
Morphological tagging of SYNTAGRUS is based on a comprehensive morphological dictionary of Russian that contains about 130,000 entries (over 4 million word forms).
The ETAP-3 morphological analyzer uses the dictionary to produce morphological annotation of the words in the corpus, which includes the lemma, POS tags, and, depending on POS, a set of morphological features.
Syntactic Markup Language
The syntactic markup language of the
corpus is XML, because it is universally
accepted and because it satisfies certain
important requirements that the corpus
must meet:
Syntactic Markup Language
1) the corpus must feature several layers of linguistic data that can be extracted from the annotation independently of each other;
2) it should be scalable and extensible both quantitatively and qualitatively, so that new types of information can be added easily;
3) it must be supported by standard programming tools for text parsing, sophisticated search, and conversion.
Structure Editor
It is a complex software environment aimed at
1. automatic generation of morpho-syntactic and
lexical functional annotation of texts,
2. manual editing of annotation results, and
3. fully manual annotation.
Automatic generation is only possible for texts in natural languages that are supported by the ETAP-3 linguistic processor.
Structure Editor
• In principle, Structure Editor is not language-specific and can be used for annotation of texts in any natural language, primarily one with rich morphology.
Structure Editor
• StrEd allows the annotator to use diverse dialog interfaces in order to
1. view the whole text;
2. view a sentence as a table in which every line corresponds to a particular word of the sentence;
3. view the syntactic dependency tree for a sentence;
4. view information on a particular word of the sentence;
5. view the discrepancies between the results of automatic tagging and manual tagging of a sentence.
Structure Editor
StrEd view presenting the sample text at an initial
stage with no morphosyntactic tagging performed.
Structure Editor
• As a rule, the first step of text annotation is automatic tagging. Once it is obtained, the sentences are revised by the annotator, who detects and corrects the errors. To conveniently view the dependency tree structure and manipulate it, the Edit Structure dialog can be used.
Structure Editor
Structure Editor
 In this view, the annotator can perform all typical
actions that modify the original tagging; in particular,
the editor can rearrange the structure or delete the
syntactic relations by simple mouse gestures, alter the
lemmas, syntactic links, or grammatical features.
 If these operations do not suffice to obtain the
desirable results, the annotator may continue the
editing by switching to another dialog, intended for
sentence properties viewing and manipulation, which
allows performing less typical operations with the
sentence.
Structure Editor
Morpho-syntactic annotation
Петр крепко спит ‘Peter is sleeping soundly’
<S ID="1">
<W DOM="3" EXTRAFEAT="CAP" FEAT="S ЕД МУЖ ИМ ОД" ID="1" KSNAME="ПЕТР" LEMMA="ПЕТР" LINK="предик">Петр</W>
<W DOM="3" FEAT="ADV" ID="2" KSNAME="КРЕПКО" LEMMA="КРЕПКО" LINK="обст">крепко</W>
<W DOM="_root" EXTRAFEAT="ЛИЧ" FEAT="V НЕСОВ НЕПРОШ ИЗЪЯВ 3-Л ЕД" ID="3" KSNAME="СПАТЬ" LEMMA="СПАТЬ">спит</W>
</S>
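Because the annotation is plain XML, it can be read with standard tools. The sketch below uses Python's `xml.etree.ElementTree` on a copy of the sample sentence; the function name `read_sentence()` and the shape of the returned dictionaries are assumptions made for illustration, and only the attributes visible on the slide (`ID`, `DOM`, `LEMMA`, `LINK`) are used.

```python
import xml.etree.ElementTree as ET

# The sample sentence from the slide, with attribute quoting normalized.
SAMPLE = """<S ID="1">
<W DOM="3" EXTRAFEAT="CAP" FEAT="S ЕД МУЖ ИМ ОД" ID="1" KSNAME="ПЕТР" LEMMA="ПЕТР" LINK="предик">Петр</W>
<W DOM="3" FEAT="ADV" ID="2" KSNAME="КРЕПКО" LEMMA="КРЕПКО" LINK="обст">крепко</W>
<W DOM="_root" EXTRAFEAT="ЛИЧ" FEAT="V НЕСОВ НЕПРОШ ИЗЪЯВ 3-Л ЕД" ID="3" KSNAME="СПАТЬ" LEMMA="СПАТЬ">спит</W>
</S>"""

def read_sentence(xml_text: str) -> list:
    """Return one dict per <W> element: id, form, lemma, head id, link name."""
    words = []
    for w in ET.fromstring(xml_text).iter("W"):
        dom = w.get("DOM")
        words.append({
            "id": int(w.get("ID")),
            "form": (w.text or "").strip(),
            "lemma": w.get("LEMMA"),
            "head": None if dom == "_root" else int(dom),  # None marks the tree top
            "link": w.get("LINK"),                          # absent on the root word
        })
    return words

words = read_sentence(SAMPLE)
for w in words:
    print(w["id"], w["form"], w["lemma"], w["head"], w["link"])
```

Here `DOM` holds the ID of the governing word and `LINK` the name of the syntactic relation, so the dependency tree can be rebuilt from the `head` values alone.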
Sentence of average complexity
Пчелиные ульи и муравьиные колонии служат хорошим
примером: несмотря на относительную простоту организма
отдельных насекомых и незначительные возможности их
мозга, образуемый ими социум представляет собой весьма
сложную систему, отличающуюся исключительной
прочностью и слаженностью функционирования.
Beehives and ant colonies serve as a good example: despite a
relative simplicity of the body of individual insects and
insignificant potentials of their brains, the social medium
formed by them is a very complex system which is distinguished
by exceptional strength and harmony of functioning.
Morpho-syntactic annotation
Lexical Functional Annotation
 The newest version of SYNTAGRUS contains partial
lexical functional annotation: for collocations that
could be presented with the apparatus of lexical
functions, the tagging includes information on values
and attributes of such lexical functions.
Lexical Functional Annotation
Lexical Functional Annotation
Lexical Functional Annotation
• Lexical functional annotation of a corpus sentence can be produced in three ways:
1. automatically, together with syntactic parsing, by running the ETAP-3 parser on the sentence;
2. automatically, by running a subset of ETAP-3 rules on the ready syntactic structure of the sentence approved by the expert, using the StrEd option “Let ETAP find them (LFs)”;
3. manually.
The list of LF arguments and values, irrespective of the way it was produced, can be manually edited: information on functions can be modified, added, or removed.
Annotation Tools
• Considering the significant size of SYNTAGRUS (over 500,000 words), the annotation process has to be automated to the fullest extent possible.
• On the other hand, automatic annotation has to allow for verification and, if need be, correction by a human expert.
• This means that the environment has to provide for comfortable viewing and editing of annotated texts.
Intellectual Debugger
In order to diagnose nontrivial annotation errors, a powerful instrument, Intellectual Debugger (IntelDeb), was specially created. It verifies, in one quick step, whether the current syntactic annotation of a sentence (possibly the result of several human interventions) is compatible with at least one of the parses that the automatic ETAP-3 parser can achieve in principle.
Intellectual Debugger
IntelDeb can be considered a specific parser which, unlike the regular ETAP parser, does not produce multiple parses of a sentence. Instead, if IntelDeb finds that the structure under verification is inadmissible, its goal is to diagnose the cause, or causes, of the situation as precisely as possible.
Intellectual Debugger
• The underlying idea is to run the parser consecutively on all binary subtrees as presented by the annotation and see whether the existing syntactic rules and dictionaries permit the construction of such subtrees.
The algorithm checks all rules for a specific syntactic link (there may be dozens of such rules) and all possible lemmas for the given pair of words, starting with the rules and lemmas cited in the annotation but gradually loosening its grip and resorting to other rules and lemmas if the current choice cannot be confirmed.
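The verification idea can be sketched in a few lines. This is a loose illustration, not the IntelDeb algorithm itself: rules are modeled as hypothetical predicates over a (head lemma, dependent lemma) pair, each arc carries a list of candidate lemmas with the annotated one first, and the names `check_arc()`, `diagnose()`, `RULES`, and `ARCS` are all invented for the example. The "loosening" step is reduced to trying further candidate lemmas in order.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A toy rule licenses an arc if it accepts a (head lemma, dependent lemma) pair.
Rule = Callable[[str, str], bool]

def check_arc(head_lemmas: List[str], dep_lemmas: List[str], link: str,
              rules: Dict[str, List[Rule]]) -> Optional[Tuple[str, str]]:
    """Try candidate lemmas in order (annotated ones first) against every rule for `link`."""
    for head in head_lemmas:
        for dep in dep_lemmas:
            if any(rule(head, dep) for rule in rules.get(link, [])):
                return head, dep  # first lemma pair that some rule can build
    return None

def diagnose(arcs: List[dict], rules: Dict[str, List[Rule]]) -> List[dict]:
    """Return the binary subtrees that no rule can build; an empty list means no complaint."""
    return [a for a in arcs if check_arc(a["head"], a["dep"], a["link"], rules) is None]

# Two toy rules and an annotation in which the second arc is deliberately mislabeled.
RULES = {
    "предик": [lambda h, d: h == "СПАТЬ" and d == "ПЕТР"],
    "обст": [lambda h, d: h == "СПАТЬ" and d == "КРЕПКО"],
}
ARCS = [
    {"head": ["СПАТЬ"], "dep": ["ПЕТР"], "link": "предик"},
    {"head": ["СПАТЬ"], "dep": ["КРЕПКО"], "link": "предик"},
]

failed = diagnose(ARCS, RULES)
print(failed)
```

Reporting the offending arcs one by one, instead of simply failing to parse, is what lets the annotator see where the annotation and the grammar disagree.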
The Hypothesis of Two Syntactic Starts
We will be dealing with a special type of
sentences with embedded
(semi-)phraseological expressions like
He does the Devil knows what or its
Russian equivalent Он занимается
чёрт знает чем.
The Hypothesis of Two Syntactic Starts
It is very difficult to build adequate
syntactic representations for such
sentences. A controversial solution is
proposed for this problem, admitting
that sentences of this type have two
syntactic starts, or syntactic heads.
Problem
(1) Он занимается чёрт знает чем
(2) He does the Devil knows what
(3) Мне было – так лестно / Лезть за
тобою – Бог / Знает куда! (Marina
Tsvetayeva)
(4) I felt so flattered to climb after you God
knows where
References
Haspelmath, Martin. Indefinite pronouns.
Oxford Studies in Typology and Linguistic
Theory. Oxford: Oxford University Press, 1997.
Lakoff, George. Syntactic Amalgams. // Papers
from the 10th Meeting of the Chicago
Linguistic society, 1974, pp. 321-344.
References
Testelets Y., E. Bylinina. Sluicing-Based Indefinites in
Russian. // Formal Approaches to Slavic Linguistics
13: The South Carolina Meeting. Ann Arbor, MI:
Michigan Slavic Publications. 2005, 355-364.
Апресян, Ю.Д., Иомдин Л.Л. Конструкции типа
НЕГДЕ СПАТЬ в русском языке: синтаксис и
семантика. (Constructions of the NEGDE SPAT'
type in Russian: Syntax and semantics.) Semiotika i
informatika, No. 29. Moscow, 1990, pp. 3-89.
Why is it difficult to build adequate surface syntactic representations for these sentences?
Because it is unclear what the syntactic role of the verb знать or know is in (1)-(4).
This verb cannot be the absolute head of the surface syntactic tree as in
(1′) Один чёрт знает, чем он занимается or
(2′) The devil only knows what he does
where знает and knows are the tops of the trees.
Indeed, if we compare (2) and (2′)
(2) He does the Devil knows what
(2′) The devil only knows what he does
we will see that (2) is neither syntactically nor semantically equivalent to (2′):
John only knows what he does
*He does John knows what
(2), in contrast to (2′), expresses disapproval, a negative attitude of the speaker toward the subject and his activity.
There is no reasonable syntactic
governor for knows in (1) and (2).
If we subordinate it to the main
verb of the sentence we shall face
the problem of what the syntactic
relation between the verbs is.
• We might take the pronoun where to be the syntactic governor of knows. Phraseological expressions like devil knows may be suspected of having turned into merged lexical units equivalent to indefinite particles like -ever.
• Such a solution does not hold, since the embedded constructions of this type are not confined to the phraseological expressions cited and may include rather free clauses formed with different verbs.
 Когда я был подростком, сильное
впечатление на меня произвела
вычитанная не помню уже в какой
книге история панамской авантюры.
‘When I was a youth I was deeply impressed
by the story of the Panama adventure that I
read in I don’t remember which book’
(Novoye Vremya)
• Even the second parts of these constructions are not necessarily interrogative pronominal words. In Russian, they may be represented by the conjunction или ‘or’ or the particle ли ‘whether’.
 Его судят за преступление, которое
он неизвестно совершил или нет
lit. ‘He is being tried for a crime which
it is not clear if he committed or not’
 Кроме того, есть еще такие сдерживающие
факторы, как наличие Северной Кореи с
непонятно имеющимся ли у нее ядерным
оружием
‘Besides, there are such deterrent factors as the presence of North Korea with nuclear weapons that it might or might not have’, lit. ‘… the presence of North Korea with it-is-unclear-whether-available-to-it nuclear weapons’
 Whilst there is no evident syntactic
governor for the second verbs of the
sentences considered, the pronominal
words have as many as two plausible
candidates for governor.
(2) He does the Devil knows what
• On the one hand, one may suggest that what / чем instantiates the 1st completive valency of do / заниматься.
• In the Russian example (1), it is the only word of the sentence that stands in the instrumental case, exactly the one that is required by заниматься.
 On the other hand, the same
pronominal word may be viewed as
instantiating the 1st completive valency
of the verb know, the way it does in
isolated (elliptic) sentences like I know
what.
So, the syntactic structure of (1) has two oddities at once: one word in need of a syntactic parent (know) has no good candidate, while another word (what) has two.
Solution
 The duality of syntactic dominance for what in
(2) is far from trivial and requires further
reasoning. In simple single-clause sentences
pronominal words like what cannot depend on
verbs that, unlike know, do not take propositional
complements:
 *I do what
Such pronouns may form a special question like What do you do?, in which case the pronoun is interrogative.
In Russian, there can also be a highly colloquial general question like Вы занимаетесь↑ чем? ‘Do you do anything?’, where чем is an indefinite pronoun and really means ‘anything’.
Assuming that (2) is not a single-clause sentence, we should define what clauses it may consist of.
The most natural assumption is that (2) consists of two clauses, one constituted by the verb does and the other constituted by the verb knows.
Where are the boundaries of the two
clauses? The left-hand boundaries of both
clauses are evident: for the first clause it is
the beginning of the whole sentence and
for the second clause it is the word devil
which is the subject of the verb knows.
Hypothesis: the right-hand boundaries of
both clauses are the same and coincide
with the end of the sentence, so that the
pronominal word what belongs to both
clauses.
If we now compare (2) with
(5) John knows what he does,
we will see that
 the lack of such subordination
distinguishes the second clause of (2) from
the subordinate clause of (5). The head of
the second clause of (2) remains without a
syntactic parent at all. This is the most
crucial characteristic of this type of
sentences.
• Sentences (5) and (2) unfold differently:
  ◦ (5) is smoothly produced by the speaker;
  ◦ (2) has a sort of leap in mid-generation: before the first clause is finished, the second clause starts to evolve, and, after some time, the two proceed together until the end of the whole sentence.
 The second clause in (2) behaves like
a tributary to a river, which
contributes to its course.
Evolution of sentence (2) resembles the correlation
between the main and the parenthetical clauses if
the latter is situated in the middle of the sentence, as
in
(6) At this moment a young man (this was John) rose from his place
 The drastic difference between (6) and (2)
is that parenthetical clauses are finished
sooner than the main clauses while in (2)
the “tributary” clause ends together with
the first clause.
 If this stand is taken, we will have to
admit that sentences of this type have
two syntactic starts.
• They violate the fundamental requirement of the surface syntactic component of the Meaning ⇔ Text theory that the syntactic structure of any sentence should be a tree.
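The violation can be made concrete with a small check. The sketch below tests only two of the conditions a dependency tree must satisfy (exactly one word without a governor, and at most one governor per word); it does not check for cycles. The encodings `SENTENCE_5` and `SENTENCE_2` are my own reading of the analysis described above, with the attachment of what in (5) being an assumption, and `tree_violations()` is a hypothetical helper.

```python
def tree_violations(heads: dict) -> list:
    """`heads` maps each word to the list of its syntactic governors."""
    problems = []
    roots = [word for word, governors in heads.items() if not governors]
    if len(roots) != 1:
        problems.append(f"{len(roots)} words without a governor: {roots}")
    for word, governors in heads.items():
        if len(governors) > 1:
            problems.append(f"'{word}' has {len(governors)} governors: {governors}")
    return problems

# (5) John knows what he does: an ordinary tree topped by "knows".
SENTENCE_5 = {
    "John": ["knows"], "knows": [], "what": ["does"],
    "he": ["does"], "does": ["knows"],
}

# (2) He does the Devil knows what: "knows" has no governor, "what" has two.
SENTENCE_2 = {
    "He": ["does"], "does": [], "the": ["Devil"],
    "Devil": ["knows"], "knows": [], "what": ["does", "knows"],
}

print(tree_violations(SENTENCE_5))
print(tree_violations(SENTENCE_2))
```

Under this encoding, (5) passes both checks while (2) fails both, which mirrors the two oddities noted earlier: a second ungoverned head and a word with two governors.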
Discussion
 One more syntactic particularity is that, in
Russian, expressions like чёрт знает что
may include a personal pronoun whose
syntactic status is unclear
 Ему давно уже пора дом покупать, снимает чёрт
его знает что!
lit. ‘It’s high time he buys a house, he rents the Devil
knows him what’ (Alexander Torin, Gelikon Plus, St.
Petersburg, 2000);
 Деньги уходят чёрт их знает куда
lit. ‘Money goes the devil knows it where’
(Vladimir Lenin, in a letter to his mother,
1895).
 The constructions discussed are
subject to rather tight lexical
restrictions.
 Within the phraseological subset, the
constructions are formed with the verbs знать
and, occasionally, ведать ‘know’, almost always in
the present tense, whose subjects can be either
 1) nouns чёрт, дьявол ‘devil’, леший ‘wood
goblin’, бес and бис ‘demon’, шут ‘jester’ and пёс
‘dog’ (the last two are probably euphemisms for
чёрт), practically always in the singular
 2) derogatory nouns like фиг or хрен that are in
fact euphemisms for an obscene word, as in В
стране скоро фиг знает что начнется ‘Soon,
goodness knows what will start in this country’, or
this obscene word itself
 3) nouns Бог ‘God’, Господь ‘Lord’, Аллах ‘Allah’,
Всевышний ‘Almighty’, as in Mне не нравится,
что на юбилей города приглашают Бог знает
кого ‘I don't like it that they invite God knows
whom to attend the city anniversary’.
 Первая корректура ушла из
издательства Будда знает сколько
времени назад
lit. ‘The first proof-sheet left the publisher
Buddha knows how long ago’ (from a
posting about the publication of a
manuscript on East Asia).
 The semantics of the Devil knows what
type of construction is very interesting and
deserves special attention and careful study.
 The meanings of collocations that represent
the construction are remarkably close to
each other. All of them have a strong
evaluative component that expresses the
speaker’s negative attitude toward the
participant or circumstance of the situation
conveyed by the collocations.
There is a noticeable difference of meaning between the collocations based on God and the remaining collocations.
In the former, the speaker’s negative attitude becomes milder and gives way to regret and, possibly, compassion. To my mind, the speaker’s negative attitude belongs to the assertive part of the meaning rather than the presupposition. In particular, this may account for the fact that sentences like
#He betrayed the devil knows whom
are infelicitous: in all probability, the semantics of the verb betray ≈ ‘be disloyal to’ requires that its object deserve loyalty, whereas the collocation Devil knows who introduces an unknown and/or bad person who does not deserve loyalty.
The construction considered here has a clear negative trend. As a matter of fact, expressions like Devil knows what, God knows where, etc. introduce unknown entities.
He went God knows where really means the same as Nobody knows where.
At least some of the collocations that represent the
construction lack compositionality. An example is the
expression containing the Russian word сколько or its
English equivalent ‘how much’:
sentences like
Он получил чёрт знает сколько денег
‘He got the devil knows how much money’
refer to situations that involve an indefinitely large
amount of money but never to situations that involve an
indefinitely small amount of money.
 The constructions considered here are unique and have no
close cognates in the language.
 In particular, the constructions like Иди куда хочешь
<куда тебе угодно> ‘Go wherever you please’, Oн
танцует с кем попало ‘He would dance with the first
person he comes across’, Ребенок ест что ни попадя ‘The
child eats whatever comes to hand’ that share with our
constructions the presence of interrogative pronouns and
the meaning of indefiniteness are nonetheless drastically
different from them.
Most importantly, they do not have
an additional syntactic start.
Microsyntax of Language
 Microsyntax of Language. Minor Type
Sentences. Syntactic Idioms.
Syntactic Idioms
Syntactic phrasemes are idiomatic
units that have syntactic
particularities not shared by
common non-idiomatic expressions.
The term “syntactic phraseme” was
introduced in [Boguslavsky-Iomdin
1982].
Syntactic Idioms
The term has been frequently used by Igor A. Mel’čuk.
Jackendoff (1997) uses the term syntactic idiom. He focuses on the presence of variable parts in the syntactic idiom (like The hell with X or Russian Z-у не до X-a, as in мне не до смеху ‘I am past laughter’).
Jackendoff, Ray. Twisting the Night Away. // Language, Vol. 73 (1997), pp. 534–559.
Syntactic Idioms
What place in the general syntactic system
of language is claimed by syntactic idioms?
Syntax and Microsyntax
The general syntactic system of the language can in fact be divided into two unequal parts, or “two syntaxes”:
• the basic syntax of language, which embraces a comparatively small number of basic constructions;
• the peripheral syntax, which has a much greater number of constructions.
Syntax and Microsyntax
Basic constructions are frequent, non-idiomatic,
and built by very general grammar rules.
Every one of the peripheral syntactic
constructions is encountered in the text much
less frequently than any basic one, although
their overall occurrence is very high. These
latter constructions are varied and extremely
difficult to incorporate into the general system
of syntax.
Syntax and Microsyntax
The part of the syntax constituted by
peripheral constructions is sometimes
referred to as “minor type sentences”.
I propose to use the term “microsyntax” to
account for this part of syntax.
Syntax and Microsyntax
This division has nothing to do with the greater or lesser importance of either of the two portions of the syntax. The reason for it is that the study of peripheral linguistic structures requires much finer and more individual tools than that of basic structures.
Syntax and Microsyntax
Microsyntax consists of objects of two main types:
• nonstandard syntactic constructions;
• syntactic idioms.
Syntax and Microsyntax
The boundary between these objects is not
very distinct. The main discriminating
criterion is the degree of lexicalization.
Nonstandard Syntactic Constructions
Russian modal impersonal constructions with
an infinitive and a dative:
Z-у X-овать ‘Z is in for X’
Тебе выходить на следующей ‘you must get
off at the next stop’
Хозяйке всю ночь посуду мыть ‘The hostess
is in for a night of dishwashing’
Nonstandard Syntactic Constructions
Russian modal impersonal constructions with an
infinitive, a dative, and a negation:
Z-у не X-овать ‘There is no chance that Z will do X’
Этому не бывать ‘This will never happen’
Не видать тебе золота, покамест не достанешь
крови человеческой! (Н.В.Гоголь) ‘You will never
see gold until you procure human blood’ (Nikolai
Gogol)
Nonstandard Syntactic Constructions
Coordinative constructions with lexically
identical elements:
ну упал и упал, lit. ‘he fell and fell’ ≈ ‘his fall seemed to have no dramatic consequences’
бывают аварии и аварии, lit. ‘there are accidents and accidents’ ≈ ‘different accidents take place’
Nonstandard Syntactic Constructions
Coordinative constructions with lexically
identical elements:
сказал, что его зовут так-то и так-то ⇔ ‘he said that his name is so and so’ (he gave one name and not two);
надо сделать то-то и то-то ⇔ ‘we have to do this and this’ (probably only one thing is to be done)
Nonstandard Syntactic Constructions
Vocative construction with lexically identical
elements:
Вась, а Вась ‘Vasya, oh Vasya’ ≈ ‘Vasya, can you hear me’
Иван Иваныч, а Иван Иваныч ‘Ivan
Ivanovich, oh Ivan Ivanovich’
Nonstandard Syntactic Construction or
Word Sense?
A curious lexical phenomenon in Russian is
associated with the verb быть ‘to be’: in the future
tense (буду, будешь, etc.) it can be used as an
equivalent of буду есть ‘I will eat’ or буду пить
‘I will drink’:
Nonstandard Syntactic Construction or
Word Sense?
Я не буду кашу ‘I will not eat porridge’
Ты что будешь? ‘What will you have?’
*Я не был кашу, *Ты что был?, *Я не кашу,
*Ты что?
Nonstandard Syntactic Construction or
Word Sense?
Such expressions can in no case be
treated as ellipsis, because they do
not require any preceding context in which
a verb like есть ‘eat’ or пить ‘drink’ occurs.
Nonstandard Syntactic Construction or
Word Sense?
Further, these expressions obey very
specific semantic restrictions: only
words denoting food or drinks (plus
pronouns) can be used with буду.
Nonstandard Syntactic Construction or
Word Sense?
Accordingly, one cannot say something like *Я
буду аспирин ‘I will have an aspirin’ even
though the normal Russian verbs to be used
with the name of a medicine are пить or
выпить:
Выпей таблетку аспирина ‘Take a pill of
aspirin’,
Она всегда пьет аспирин, когда у нее болит
голова ‘She always takes aspirin when she
has a headache’
Nonstandard Syntactic Construction or Word
Sense?
Interestingly, this construction can only refer
to an actual event of eating or drinking:
Поедем на Кавказ, будем пить вино
‘We will travel to the Caucasus and will drink
wine’
but never
*Поедем на Кавказ, будем вино
Nonstandard Syntactic Construction or Word
Sense?
This means that the construction can only be
used in the sense of the immediate future.
Nonstandard Syntactic Construction
or Word Sense?
Additionally, the construction normally refers
to a situation where food or drink is offered
by someone and taken by somebody else.
So it would be common to say something like
мы будем кофе и рогалики ‘we will have
coffee and rolls’ when addressing a waiter
or accepting his offer, but totally
unacceptable when people in a café
discuss what to order: *давай будем кофе и
рогалики ‘let’s have coffee and rolls’.
Nonstandard Syntactic Construction or Word
Sense?
One cannot imagine someone saying вот
увидишь, он будет кофе ‘you’ll see, he will
be (having) coffee’ when predicting the
behavior of a person sitting alone in his
kitchen without anyone waiting on him.
Nonstandard Syntactic Construction!
My solution is that this is a construction
rather than a word sense: too many
idiosyncratic restrictions would have to be
recorded in order to postulate a separate
sense of the verb быть.
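The restrictions observed on the preceding slides can be collected in one place. The following is only a minimal sketch: the class name, the field names and the semantic class labels are hypothetical and not part of the lecture; they simply restate the four observed conditions (future tense, food or drink object, actual event, offer situation).

```python
from dataclasses import dataclass

# Hypothetical semantic classes; a real lexicon would supply them.
ALLOWED_OBJECT_CLASSES = {"food", "drink", "pronoun"}


@dataclass
class BuduUsage:
    tense: str         # "future", "past" or "present"
    object_class: str  # e.g. "food", "drink", "pronoun", "medicine"
    immediate: bool    # refers to an actual, imminent act of eating or drinking
    offered: bool      # food or drink is offered by one party and taken by another


def violations(usage):
    """Return the names of the observed restrictions that the usage breaks."""
    problems = []
    if usage.tense != "future":
        problems.append("tense")
    if usage.object_class not in ALLOWED_OBJECT_CLASSES:
        problems.append("object")
    if not usage.immediate:
        problems.append("immediacy")
    if not usage.offered:
        problems.append("offer")
    return problems


# Я не буду кашу: no restriction is broken.
print(violations(BuduUsage("future", "food", True, True)))
# *Я буду аспирин: the object is not food or drink.
print(violations(BuduUsage("future", "medicine", True, True)))
```

Each condition would have to be written into the dictionary entry of быть if the phenomenon were treated as a word sense, which is the point of the argument above.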
Syntactic Idioms
Z-у не до X-a ‘Z is past X, Z is in no mood for
X’, i.e. ‘Z is busy with more important things
than X and Z believes that X can be
disregarded’.
Here, two elements are lexically bound: не
‘not’ and до ‘up to’.
Syntactic Idioms
руки чешутся (сделать что-л.) ‘one’s fingers
are itching (to do smth)’
У меня руки чешутся побить его ‘My fingers
itch to give him a thrashing’
What will come next
I will be considering a polysemous Russian adverbial
syntactic idiom ВСЁ РАВНО:
всё равно 1 ‘all the same’, as in Я всё равно сижу
дома ‘I am staying at home all the same’;
всё равно 2 ‘makes no difference’, as in Нам всё
равно, куда ехать ‘We don’t care where we’ll be
going’;
всё равно 3 ‘tantamount’, as in Сняться в плохом
фильме — всё равно что плюнуть в вечность ‘To
star in a bad movie is equivalent to spitting into
eternity’.
Syntactic phrasemes всё равно
Two fixed lexical elements
Three clearly discernible senses
Syntactic phrasemes всё равно
None of these units can be considered a
nonsyntactic idiom because every one of
them has syntactic and combinatorial
properties not shared by any other lexical
units of Russian.
Identification of Syntactic Idioms
Identification of a syntactic idiom in the
text is a serious problem
Identification of Syntactic Idioms
Соглашаться на всё равно как и не соглашаться
ни на что – одинаково неприемлемые решения
‘Agreeing to everything, as well as not agreeing to anything,
are equally unacceptable solutions’
Почти всё равно нулю
‘Almost everything is equal to zero’
Не всё ли тебе равно?
‘Isn’t it all the same to you?’
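Why identification is hard can be shown with a naive surface matcher. The sketch below is an illustration only, not the method used in any actual system: the function names and the `max_gap` parameter are hypothetical. It flags every place where всё is followed by равно, allowing a few intervening words so that the discontinuous case is caught.

```python
import re
import unicodedata


def tokens(sentence):
    """Lowercase word tokens with ё folded into е (texts often omit the dots)."""
    text = unicodedata.normalize("NFC", sentence).lower().replace("ё", "е")
    return re.findall(r"\w+", text)


def find_candidates(sentence, max_gap=2):
    """Return (position of всё, number of intervening words) for each candidate."""
    toks = tokens(sentence)
    hits = []
    for i, tok in enumerate(toks):
        if tok != "все":
            continue
        # Look for равно at most max_gap tokens after всё.
        for gap in range(max_gap + 1):
            j = i + 1 + gap
            if j < len(toks) and toks[j] == "равно":
                hits.append((i, gap))
                break
    return hits


for s in ["Почти всё равно нулю", "Не всё ли тебе равно?"]:
    print(s, find_candidates(s))
```

The matcher flags all three sentences above, although only the last one contains the idiom; in the first two, всё and равно belong to free phrases. Surface matching therefore yields candidates only, and the decision requires syntactic analysis.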
Identification of Syntactic Idioms
Он работает в одиночку. ‘He works
alone’
Он шел в одиночку. ‘He was going alone’
vs. ‘He was going to a solitary cell’
Он влюбился в одиночку. ‘He fell in love
with a single mother’
Identification of Lexical Units
Он что-то знает ‘He knows something’.
Что-то он теперь поделывает? ‘I
wonder what he is doing these days’
Description of a Syntactic Idiom
The full description of the syntactic behavior of a
syntactic idiom must include:
(1) lexical and morphological identification of
the constituents;
(2) identification of syntactic relations obtaining
between the idiom’s constituents, and their
direction;
(3) determination of syntactic peculiarities that
ensure the interaction of the idiom with other
elements of the sentence.
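The three components of the description can be pictured as a dictionary record. This is a minimal sketch under stated assumptions: the class and field names are hypothetical, the label of the internal relation is a placeholder, and the choice of the head is deliberately left open. The values are those given in this lecture for всё равно 2.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Constituent:
    lemma: str
    morphology: str


@dataclass
class IdiomEntry:
    name: str
    # (1) lexical and morphological identification of the constituents
    constituents: List[Constituent]
    # (2) relation between the constituents and its direction
    internal_relation: str
    head_index: Optional[int]
    # (3) features governing interaction with the rest of the sentence
    external_features: Set[str] = field(default_factory=set)


VSE_RAVNO_2 = IdiomEntry(
    name="всё равно 2",
    constituents=[
        Constituent("всё", "noun, nominative singular"),
        Constituent("равный", "adjective, short form, singular neuter"),
    ],
    internal_relation="UNSPECIFIED",  # placeholder: only known to be non-predicative
    head_index=None,                  # assumption: head left unspecified in this sketch
    external_features={"predqu", "predthat"},
)

print(VSE_RAVNO_2.name, sorted(VSE_RAVNO_2.external_features))
```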
Syntactic Idioms всё равно
The lexical and morphological
identification of all three lexical units
of the idiom vocable is the same: they are
composed of the noun всё ‘all’ in the
nominative singular and the adjective
равный in the short form singular
neuter.
Syntactic Idioms всё равно
Мне всё безразлично lit. to me everything is
indifferent ‘All is the same to me’
Мне всё равно ‘It’s all the same to me’
There is no subject in the idiom but it can be
added:
Мне это всё равно ‘This is all the same to me’
Мне всё равно, куда он пойдет ‘I don’t care
where he will go’
Мне всё всё равно lit. all is all the same to me
Syntactic Idioms всё равно
The type of the syntactic relation that should be
postulated between the syntactic head and the
syntactic daughter is not predicative.
Syntactic Idiom всё равно 1
всё равно 1 is a sentential adverb.
Its behavior is the same as that of
nonidiomatic sentential adverbs like
наверняка ‘surely’, непременно
‘certainly’, точно ‘definitely’, напрасно
‘for nothing’.
Usually, it depends on the sentence head –
a finite verb or an infinitive:
Syntactic Idiom всё равно 1
Всё равно я его люблю ‘I love him all the
same’
Тебе всё равно вставать рано ‘You will
have to get up early in any case’
Он всё равно хороший ‘He is good all the
same’
Syntactic Idiom всё равно 1
Всё равно 1 cannot accept any syntactic
dependents, even particles:
*Не всё равно я его люблю ‘I love him not all the
same’
*Тебе совершенно всё равно вставать рано
‘You will have to get up early in perfectly any case’
*Он почти всё равно хороший ‘He is good
almost all the same’
Syntactic Idiom всё равно 1
Elements of всё равно 1 have a fixed
order and cannot be penetrated by
any other words.
Syntactic Idiom всё равно 1
Of the three idioms, всё равно 1 has
advanced the most toward a single
word. The only notable distinction is
phonetic and prosodic (two accents,
nonreduced [o] in the element всё).
Syntactic Idiom всё равно 2
всё равно 2 is a predicative adverb.
It resembles other predicatives like жаль ‘a pity’.
The syntactic role played by всё равно 2 in the
sentence is that of a part of the predicate, the
other part of which is represented by a copula:
Ему было <стало, оказалось> всё равно ‘It
was <came to be, turned out to be> all the same to him’
Syntactic Idiom всё равно 2
Всё равно 2 has the same set of syntactic
features as predicate words like
интересно ‘I wonder’, любопытно ‘I
am curious’.
Syntactic Idiom всё равно 2
 feature “predqu” that represents a word’s
ability to accept a subject clause (an
indirect or an alternative question):
 Ей было всё равно, куда идти <получит
ли она место, придет он или нет, чтó
будет на обед> ‘It was all the same to her
where to go <whether she would get the position, whether he
would come or not, what there would be for dinner>’
Syntactic Idiom всё равно 2
 feature “predthat” that represents a
word’s ability to accept a subject clause
introduced by the conjunction что
‘that’:
 Ей было всё равно, что ребенок устал
и хочет спать ‘It was all the same to her
that the child was tired and sleepy’
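The two features can be read as switches in the dictionary entry that license a particular type of subject clause. The sketch below assumes a toy lexicon: the feature names “predqu” and “predthat” are those used on the slides, whereas the dictionary layout, the clause-type labels and the function name are hypothetical. The entries for интересно and любопытно follow from the statement that they share the feature set of всё равно 2.

```python
# Toy lexicon: lexical unit -> set of syntactic features.
LEXICON = {
    "всё равно 2": {"predqu", "predthat"},
    "интересно": {"predqu", "predthat"},
    "любопытно": {"predqu", "predthat"},
}

# Hypothetical clause-type labels mapped to the feature that licenses them.
REQUIRED_FEATURE = {
    "indirect_question": "predqu",
    "alternative_question": "predqu",
    "that_clause": "predthat",
}


def accepts_subject_clause(word, clause_type):
    """True if the word's features license a subject clause of this type."""
    return REQUIRED_FEATURE[clause_type] in LEXICON.get(word, set())


# Ей было всё равно, куда идти
print(accepts_subject_clause("всё равно 2", "indirect_question"))
# Ей было всё равно, что ребенок устал и хочет спать
print(accepts_subject_clause("всё равно 2", "that_clause"))
```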
Syntactic Idiom всё равно 2
всё равно 2 subcategorizes for a noun in the
dative, which fills the idiom’s
subject valency as it expresses the
subject of the state. This subject need
not be human, but it must be a
volitional entity:
Syntactic Idiom всё равно 2
Дамы здесь ни при чем, дамам это всё равно,
– отвечал пират, буквально сжигая
швейцара глазами, – а это милиции не
всё равно! (М. Булгаков, Мастер и
Маргарита) ‘The ladies have nothing to do
with it, it is all the same to the ladies… but it
is not all the same to the police!’ (Mikhail Bulgakov,
The Master and Margarita)
Syntactic Idiom всё равно 2

Elements of the idiom also have a
fixed order, but under certain conditions
(in a negative general question) they may
be separated by other words:
 Не всё ли тебе равно, чтó со мной
будет? ‘Isn’t it all the same to you what
will become of me?’
Syntactic Idiom всё равно 2

In these sentences, some words may
depend on the syntactic daughter of
the idiom rather than its syntactic
head.
Syntactic Idiom всё равно 3
всё равно 3 is a predicative adverb, too.
However, its syntactic properties are
extremely idiosyncratic and do not
seem to have close analogies to other
lexical units of Russian.
Syntactic Idiom всё равно 3
всё равно 3 subcategorizes for the conjunction что
‘that’ or как ‘as’:
Никогда не следует сожалеть, что
человека обуревают страсти. Это всё
равно, как если бы мы стали сожалеть,
что он человек ‘One should never regret
that man is passionate. This is equivalent
to our regretting that he is man’
Syntactic Idiom всё равно 3
всё равно 3 is part of the predicate alongside
the copula.
However, it imposes constraints on the subject,
which can only be a nomen actionis, the
pronoun это ‘this’ or an infinitive. In the
latter case, the conjunction must be followed
by another infinitive so that the sentence
becomes a bi-infinitive one.
Syntactic Idiom всё равно 3
In contrast to всё равно 2,
всё равно 3 does not accept a subject of the state:
*Сняться в плохом фильме — мне всё равно
что плюнуть в вечность ‘To star in a bad
movie is equivalent to me to spitting into
eternity’.
Syntactic Idiom всё равно 3
As a matter of fact, всё равно 3 has no subject
valency at all. In the utterance
Сняться в плохом фильме для меня всё равно
что плюнуть в вечность ‘For me, to star in
a bad movie is equivalent to spitting into
eternity’
the expression для меня ‘for me’ describes the subject of the
evaluation of the situation and not the subject of
equivalence.
Syntactic Idioms всё равно
The polysemy of a syntactic
idiom entails additional difficulties in
NLP, where the system must not only
distinguish the syntactic idiom from free
phrases but also distinguish between
the senses within a vocable.
Syntactic Idioms всё равно



Мне всё равно лететь ‘I have to fly all the
same’
Мне всё равно, лететь или не лететь. ‘It is
all the same to me whether I have to fly or not’
Мне всё равно, чёрт возьми, чистить
картошку или мыть туалет! ‘To hell with it,
it is all the same to me whether I should peel
the potatoes or scrub the toilet’ vs. ‘To hell with
it, I have to peel the potatoes or scrub the toilet
all the same’
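The contrast in these three sentences can be mimicked with a deliberately crude punctuation heuristic. Everything in the sketch is an assumption made for illustration: the function names, the one-item list of parentheticals, and the rule itself (a comma right after the idiom suggests sense 2, no comma suggests sense 1, an intervening parenthetical leaves both open). It presupposes that the idiom has already been identified and it ignores sense 3.

```python
import re

# Hypothetical, one-item list of parenthetical expressions.
PARENTHETICALS = ("чёрт возьми",)


def candidate_senses(sentence):
    """Return the set of senses (1 and/or 2) compatible with the punctuation."""
    match = re.search(r"вс[её] равно", sentence, flags=re.IGNORECASE)
    if match is None:
        return set()
    rest = sentence[match.end():]
    # A parenthetical set off by commas hides the cue: both senses stay open.
    for phrase in PARENTHETICALS:
        pattern = r",\s*" + re.escape(phrase) + r"\s*,"
        if re.match(pattern, rest, flags=re.IGNORECASE):
            return {1, 2}
    # A comma directly after the idiom suggests a following subject clause.
    if rest.lstrip().startswith(","):
        return {2}
    return {1}


def needs_user_choice(sentence):
    """True when the heuristic cannot decide and the user would have to be asked."""
    return len(candidate_senses(sentence)) > 1


print(candidate_senses("Мне всё равно лететь"))
print(needs_user_choice("Мне всё равно, чёрт возьми, чистить картошку или мыть туалет!"))
```

For the third sentence the heuristic returns both senses, which is the situation in which the choice has to be made by other means.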
Syntactic Idioms всё равно
 In these cases, a helpful method of
ambiguity resolution is interactive
man-machine sense disambiguation.
Russian Syntax of Quantification
 Several syntactic relations:
 quantitative
 quantitative-auxiliar
 approximative-quantitative
 approximative-ordinal
Approximative-Ordinal Syntactic
Relation
(1) Он приедет числа двадцатого ‘he will come
approximately on the twentieth’
(2) “Вчерашний день, часу в шестом, Зашел я на
Сенную”. ‘Yesterday, at about six o’clock, I entered the
Hay Square’ (Nikolay Nekrasov)
(3) *Машина остановилась цикле на первом. ‘The
machine stopped at about the first cycle’
(4) Она вернулась только часу в первом. ‘She returned only
between twelve and one’
Quantitative Syntactic Relation
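(1а) Книга называется "Три товарища". ‘The book is called “Three Comrades”’
(1б) Книга называется "Двадцать три товарища". ‘The book is called “Twenty-Three Comrades”’
(2а) Он увидел трех товарищей. ‘He saw three comrades’
(2б) *Он увидел двадцать трех товарищей. ‘He saw twenty-three comrades’
(2в) Он увидел двадцать три товарища. ‘He saw twenty-three comrades’
(3) Он поговорил с тремя товарищами. ‘He talked with three comrades’
(4) Он знал одного лингвиста. ‘He knew one linguist’
(5) Он знал двадцать одного лингвиста. ‘He knew twenty-one linguists’
(6а) Имеется десять красок. ‘There are ten paints’
(6б) Имеется примерно <приблизительно> десять красок. ‘There are approximately ten paints’
(6в) Имеется десять различных акварельных красок. ‘There are ten different watercolor paints’
(6г) Имеется примерно <приблизительно> десять различных
акварельных красок. ‘There are approximately ten different watercolor paints’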
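Part of what the quantitative relation must account for is the form of the noun, which depends on the last element of the numeral. The sketch below covers only the standard rule for a quantitative group in the nominative (or the inanimate accusative); oblique cases and animate accusatives, as in (2а)-(5), are not modelled. The function names and the small form tables are hypothetical.

```python
def form_after_numeral(n):
    """Noun form required by the numeral n in a nominative quantitative group."""
    if n % 100 in (11, 12, 13, 14):
        return "gen_pl"   # одиннадцать ... четырнадцать рублей
    if n % 10 == 1:
        return "nom_sg"   # двадцать один рубль
    if n % 10 in (2, 3, 4):
        return "gen_sg"   # двадцать два рубля
    return "gen_pl"       # десять рублей


# Hypothetical mini form tables for three nouns from the examples.
TOVARISHCH = {"nom_sg": "товарищ", "gen_sg": "товарища", "gen_pl": "товарищей"}
KRASKA = {"nom_sg": "краска", "gen_sg": "краски", "gen_pl": "красок"}
RUBL = {"nom_sg": "рубль", "gen_sg": "рубля", "gen_pl": "рублей"}


def noun_for(n, forms):
    """Pick the noun form that the numeral n calls for."""
    return forms[form_after_numeral(n)]


print(noun_for(3, TOVARISHCH), noun_for(23, TOVARISHCH), noun_for(10, KRASKA))
```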
Approximative-Quantitative Syntactic Relation
(1) Мы провели там часа два. ‘We spent about
two hours there’
(2) Можно уйти часа в два. ‘We may go at about two
o’clock’
(3) Он заработает тысяч пять с половиной. ‘He will
earn about five and a half thousand’
(4) Он заработает тысяч пять с половиной рублей.
‘He will earn about five and a half thousand roubles’
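In all four examples approximation is expressed by word order alone: the noun is moved in front of the numeral, and of the preposition if there is one. A minimal sketch of this reordering follows; the function names are hypothetical, and the sketch makes no attempt to decide whether the inverted order is acceptable for a given phrase.

```python
def exact(numeral, noun, preposition=None):
    """Neutral order: (preposition) numeral noun, e.g. в два часа."""
    parts = ([preposition] if preposition else []) + [numeral, noun]
    return " ".join(parts)


def approximative(numeral, noun, preposition=None):
    """Approximative order: noun (preposition) numeral, e.g. часа в два."""
    parts = [noun] + ([preposition] if preposition else []) + [numeral]
    return " ".join(parts)


print(exact("два", "часа"), "/", approximative("два", "часа"))
print(exact("два", "часа", "в"), "/", approximative("два", "часа", "в"))
print(approximative("пять с половиной", "тысяч"))
```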
Quantitative and Approximative-Quantitative Syntactic Relations
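(1a) Книга называется "Три товарища". ‘The book is called “Three Comrades”’
(1b) *Книга называется "Товарища три". ‘The book is called “About Three Comrades”’
(2а) Имеется десять красок. ‘There are ten paints’
(2b) Имеется примерно <приблизительно> десять красок. ‘There are approximately ten paints’
(2c) ?Имеется красок десять. ‘There are about ten paints’
(3a) Имеется десять различных акварельных красок. ‘There are ten different watercolor paints’
(3b) Имеется примерно <приблизительно> десять
различных акварельных красок. ‘There are approximately ten different watercolor paints’
(3c) *Имеется различных акварельных красок десять. ‘There are about ten different watercolor paints’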
Quantitative and Approximative-Quantitative Syntactic Relations
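(4a) Я потратил двадцать два рубля. ‘I spent twenty-two roubles’
(4b) Я потратил примерно двадцать два рубля. ‘I spent approximately twenty-two roubles’
(4c) Я потратил рубля двадцать два. ‘I spent about twenty-two roubles’
(5a) Я потратил двадцать один рубль. ‘I spent twenty-one roubles’
(5b) Я потратил примерно двадцать один рубль. ‘I spent approximately twenty-one roubles’
(5c) ??Я потратил рубль двадцать один. ‘I spent about twenty-one roubles’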