Linguistics
Levels of description
Speech and language
• Language as communication
• Speech vs. text
– Speech primary
– Text is derived
– Text is not “written speech”
– Speech is not (usually) spoken text
– Obviously they are related
Levels of description
• Smallest linguistic “unit” is the phoneme
(speech) or (by analogy) the grapheme (text)
• Phonemes combine to form words, or more
exactly, morphemes
• Morpheme: smallest meaningful unit of language
• Words combine to form sentences (or
utterances) according to the rules of syntax
• Form is related to meaning via semantics
• Pragmatics deals with how language use relates
to the real world
Phonetics
• Study of speech sounds
• Humans are the only species that has
developed language
– No dedicated speech organs as such
• Not all vocal sounds are speech sounds, even
though some of them (e.g. a cough or a sigh) do convey meaning
• Speech sounds combine in arbitrary ways
to form words
Phonetics
• Articulatory phonetics concerned with how
speech sounds are produced
• Acoustic phonetics concerned with
physical properties of speech signal
• Auditory phonetics concerned with how
speech sounds are perceived
• All are of course related
Possible speech sounds
• Range of sounds possible in human languages
• Consonants vs vowels
• Most consonants are pulmonic egressive
• Consonant sound is determined by place and
manner of articulation, plus voicing, and some
other features
• Vowel sound is determined by tongue height and
position (front/back) plus lip shape
(round/spread)
Phonemes
• Huge number of possible distinctions, but not all
are significant in any given language
• Differences that are used to distinguish words
are phonemic
• Phoneme – group of (similar) sounds perceived
by speakers as “the same”
• Other differences are allophonic: the variant
sounds of a single phoneme are its allophones
• Phonemic distinction in one language may be
allophonic in another
• (-etic ~ -emic ~ allo- ~ -ology)
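A minimal sketch of how phonemic distinctions can be found in practice: look for minimal pairs, i.e. words whose transcriptions differ in exactly one segment. The toy lexicon and its broad transcriptions are illustrative assumptions, not data from these slides.

```python
from itertools import combinations

# Hypothetical toy lexicon: word -> broad phonemic transcription,
# one list element per segment.
LEXICON = {
    "pin": ["p", "ɪ", "n"],
    "bin": ["b", "ɪ", "n"],
    "pan": ["p", "æ", "n"],
    "spin": ["s", "p", "ɪ", "n"],
}

def minimal_pairs(lexicon):
    """Return (word1, word2, sound1, sound2) for every pair of words
    whose transcriptions have equal length and differ in one segment."""
    pairs = []
    for (w1, t1), (w2, t2) in combinations(lexicon.items(), 2):
        if len(t1) != len(t2):
            continue
        diffs = [(a, b) for a, b in zip(t1, t2) if a != b]
        if len(diffs) == 1:
            pairs.append((w1, w2, diffs[0][0], diffs[0][1]))
    return pairs

for w1, w2, a, b in minimal_pairs(LEXICON):
    print(f"{w1} ~ {w2}: /{a}/ vs /{b}/")
```

The aspirated [pʰ] of "pin" and the unaspirated [p] of "spin" never distinguish two English words, which is why a phonemic transcription writes both as /p/.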
Prosody
• Besides individual speech sounds, other
features of speech can carry meaning:
– Length, volume, pitch
– Intonation (pitch)
• Can be syntactic or lexical (in some languages)
– Stress (combination of all three)
• Lexical or semantic/pragmatic
Writing and text
• Various writing systems worldwide
• Most familiar is alphabetic
– Ideally each letter represents a sound (phoneme)
– Rarely 1:1 mapping
• Phoneme can have different spellings
• Individual letter can be different phoneme
• Some phonemes represented by combination of letters (not
always contiguous)
• Other possibilities: consonantal, syllabic,
ideographic (logographic), and various combinations
Graphemes
• Latin alphabet has 26 letters
• But English has ~44 phonemes (the exact count depends on accent and analysis)
• Phoneme can have different spellings
– /s/ can be ‘s’, ‘c’, ‘sc’, ‘ss’, …
• Individual letter can be different phoneme
– ‘c’ can be /s/ or /k/
• Some phonemes represented by
combination of letters
– /θ/ ‘th’, /ʃ/ ‘sh’
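The many-to-many relation between letters and phonemes can be sketched with two lookup tables built from aligned examples. The word list and the grapheme-phoneme alignments below are hand-made assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical hand-aligned data: (word, [(grapheme, phoneme), ...]).
ALIGNED = [
    ("sit",  [("s", "s"), ("i", "ɪ"), ("t", "t")]),
    ("cell", [("c", "s"), ("e", "ɛ"), ("ll", "l")]),
    ("cat",  [("c", "k"), ("a", "æ"), ("t", "t")]),
    ("miss", [("m", "m"), ("i", "ɪ"), ("ss", "s")]),
    ("ship", [("sh", "ʃ"), ("i", "ɪ"), ("p", "p")]),
    ("thin", [("th", "θ"), ("i", "ɪ"), ("n", "n")]),
]

def build_maps(aligned):
    """Collect every spelling seen for each phoneme and every
    phoneme seen for each grapheme."""
    spellings = defaultdict(set)  # phoneme -> graphemes
    readings = defaultdict(set)   # grapheme -> phonemes
    for _word, pairs in aligned:
        for grapheme, phoneme in pairs:
            spellings[phoneme].add(grapheme)
            readings[grapheme].add(phoneme)
    return spellings, readings

spellings, readings = build_maps(ALIGNED)
print("spellings of /s/:", sorted(spellings["s"]))
print("readings of 'c':", sorted(readings["c"]))
```

Neither table is a function: /s/ maps to several spellings and 'c' to several sounds, so any text-to-sound conversion needs context as well as a lookup.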
Morphology
• Smallest meaningful unit of language is the
morpheme
• Some words are single morphemes (meaning
can’t be broken down), but many words have
constituent parts
• Words usually consist of a root plus affix(es),
though some words can have multiple roots
• Lexeme – abstract notion of group of word forms
that belong together
– lexeme ~ root ~ base form ~ dictionary (citation) form
Role of morphology
• Commonly made distinction: inflectional vs
derivational
• Inflectional morphology is grammatical
– number, tense, case, gender
• Derivational morphology concerns word
building
– part-of-speech derivation
– words with related meaning
Morphological processes
• Affixes: prefix, suffix, infix, circumfix
• Umlaut, ablaut
• Gemination, (partial) reduplication
• Root and pattern
• Stress (or tone) change
• Sandhi
Language typology
• Based on extent to which morphological
processes play a role
• Agglutinative – morphological affixes can
be stacked up almost indefinitely
– Implies that list of “possible words” is infinite
• Isolating (analytic) – little or no affixation
• Extent of morphology can interact with
syntax: highly inflected languages often
have freer word order
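A rough sketch of why affix stacking makes a list of "possible words" impractical: every optional affix slot doubles the number of forms. The example uses the Turkish stem "ev" ('house') with three suffix slots; the slot inventory is a simplifying assumption, and vowel harmony is ignored (it happens not to matter for this stem).

```python
from itertools import product

STEM = "ev"  # 'house'

# Assumed, simplified suffix slots; "" means the slot is left empty.
SLOTS = [
    ["", "ler"],  # plural
    ["", "im"],   # first person singular possessive
    ["", "de"],   # locative case
]

def generate_forms(stem, slots):
    """Return every word form obtained by filling each slot in order."""
    return [stem + "".join(choice) for choice in product(*slots)]

forms = generate_forms(STEM, SLOTS)
print(len(forms), forms)
```

Three binary slots already give 8 forms, including "evlerimde" ('in my houses'); real agglutinative morphology has many more slots, and more choices per slot.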
Morphemes
• Morphemes associated with meaning
• (Like phonemes) not 1:1
• Single morpheme can have various allomorphs
– Allomorphic variation usually conditioned, either
intrinsically, or extrinsically (phonotactics,
morphosyntax)
– Can be “free variation”
• Single form can represent different morphemes
• Often rules of allomorphic variation are
systematic
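A familiar case of systematic, conditioned allomorphy is the English regular plural, which is pronounced /s/, /z/ or /ɪz/ depending on the final sound of the stem. The sketch below assumes the final sound is already known and uses deliberately small sound classes.

```python
# Simplified sound classes (assumed, not exhaustive).
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS = {"p", "t", "k", "f", "θ"}

def plural_allomorph(final_sound):
    """Choose the plural allomorph from the stem's final sound."""
    if final_sound in SIBILANTS:
        return "ɪz"   # horse -> horses
    if final_sound in VOICELESS:
        return "s"    # cat -> cats
    return "z"        # dog -> dogs (voiced sounds, including vowels)

for word, final in [("cat", "t"), ("dog", "g"), ("horse", "s")]:
    print(f"{word}: /{plural_allomorph(final)}/")
```

One morpheme (plural) has three allomorphs here, and the choice is fully predictable from the phonological context.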
Inflectional morphology
• Grammatical in nature
• Does not carry meaning, other than grammatical
meaning
• Highly systematic, though there may be
irregularities and exceptions
– Simplifies lexicon, only exceptions need to be listed
– Unknown words may be guessable
• Language-specific and sometimes idiosyncratic
• (Mostly) helpful in parsing
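The point that only exceptions need to be listed can be sketched as a small exception table consulted before a few general rules. The spelling rules below are a simplified assumption about English noun plurals and do not cover every case.

```python
import re

# Only irregular forms are listed; everything else is derived by rule.
IRREGULAR_PLURALS = {"child": "children", "mouse": "mice", "sheep": "sheep"}

def pluralize(noun):
    """Return the plural of an English noun (simplified spelling rules)."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    if re.search(r"(s|x|z|ch|sh)$", noun):
        return noun + "es"           # box -> boxes
    if re.search(r"[^aeiou]y$", noun):
        return noun[:-1] + "ies"     # city -> cities
    return noun + "s"                # cat -> cats

for noun in ["cat", "box", "city", "child", "wug"]:
    print(noun, "->", pluralize(noun))
```

The made-up noun "wug" still receives a plausible plural, which is the sense in which unknown words may be guessable.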
Derivational morphology
• Lexical in nature
• Can carry meaning
• Fairly systematic, and predictable up to a point
– Simplifies description of lexicon: regularly derived
words need not be listed
– Unknown words may be guessable
• But …
– Some apparent derivations have specialised meaning
– Some derivations missing
• Languages often have parallel derivations which
may be translatable
Issues for NLP
• Need scheme to handle morphology
• Can involve ambiguity which must be resolved in
analysis
• Can contribute to syntactic analysis
– Morphological analysis identifies the lexeme plus
grammatical information associated with inflections
• And vice versa
– Morphological ambiguity may be resolved by syntactic
context
• For many applications it is necessary to deal
with just lexemes rather than word-forms and
grammatical information: stemming
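The interplay described above can be sketched with a toy analyser that returns every (lexeme, part of speech, feature) reading of a word form, plus a crude context rule that filters the readings. The lexicon, the feature labels and the determiner rule are all illustrative assumptions.

```python
# Hypothetical toy lexicon: lexeme -> possible parts of speech.
LEXICON = {"fly": {"N", "V"}, "walk": {"N", "V"}, "cat": {"N"}}
DETERMINERS = {"the", "a", "these"}

def analyse(form):
    """Return all (lexeme, pos, features) analyses of a word form."""
    analyses = []
    for pos in LEXICON.get(form, ()):
        analyses.append((form, pos, "sg" if pos == "N" else "base"))
    # Candidate stems obtained by undoing the -s / -ies endings.
    candidates = []
    if form.endswith("ies"):
        candidates.append(form[:-3] + "y")
    if form.endswith("s"):
        candidates.append(form[:-1])
    for stem in candidates:
        for pos in LEXICON.get(stem, ()):
            analyses.append((stem, pos, "pl" if pos == "N" else "3sg"))
    return sorted(analyses)

def disambiguate(prev_word, form):
    """Keep only noun readings directly after a determiner."""
    analyses = analyse(form)
    if prev_word in DETERMINERS:
        return [a for a in analyses if a[1] == "N"]
    return analyses

print(analyse("flies"))
print(disambiguate("the", "flies"))
```

Morphology alone leaves "flies" ambiguous between a plural noun and a third person singular verb; the preceding determiner removes the verb reading. Keeping only the first element of each analysis gives the lexeme that applications such as retrieval typically need.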
Morphological processing
• Stemming
• String-handling approaches
– Regular expressions
– Mapping onto finite-state automata
• 2-level morphology
– Mapping between surface form and lexical
representation
• Related issues of what is in lexicon
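A minimal sketch of the string-handling approach: ordered regular-expression rules map a surface form to a lexical representation of the form "stem+suffix", and stemming simply keeps the part before the "+". The rules are hypothetical and far cruder than a real stemmer; they are also not a true two-level system, which states its constraints in parallel rather than as an ordered cascade.

```python
import re

# Ordered (pattern, replacement) rules: surface form -> "stem+suffix".
RULES = [
    (re.compile(r"^(.+[^aeiou])ies$"), r"\1y+s"),        # flies -> fly+s
    (re.compile(r"^(.+(?:s|x|z|ch|sh))es$"), r"\1+s"),   # boxes -> box+s
    (re.compile(r"^(.+[^s])s$"), r"\1+s"),               # cats -> cat+s
    (re.compile(r"^(.+)ing$"), r"\1+ing"),               # walking -> walk+ing
]

def to_lexical(surface):
    """Apply the first matching rule; unmatched forms are returned as-is."""
    for pattern, replacement in RULES:
        if pattern.match(surface):
            return pattern.sub(replacement, surface)
    return surface

def stem(surface):
    """Crude stemmer: keep what precedes the morpheme boundary."""
    return to_lexical(surface).split("+")[0]

for word in ["flies", "boxes", "cats", "walking", "glass"]:
    print(word, "->", to_lexical(word), "->", stem(word))
```

The sketch ignores consonant doubling ("running" comes out as "runn") and irregular forms, which is where the question of what to store in the lexicon comes in.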