LING 681 Intro to Comp Ling


COMPUTATION WITH STRINGS 1
DAY 2 - 8/27/14
LING 3820 & 6820
Natural Language Processing
Harry Howard
Tulane University
Course organization
http://www.tulane.edu/~howard/LING3820/
The syllabus is coming.
http://www.tulane.edu/~howard/CompCultEN/
Is there anyone here that wasn't here on Monday?
NLP, Prof. Howard, Tulane University
27-Aug-2014
Installation of Python
Can anyone NOT get Spyder to do this?
Test
>>> 237 + 9075
9312
Be sure to try the other arithmetic operators, subtraction (-),
multiplication (*), and division (/). Does division work the
way you expect?
After you have tired of playing with math, play with some
text:
>>> word = 'msinairatnemhsilbatsesiditna'
>>> 'anti' in word
False
>>> 'itna' in word
True
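The exercises on this slide can also be tried as a short script. The sketch below assumes Python 3, where / always returns a float; in Python 2, / between two integers discards the remainder, which may be the surprise the slide hints at. The variable names and the reversal with [::-1] are not from the slide and are used only to show what the test word spells.

```python
# Arithmetic operators (Python 3 semantics assumed).
total = 237 + 9075        # addition
difference = 9075 - 237   # subtraction
product = 237 * 2         # multiplication
quotient = 7 / 2          # true division: a float in Python 3
floor_quotient = 7 // 2   # floor division: drops the remainder

# Substring tests with the `in` operator.
word = 'msinairatnemhsilbatsesiditna'
has_anti = 'anti' in word   # False, as on the slide
has_itna = 'itna' in word   # True, as on the slide

# Reversing the string (a slice with step -1) shows why.
reversed_word = word[::-1]

print(total, difference, product, quotient, floor_quotient)
print(has_anti, has_itna, reversed_word)
```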
§3. Computation with strings
A string is a sequence of characters delimited
by single or double quotes.
Examples
>>> monty = 'Monty Python'
>>> monty
'Monty Python'
>>> doublemonty = "Monty Python"
>>> doublemonty
'Monty Python'
>>> circus = 'Monty Python's Flying Circus'
  File "<stdin>", line 1
    circus = 'Monty Python's Flying Circus'
                           ^
SyntaxError: invalid syntax
>>> circus = "Monty Python's Flying Circus"
>>> circus
"Monty Python's Flying Circus"
>>> circus = 'Monty Python\'s Flying Circus'
>>> circus
"Monty Python's Flying Circus"
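As a quick check on the quoting examples, the sketch below compares the different spellings of the same string. The variable names are made up for illustration; the point is only that the choice of quote marks is not stored in the string itself.

```python
# Two ways to write the same string literal.
single = 'Monty Python'
double = "Monty Python"
same_plain = (single == double)   # the quote style is not part of the string

# An apostrophe inside the string: use double quotes, or escape it with a backslash.
circus_double = "Monty Python's Flying Circus"
circus_escaped = 'Monty Python\'s Flying Circus'
same_circus = (circus_double == circus_escaped)

print(same_plain, same_circus)
print(circus_escaped)
```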
The + and * operators
A new string can be formed by concatenating two strings with + or by
repeating a string a number of times with *.
Unfortunately, a character cannot be deleted with -:
>>> S = 'balloon'
>>> S+'!'
>>> S+!
>>> 'M'+S
>>> S*2
>>> S+'!'*2
>>> (S+'!')*2
>>> S-'n'
>>> S+2
>>> S+'2'
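For checking your answers afterwards, here is a sketch of the same expressions as a script. The lines that fail are wrapped so the script keeps running: S-'n' and S+2 raise TypeError, while S+! is a syntax error, which is caught here by handing the text to compile(). The final replace() call is a preview of a later slide, shown only as one possible way to drop a character.

```python
S = 'balloon'

print(S + '!')        # concatenation
print('M' + S)
print(S * 2)          # repetition
print(S + '!' * 2)    # * binds tighter than +, so only '!' is repeated
print((S + '!') * 2)  # parentheses repeat the whole string
print(S + '2')        # '2' is a string, so this concatenates

# Subtraction, and adding an int to a str, both raise TypeError.
errors = []
for label, attempt in [("S - 'n'", lambda: S - 'n'), ("S + 2", lambda: S + 2)]:
    try:
        attempt()
    except TypeError as err:
        errors.append(label)
        print(label, '->', type(err).__name__)

# S+! cannot even be parsed, so it is compiled from text to catch the error.
try:
    compile("S+!", "<example>", "eval")
    parsed = True
except SyntaxError:
    parsed = False
print('S+! parses:', parsed)

# One way to drop a character: replace it with the empty string.
no_n = S.replace('n', '')
print(no_n)
```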
Some string methods
Python supplies several built-in functions that can be applied to
strings to perform tasks. Some of them are illustrated below.
The input code is given without the corresponding output; it
is up to you to type each line in to see what it does:
>>> len(S)
>>> len(S+'!')
>>> len(S*2)
>>> sorted(S)
>>> len(sorted(S))
>>> set(S)
>>> sorted(set(S))
>>> len(set(S))
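Once you have typed the lines in yourself, the sketch below can be used to compare results. It assumes S is still 'balloon' from the previous slide; the variable names are made up for illustration. Note that a set has no guaranteed order, so it may print differently from run to run.

```python
S = 'balloon'

length = len(S)                 # number of characters
longer = len(S + '!')           # one more after concatenation
doubled = len(S * 2)            # twice as many after repetition
letters = sorted(S)             # a list of the characters in alphabetical order
n_letters = len(sorted(S))      # sorting does not change the count
unique = set(S)                 # duplicates removed, order not guaranteed
unique_sorted = sorted(set(S))  # unique characters in alphabetical order
n_unique = len(set(S))          # number of distinct characters

print(length, longer, doubled, n_letters, n_unique)
print(letters)
print(unique_sorted)
```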
Tokens vs. types
set(S) produces the set of characters in the string.
One useful property of sets is that they do not contain
duplicate elements.
The process of removing repetitions performed by set()
touches on a fundamental concept in language computation,
that of the distinction between a token and a type.
A representation in which repetitions are allowed is said to
consist of tokens, while one in which there are no repetitions
is said to consist of types.
Thus set() converts the tokens of a string into types. There is
one type of 'o' in 'balloon', but two tokens of 'o'.
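The token/type distinction can be counted directly. The sketch below uses Counter from the standard library, which is not mentioned on the slide, to tally the tokens of each type. The second half applies the same idea to words instead of characters; the sentence and the use of split() are assumptions made only for illustration.

```python
from collections import Counter

S = 'balloon'
char_tokens = list(S)      # every character occurrence is a token
char_types = set(S)        # each distinct character is a type
char_counts = Counter(S)   # number of tokens per type

print(len(char_tokens), 'tokens,', len(char_types), 'types')
print("tokens of 'o':", char_counts['o'])

# The same distinction applies to words (hypothetical example sentence).
sentence = 'the cat saw the other cat'
word_tokens = sentence.split()   # split on whitespace
word_types = set(word_tokens)

print(len(word_tokens), 'word tokens,', len(word_types), 'word types')
```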
Method notation
The material added to a method in parentheses is called its
argument(s).
In the examples above, the argument S can be thought of linguistically as
the object of a noun: the length of S, the alphabetical sorting of S, the set
of S. But what if two pieces of information are needed for a method to
work, for instance, to count the number of o’s in otolaryngologist?
To do so, Python allows for information to be prefixed to a method with a
dot:
>>> S.count('o')
The example can be read as “in S, count the o’s”, with the argument being
the substring to be counted, 'o', and the attribute being the string over
which the count progresses, or more generally:
    attribute.method(argument)
What can serve as attribute and argument varies from method to method and so
has to be memorized.
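A small sketch of the two notations side by side, using the word from the question above. The variable names are made up for illustration; len() takes the string in parentheses, while count() takes the string before the dot and the substring in parentheses.

```python
word = 'otolaryngologist'

# Function notation: one piece of information, given in parentheses.
length = len(word)

# Dot notation: the string comes before the dot, the substring in parentheses.
o_count = word.count('o')

# The same method applied to the string from the earlier slides.
S = 'balloon'
balloon_o = S.count('o')
balloon_lo = S.count('lo')   # the argument may be longer than one character
balloon_z = S.count('z')     # a substring that is absent is counted zero times

print(length, o_count)
print(balloon_o, balloon_lo, balloon_z)
```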

Cleaning up a string
There is a group of methods that return a modified copy of a string, illustrated
below. You can guess what they do from their names:
>>> S = 'i lOvE yOu'
>>> S
>>> S.lower()
>>> S.upper()
>>> S.swapcase()
>>> S.capitalize()
>>> S.title()
>>> S.replace('O','o')
>>> S.strip('i')
>>> S2 = ' '+S+' '
>>> S2
>>> S2.strip()
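To check your guesses, the sketch below collects the result of each call in a dictionary (the dictionary and its keys are made up for illustration). repr() is used when printing so that leading and trailing spaces stay visible. Strings are immutable in Python, so S itself is unchanged at the end.

```python
S = 'i lOvE yOu'

results = {
    'lower': S.lower(),                # all lowercase
    'upper': S.upper(),                # all uppercase
    'swapcase': S.swapcase(),          # flip the case of every letter
    'capitalize': S.capitalize(),      # first character upper, the rest lower
    'title': S.title(),                # first letter of each word upper
    'replace': S.replace('O', 'o'),    # every 'O' becomes 'o'
    'strip_i': S.strip('i'),           # remove 'i' from both ends only
}
for name, value in results.items():
    print(name, repr(value))

# With no argument, strip() removes whitespace from both ends.
S2 = ' ' + S + ' '
trimmed = S2.strip()
print(repr(S2), repr(trimmed))

# None of the calls above changed the original string.
print(repr(S))
```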

Next time
3.3. Finding your way around a string
I will try to send you some practice for what
we have done today.