LING3820-03-Strings2


COMPUTATION WITH STRINGS 2
DAY 2 - 8/29/14
LING 3820 & 6820
Natural Language Processing
Harry Howard
Tulane University
Course organization
http://www.tulane.edu/~howard/LING3820/
The syllabus is under construction.
http://www.tulane.edu/~howard/CompCultEN/
Is there anyone here who wasn't here on Wednesday?
I didn't put together any practice, because we have done too little.
I will e-mail you some practice to do over the weekend.
NLP, Prof. Howard, Tulane University
29-Aug-2014
Computer hygiene
You must turn your
computer off every
now and then, so that
it can clean itself.
By the same token, you
should close
applications every now
and then.
Review
What is a string?
What is an escape character?
What do these do: +, *, len(), sorted(), set()?
What is the difference between a type & a token?
Does Python know what you mean?
§3. Computation with strings
A string is a sequence of characters delimited by single or double quotes.
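A minimal illustration of the definition, assuming Python 3: both kinds of quotes produce the same kind of object, and the choice of delimiter interacts with the escape character from the review questions. The variable names are arbitrary.

```python
# Single and double quotes delimit exactly the same kind of object.
a = 'otolaryngologist'
b = "otolaryngologist"
print(a == b)                 # True
print(isinstance(a, str))     # True

# Picking one delimiter lets the other appear inside the string unescaped...
c = "Harry's course"
d = 'She said "hi"'
# ...or the escape character \ can protect a quote of the same kind.
e = 'Harry\'s course'
print(c == e)                 # True
```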
Open Spyder
Method notation
The material enclosed in parentheses after a method is called its argument(s).
- In the examples above, the argument S can be thought of linguistically as the object of a noun: the length of S, the alphabetical sorting of S, the set of S. But what if two pieces of information are needed for a method to work, for instance, to count the number of o’s in otolaryngologist?
- To do so, Python allows information to be prefixed to a method with a dot:
>>> S.count('o')
- The example can be read as “in S, count the o’s”, with the argument being the substring to be counted, 'o', and the attribute being the string over which the count progresses, or more generally:
  attribute.method(argument)
- What can serve as attribute and argument varies from method to method and so must be memorized.
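The two notations side by side, as a sketch that assumes S holds the word from the count example. Python's documentation calls len() a built-in function and count() a string method; the difference shows up in where S is written.

```python
# Assumption: S is the word from the count example above.
S = 'otolaryngologist'

# Built-in function: the string goes inside the parentheses.
print(len(S))           # 16

# Method: the string is prefixed with a dot, the argument goes inside.
print(S.count('o'))     # 4
print(S.count('lo'))    # 1  (the argument may be a longer substring)
```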

How to clean up a string
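There is a group of methods for cleaning up a string, illustrated below. Strictly speaking, they do not change S itself, since Python strings cannot be modified in place; each one returns a new, modified copy. You can guess what they do from their names: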
>>> S = 'i lOvE yOu'
>>> S
>>> S.lower()
>>> S.upper()
>>> S.swapcase()
>>> S.capitalize()
>>> S.title()
>>> S.replace('O','o')
>>> S.strip('i')
>>> S2 = ' '+S+' '
>>> S2
>>> S2.strip()
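A sketch of how these methods combine in practice, using a hypothetical messy input (the value of raw is invented for illustration). Because each method returns a new string, calls can be chained, and the original stays as it was. lstrip() and rstrip() are the one-sided versions of strip().

```python
# Hypothetical messy input: stray spaces, a newline, odd capitals.
raw = '  OtoLARYNGologist \n'

# strip() with no argument removes whitespace (spaces, tabs, newlines)
# at both ends; lower() then normalizes case.
clean = raw.strip().lower()
print(clean)                  # otolaryngologist

# The original is unchanged, because Python strings are immutable.
print(repr(raw))

# lstrip() and rstrip() clean only the left or the right end.
print(repr(raw.lstrip()))
print(repr(raw.rstrip()))
```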

3.3. How to find your way around a string
index() or rindex()
You can ask Python for a character’s index (its position in the string) with the index() or rindex() methods, which take the string as an attribute and the character as an argument. index() searches from the left, rindex() from the right:
>>> S = 'otolaryngologist'
>>> S.index('o')
>>> S.rindex('o')
>>> S.index('t')
>>> S.rindex('t')
>>> S.index('l')
>>> S.rindex('l')
>>> S.index('a')
>>> S.rindex('a')
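One way to check the answers, as a sketch: index() reports the first occurrence and rindex() the last, with positions counted from 0 (see §3.3.2). When the two agree, the character occurs exactly once.

```python
S = 'otolaryngologist'

# Character, first position (from the left), last position (from the right).
for ch in 'otla':
    print(ch, S.index(ch), S.rindex(ch))

# If index() and rindex() agree, the character occurs exactly once in S.
once = [ch for ch in set(S) if S.index(ch) == S.rindex(ch)]
print(sorted(once))           # ['a', 'i', 'n', 'r', 's', 'y']
```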
find() & rfind()
Python also has the methods find() and rfind(), which appear to do the same thing as index() and rindex():
>>> S.find('o')
>>> S.rfind('o')
>>> S.find('t')
>>> S.rfind('t')
>>> S.find('l')
>>> S.rfind('l')
>>> S.find('a')
>>> S.rfind('a')
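A quick sketch that tests the "same thing" claim on every character that actually occurs in the word: for characters that are present, the two pairs of methods give identical answers.

```python
S = 'otolaryngologist'

# For every character that does occur in S, find() agrees with index()
# and rfind() agrees with rindex().
for ch in set(S):
    assert S.find(ch) == S.index(ch)
    assert S.rfind(ch) == S.rindex(ch)
print('find/index and rfind/rindex agree on every character of ' + S)
```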
index() or find()
Where they differ is in how they handle a failed search, that is, a request for something that is not in the string:
>>> S.find('z')
-1
>>> S.index('z')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
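A sketch of why the difference matters: -1 is itself a legal position (it means the last character; see the negative positions under "Finding characters given a position"), so feeding find()'s failure value straight into a lookup silently returns a wrong answer. Two safer patterns follow.

```python
S = 'otolaryngologist'

# Pitfall: find() signals failure with -1, which is also a valid index.
pos = S.find('z')
print(pos)        # -1
print(S[pos])     # t  (the last character, not an error)

# Safe pattern 1: test for -1 explicitly.
if S.find('z') == -1:
    print('no z in ' + S)

# Safe pattern 2: let index() raise ValueError and catch it.
try:
    S.index('z')
except ValueError:
    print('index() raised ValueError')
```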
How to find substrings
These two methods can also find substrings:
>>> S.find('oto')
>>> S.index('oto')
>>> S.find('ist')
>>> S.index('ist')
>>> S.find('ly')
>>> S.index('ly')
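A sketch of what the substring searches report: the position of the first character of the match, or -1 from find() when there is no match. When only a yes/no answer is needed, Python's `in` operator is simpler, and count() accepts substrings as well.

```python
S = 'otolaryngologist'

# The position reported is that of the substring's first character.
print(S.find('oto'))    # 0
print(S.find('ist'))    # 13
print(S.find('ly'))     # -1: 'ly' does not occur in S

# For a yes/no answer, the `in` operator is simpler.
print('ist' in S)       # True
print('ly' in S)        # False

# count() also works with substrings (non-overlapping matches).
print(S.count('log'))   # 1
```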
Limiting the search to a substring
index() and find() take optional arguments for the beginning and end positions of the stretch to be searched, in order to limit the search to that substring’s confines:
>>> S.index('oto', 0, 3)
>>> S.index('oto', 3)
>>> S.find('oto', 0, 3)
>>> S.find('oto', 3)
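string.index(substring, beginning, end)
string.find(substring, beginning, end)
The search covers positions from beginning up to, but not including, end.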
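A sketch showing that the end position is exclusive: 'oto' occupies positions 0, 1 and 2, so it is found only when end is at least 3. With only a beginning position, the search runs to the end of the string.

```python
S = 'otolaryngologist'

# End is exclusive: the whole match must lie before it.
print(S.find('oto', 0, 3))   # 0
print(S.find('oto', 0, 2))   # -1

# With only a beginning position, the search runs to the end of S.
print(S.find('o', 1))        # 2: the first 'o' at or after position 1
print(S.find('oto', 3))      # -1: 'oto' does not recur after position 3
```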
3.3.2. Zero-based indexation
0=1
You probably thought that the first character in a string should be given the number 1, but Python actually gives it 0, and the second character gets 1. There are some advantages to this format which do not concern us here, but we will mention a real-world example.
In Europe, the floors of buildings are numbered in such a way that the ground floor is considered the zeroth one, so that the first floor up from the ground is the first floor, whereas in the USA it would be called the second floor.
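A sketch of the main practical consequence, using the square-bracket notation from "Finding characters given a position": if the first character is at 0, the last is at len(S) - 1, and asking for position len(S) goes one past the end.

```python
S = 'otolaryngologist'

# The first character is at position 0 ...
print(S[0])              # o

# ... so the last one is at len(S) - 1, not len(S).
last = S[len(S) - 1]
print(last)              # t

# Asking for position len(S) goes one past the end and fails.
try:
    S[len(S)]
except IndexError:
    print('position ' + str(len(S)) + ' is out of range')
```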
In a picture
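The original figure is not preserved in this transcript. As a stand-in, under the assumption that it showed character positions, this sketch prints each character of the string used on the next slide with its positive position (from the left, starting at 0) and its negative position (from the right, starting at -1).

```python
S = 'abcdefgh'

# Character, positive position, negative position.
for i, ch in enumerate(S):
    print(ch, i, i - len(S))
```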
Finding characters given a position
>>> S = 'abcdefgh'
>>> S[2]
>>> S[5]
>>> S[2:5]
>>> S[-6]
>>> S[-3]
>>> S[-6:-3]
>>> S[-6:-3] == S[2:5]
>>> S[-6:5]
>>> S[5:-6]
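Two rules that explain the results above, sketched as checks: a negative position -k is shorthand for len(S) - k, and a slice S[i:j] runs from i up to but not including j. If the start lies at or after the end, the slice is simply empty rather than an error.

```python
S = 'abcdefgh'

# A negative position -k is shorthand for len(S) - k.
for k in range(1, len(S) + 1):
    assert S[-k] == S[len(S) - k]

# S[i:j] runs from i up to but NOT including j, so it has j - i characters.
print(S[2:5], len(S[2:5]))    # cde 3

# Mixing signs works when both name the same span ...
print(S[-6:5] == S[2:5])      # True
# ... but a start at or after the end gives an empty string.
print(repr(S[5:-6]))          # ''
```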
More slicing
If no beginning or end position is mentioned for a slice, Python
defaults to the beginning or end of the string:
>>> S[2:]
>>> S[-2:]
>>> S[:2]
>>> S[:-2]
>>> S[:]
The result of a slice is a string object, so it can be concatenated with
another string or repeated:
>>> S[:-1] + '!'
>>> S[:2] + S[2:]
>>> S[:2] + S[2:] == S
>>> S[-2:] * 2
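The check S[:2] + S[2:] == S is not special to position 2; this sketch confirms it for every split point. It also shows one way slices differ from single positions: slice bounds may run past the end of the string without raising an error.

```python
S = 'abcdefgh'

# Splitting at ANY position and gluing the halves back together
# recovers the original string.
for i in range(len(S) + 1):
    assert S[:i] + S[i:] == S

# Slice bounds may run past the end without error.
print(S[:100])          # abcdefgh
print(repr(S[100:]))    # ''
```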
Extended slicing
Slice syntax allows a mysterious third argument, written by appending an additional colon and an integer. What do these do?
>>> S[::1]
>>> S[::2]
>>> S[::3]
>>> S[::4]
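One way to test a guess about the third argument, as a sketch: compare S[::k] with a hand-built string that keeps only the characters whose position is a multiple of k.

```python
S = 'abcdefgh'

# S[::k] keeps the characters whose position is a multiple of k.
for k in range(1, 5):
    every_kth = ''.join(ch for i, ch in enumerate(S) if i % k == 0)
    assert S[::k] == every_kth
    print(k, S[::k])
```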
All three arguments together
Of course, you can still use the first two arguments
to slice out a substring, which the third one steps
through:
>>> S[1:7:1]
>>> S[1:7:2]
>>> S[1:7:3]
>>> S[1:7:6]
Thus the overall format of a slice is:
  string[start:end:step]
How to reverse a string
>>> S[::-1]
>>> S[::-2]
>>> S[::-3]
>>> S[::-4]
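A small application of reversal, sketched with a hypothetical helper is_palindrome() (not a built-in): a word is a palindrome if it equals its own reverse. The last lines show that the built-in reversed() joined back into a string gives the same result as a step of -1.

```python
def is_palindrome(word):
    """Hypothetical helper: True if word reads the same backwards, ignoring case."""
    w = word.lower()
    return w == w[::-1]

print(is_palindrome('Level'))             # True
print(is_palindrome('otolaryngologist'))  # False

# reversed() plus join() gives the same result as a -1 step.
S = 'abcdefgh'
print(''.join(reversed(S)) == S[::-1])    # True
```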
Next time
The rest of §3
I will send you some practice for what we
have done this week.