Python Intro
I256:
Applied Natural Language Processing
Marti Hearst
Aug 30, 2006
1
Today
Introductions
Python Basics
2
Introduction to NLTK
The Natural Language Toolkit (NLTK) provides:
Basic classes for representing data relevant to
natural language processing.
Standard interfaces for performing tasks, such as
tokenization, tagging, and parsing.
Standard implementations of each task, which can
be combined to solve complex problems.
Pre-parsed corpora and tools to access them.
Slide by Diane Litman
3
NLTK: Example Modules
nltk_lite.tokenize: processing individual
elements of text, such as words or sentences.
nltk_lite.probability: modeling frequency
distributions and probabilistic systems.
nltk_lite.tag: tagging tokens with supplemental
information, such as parts of speech or wordnet
sense tags.
nltk_lite.parser: high-level interface for parsing
texts.
Slide by Diane Litman
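To make the tokenize and probability descriptions concrete, here is a minimal sketch that uses only the Python standard library (Python 3 syntax). It is a stand-in for the idea, not NLTK's own API; the sample text and variable names are illustrative assumptions.

```python
import re
from collections import Counter

# Illustrative sample text (not from the slides).
text = "The cat sat on the mat. The dog sat too."

# Tokenization: split the text into lowercase word tokens.
tokens = re.findall(r"\w+", text.lower())

# Frequency distribution: count how often each token occurs.
freq = Counter(tokens)

# The two most frequent tokens and their counts.
most_common = freq.most_common(2)
print(most_common)
```

NLTK wraps these same ideas (tokenizers, frequency distributions) in standard interfaces so they can be combined with taggers and parsers.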
4
Python and Natural Language Processing
Python is a great language for NLP:
Simple (and fun!)
Powerful string manipulation
Easy to debug:
– Interpreted language
Easy to test small steps incrementally
– Exceptions
Easy to structure
– Modules
– Object oriented programming
Slide by Diane Litman
5
An Interpreted Language
The interpreter processes what you’ve typed as soon as you hit
<return>:
>>> 3 * 4
12
>>>
Python is sensitive to leading whitespace
If you put in extra spaces, or too few, it will complain.
If you type a multi-line command, you must do the
indenting; the interpreter helps you with this:
>>> if 4 > 3:
...     print "duh"
...
duh
>>>
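The whitespace rule can be checked without typing at the prompt. The sketch below (Python 3 syntax) compiles two versions of the same statement, one indented and one not; the helper name `compiles` is hypothetical.

```python
# Two versions of the same statement: one indented correctly, one not.
well_formed = "if 4 > 3:\n    print('duh')\n"
missing_indent = "if 4 > 3:\nprint('duh')\n"


def compiles(source):
    """Return True if Python accepts the source, False on an indentation error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except IndentationError:
        return False


print(compiles(well_formed))     # the indented body is accepted
print(compiles(missing_indent))  # the unindented body is rejected
```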
6
Some Python Basics
Strings
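A minimal sketch of a few basic string operations, in Python 3 syntax. The sample sentence and variable names are illustrative assumptions, not taken from the slides.

```python
sentence = "Python is a great language for NLP"

first_word = sentence[:6]        # slicing: characters 0 through 5
lowered = sentence.lower()       # a new string; the original is unchanged
words = sentence.split()         # split on whitespace into a list of words
rejoined = "-".join(words[:3])   # join the first three words with hyphens
has_nlp = "NLP" in sentence      # substring test

print(first_word, rejoined, has_nlp)
```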
7
Some Python Basics
Lists
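A minimal sketch of common list operations, in Python 3 syntax. The list contents are illustrative assumptions.

```python
tokens = ["natural", "language", "processing"]

tokens.append("toolkit")   # lists are mutable: add an item at the end
first = tokens[0]          # indexing starts at 0
last = tokens[-1]          # negative indexes count from the end
middle = tokens[1:3]       # slicing returns a new list
count = len(tokens)        # number of items

print(first, last, middle, count)
```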
8
Some Python Basics
Iteration over Lists
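A minimal sketch of three ways to iterate over a list, in Python 3 syntax. The word list is an illustrative assumption.

```python
words = ["tokenization", "tagging", "parsing"]

# A plain for loop visits each item in order.
lengths = []
for word in words:
    lengths.append(len(word))

# enumerate() supplies the position along with the item.
numbered = []
for index, word in enumerate(words):
    numbered.append("%d: %s" % (index, word))

# A list comprehension builds a new list in one expression.
upper_words = [word.upper() for word in words]

print(lengths, numbered, upper_words)
```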
9
Modules and Packages
Python modules “package program code and
data for reuse.” (Lutz)
Similar to library in C, package in Java.
Python packages are hierarchical modules (i.e.,
modules that contain other modules).
Three commands for accessing modules:
1. import
2. from…import
3. reload
Slide by Diane Litman
10
Modules and Packages: import
The import command loads a module:
# Load the regular expression module
>>> import re
To access the contents of a module, use dotted
names:
# Use the search function from the re module
# (str here is a variable holding the text to search)
>>> re.search(r'\w+', str)
To list the contents of a module, use dir:
>>> dir(re)
['DOTALL', 'I', 'IGNORECASE',…]
Slide by Diane Litman
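The three steps on this slide can be put together in one short sketch (Python 3 syntax). The sample text and variable names are illustrative assumptions.

```python
import re  # load the regular expression module

text = "Applied Natural Language Processing"

# Dotted name: module name, then the function inside it.
match = re.search(r"\w+", text)
first_word = match.group(0)   # the text matched by the pattern

# dir() lists the names a module defines.
names = dir(re)
has_search = "search" in names

print(first_word, has_search)
```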
11
Modules and Packages: from…import
The from…import command loads individual
functions and objects from a module:
# Load the search function from the re module
>>> from re import search
Once an individual function or object is loaded with
from…import, it can be used directly:
# Use the search function directly, without the re. prefix
>>> search(r'\w+', str)
Slide by Diane Litman
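A minimal sketch of the same lookup using from…import (Python 3 syntax). The sample text is an illustrative assumption.

```python
from re import search  # load only the search function

text = "Introduction to NLTK"

# No dotted name is needed: search is now a name in this namespace.
match = search(r"\w+", text)
first_word = match.group(0)

print(first_word)
```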
12
Import vs. from…import
import
Keeps module functions separate from user functions.
Requires the use of dotted names.
Works with reload.
from…import
Puts module functions and user functions together.
More convenient names.
Does not work with reload.
Slide by Diane Litman
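One consequence of putting module functions and user functions together is that names can collide. The sketch below (Python 3 syntax) defines a hypothetical user function named `search` and shows that import leaves it alone, while from…import silently replaces it.

```python
import re


def search(pattern, text):
    """A user-defined function that happens to share its name with re.search."""
    return pattern in text


# With plain import, the two functions stay separate.
user_result = search("NLP", "Intro to NLP")        # calls the user function
module_result = re.search("NLP", "Intro to NLP")   # calls the module function

# from...import rebinds the name, so the user function is no longer reachable.
from re import search

after_import = search("NLP", "Intro to NLP")       # now calls re.search

print(user_result, module_result.group(0), after_import.group(0))
```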
13
Modules and Packages: reload
If you edit a module, you must use the reload
command before the changes become visible in
Python:
>>> import mymodule
...
>>> reload(mymodule)
The reload command only affects modules that have
been loaded with import; it does not update
individual functions and objects loaded with
from...import.
Slide by Diane Litman
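The difference can be seen in a self-contained sketch. It is written for Python 3, where reload is no longer a builtin and is called as `importlib.reload`. The module name `mymodule_demo`, its contents, and the helper `write_module` are hypothetical, created in a temporary directory only for this demonstration.

```python
import importlib
import os
import sys
import tempfile

# Skip bytecode caching so the edited source is always recompiled in this demo.
sys.dont_write_bytecode = True

# Create a throwaway module file (hypothetical name and contents).
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "mymodule_demo.py")


def write_module(greeting):
    """Write a one-function module whose greet() returns the given string."""
    with open(module_path, "w") as f:
        f.write("def greet():\n    return %r\n" % greeting)


write_module("hello")
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

import mymodule_demo                 # loaded with import
from mymodule_demo import greet      # loaded with from...import

# "Edit" the module on disk, then reload it.
write_module("goodbye, everyone")
importlib.reload(mymodule_demo)

via_module = mymodule_demo.greet()   # dotted name sees the edited function
via_from_import = greet()            # still bound to the old function

print(via_module, "|", via_from_import)
```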
14
Configuring the Python IDE
Called IDLE
You can set key bindings
Go to Options > Configure IDLE
Select Keys tab
Select an action and specify an alternative binding
Click Save as New Custom Key Set
– Give it a name
Click Apply so it takes hold
If you want to use an existing binding (say, Control-A)
– First find the command that has that binding
– Change it to something else
– Click Apply
– Now choose your command and change its binding to Control-A
15
For Next Week
Monday: holiday, no class
Sign up for the email list!
Mail to: [email protected]
Put in msg body: subscribe anlp
For Wed Sept 6
Finish the programming tutorial
Do the regular expression tutorial.
We’ll go through some regular expressions in class.
16