Transcript Lecture 3
CS3101 Python
Lecture 3
Agenda
• Scoping
• Documentation, Coding Practices, Pydoc
• Functions
– Named, optional, and arbitrary arguments
• Generators and Iterators
• Functional programming tools
– lambda, map, filter, reduce
• Regular expressions
• Homework 3
Extra credit solution to HW1
• Dynamic programming: roughly 15 lines
– Determining whether a bill of N dollars is satisfiable reduces to whether you can satisfy a bill of N – J dollars, where J is the cost of an item on your menu
– Create an empty list (knapsack) with N+1 entries
– Base case: we know we can satisfy a bill of 0 dollars
– For each item on your menu
– For index = 0 to N (inclusive)
• If knapsack[index] is empty, and knapsack[index – item's cost] is not:
• We now know how to satisfy this bill, so the solution list at knapsack[index] is the list at knapsack[index – item's cost] with the current item appended
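The steps above can be sketched as follows. This is only one possible reading of the outline, not the official solution. It is written for Python 3, and the function name satisfy_bill, the dictionary-shaped menu, and the sample prices are assumptions made for illustration. Costs are assumed to be positive whole dollars.

```python
def satisfy_bill(total, menu):
    """Return a list of item names whose costs sum to total, or None.

    menu maps a (hypothetical) item name to a positive integer cost.
    """
    # knapsack[i] holds one way to reach a bill of i dollars, or None.
    knapsack = [None] * (total + 1)
    knapsack[0] = []  # base case: a bill of 0 dollars needs no items
    for name, cost in menu.items():
        # Start at cost so that index - cost never goes negative.
        for index in range(cost, total + 1):
            if knapsack[index] is None and knapsack[index - cost] is not None:
                # Extend the known solution for the smaller bill.
                knapsack[index] = knapsack[index - cost] + [name]
    return knapsack[total]


menu = {"fries": 3, "burger": 5}  # made-up prices
order = satisfy_bill(11, menu)
```

Because the index loop runs upward, an item may be used more than once, which matches ordering several of the same dish.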
Homework 3, Exercise 1
• Requirements:
– 1. Write a program that uses regular expressions to retrieve the current weather from a website of your choosing. Just the temperature is OK.
– 2. Use that information to suggest a sport to play.
./sport.py
It's 36 degrees today. You should ski!
http://www.nytimes.com/weather
Homework 3, Exercise 2
• Requirements:
– a) Write a program which uses regular expressions and URLLIB to print the address of each image on Columbia's homepage (www.columbia.edu)
– b) Use regular expressions to print the title of each of the news stories on the main page
./news.py
./images.py
Scoping
• Local Namespaces / Local scope
– A function's parameters and variables that are bound within the function
• Module scope
– Variables outside functions at the module level are global
• Hiding
– Inner scope wins: if a name conflict occurs between a local variable and a global one, the local one takes precedence
Global statement
• Local scope wins by default
• If within a function you must refer to a global variable of the same name, redeclare it first with the global keyword
– 'global identifiers', where identifiers contains one or more IDs separated by commas
• Never use global if your function just accesses a global variable, only if it rebinds it
• Global in general is poor style, it breaks encapsulation, but you'll see it out there
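A small sketch of the three cases above (reading, hiding, and rebinding a global). It is written for Python 3, and the names counter, read_counter, shadow_counter, and bump_counter are made up for illustration.

```python
counter = 0  # module-level (global) variable


def read_counter():
    # Only reads the global, so no global statement is needed.
    return counter


def shadow_counter():
    # Assignment creates a new local name that hides the global one.
    counter = 100
    return counter


def bump_counter():
    # Rebinding the global requires the global statement.
    global counter
    counter += 1
    return counter
```

Calling shadow_counter() leaves the module-level counter untouched, while bump_counter() changes it for every later caller, which is why global tends to break encapsulation.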
Closures and Nested Scope
• Using a def statement within another function's body defines a nested or inner function
• The parent function is referred to as the outer function
• Nested functions may access the outer function's parameters - but not rebind them
• This trick can be used to form closures, as we'll see in lambdas
Closures
• This example is adapted from Python in a Nutshell
def make_adder(augend):
    def add(addend):
        return addend + augend
    return add
• Calling make_adder(7) returns a function that accepts a single argument and adds seven to it
Namespace resolution
• Name resolution proceeds as follows
– Local scope (i.e., this function)
– Outer scope (i.e., enclosing functions)
– Module level scope (i.e., global variables)
– Built-in scope (i.e., predefined Python names such as open and len)
• A word to the wise - do not choose variable names that may conflict with built-ins or with modules you may import
– E.g., 'open = 5' is dangerous if you're using file objects, later use of the open function might not resolve where you expect!
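The lookup order can be seen in a short sketch. It is written for Python 3, and all the function names are invented for illustration. The last function shows the kind of accident the warning above is about, using len rather than open so that no file is needed.

```python
x = "module"  # module-level scope


def outer():
    x = "outer"  # enclosing scope for inner()

    def inner():
        # No local x here, so the enclosing function's x is found first.
        return x

    return inner()


def uses_builtin():
    # len is not local, enclosing, or global, so the built-in is used.
    return len([1, 2, 3])


def shadows_builtin():
    # Binding len locally hides the built-in inside this function.
    len = 5
    return callable(len)
```

Inside shadows_builtin the name len is just the integer 5, so any later attempt to call it there would fail.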
Documentation and Pydoc
def complex(real=0.0, imag=0.0):
    """Form a complex number.

    Keyword arguments:
    real -- the real part (default 0.0)
    imag -- the imaginary part (default 0.0)
    """
    if imag == 0.0 and real == 0.0:
        return complex_zero
    ...
• A docstring is a string literal at the beginning of a method, class, or module:
– A concise one-sentence summary, followed by a blank line, followed by detail.
• References
– http://www.python.org/dev/peps/pep-0257/
• Code is read MANY more times than it is written
• Trust me, it's worth it
• First line should be a concise and descriptive statement of purpose
• Self documentation is good, but do not repeat the method name! (e.g., def setToolTip(text) #sets the tool tip)
• Next paragraph should describe the method and any side effects
• Then arguments
Python's thoughts on documentation
• A Foolish Consistency is the Hobgoblin of Little Minds
• http://www.python.org/dev/peps/pep-0008/
Functions, returning multiple values
• Functions can return multiple values (of arbitrary type), just separate them by commas
• Always reminded me of MATLAB
def foo():
    return [1,2,3], 4, (5,6)
myList, myInt, myTuple = foo()
A word on mutable arguments
• Be cautious when passing mutable data structures (lists, dictionaries) to methods - especially if they're sourced from modules that are not your own
• When in doubt, either copy them or cast them as tuples
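To make the risk concrete, here is a minimal sketch written for Python 3. The two functions and the "sprinkles" data are hypothetical, and stand in for library code you do not control.

```python
def add_topping(order):
    # Mutates the caller's list in place.
    order.append("sprinkles")
    return order


def add_topping_safely(order):
    # Works on a shallow copy, leaving the caller's list alone.
    order = list(order)
    order.append("sprinkles")
    return order


mine = ["donut"]
safe_result = add_topping_safely(mine)   # mine is unchanged
frozen = tuple(mine)                     # a tuple cannot be appended to
unsafe_result = add_topping(mine)        # mine now has two entries
```

Passing frozen to add_topping would raise an AttributeError, since tuples have no append method. That is a loud failure instead of a silent change to your data.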
Semantics of argument passing
• Recall that while functions cannot rebind the caller's variables, they can alter mutable arguments in place
• Positional arguments
• Named arguments
• Special forms * (sequence) and ** (dictionary)
• Sequence:
– zero or more positional, followed by
– zero or more named
– zero or 1 *
– zero or 1 **
Positional arguments
def myFunction(arg1, arg2, arg3, arg4, arg5, arg6):
    .....
• Potential for typos
• Readability declines
• Maintenance a headache
• Frequent headache in Java / C (I'm sure we can all recall some monster functions written by colleagues / fellow students)
• We can do better
Named arguments
• Syntax: identifier = expression
• Named arguments specified in the function declaration are optional arguments; the expression is their default value if not provided by the caller
• Two forms
– 1) you may name arguments passed to functions even if they are listed positionally
– 2) you may name arguments within a function's declaration to supply default values
• Outstanding for self documentation!
Named argument example
def add(a, b):
    return a + b
• Equivalent calls:
print add(4,2)
print add(a=4, b=2)
print add(b=2, a=4)
Default argument example
def add(a=4, b=2):
    return a+b
print add(b=4)       # a defaults to 4, prints 8
print add(a=2, b=4)  # prints 6
print add(4, b=2)    # prints 6
Sequence arguments
• The * form collects any additional positional arguments into a tuple that you can iterate over
def mySum(*args):
    # equivalent to: return sum(args)
    total = 0
    for arg in args:
        total += arg
    return total
• Valid calls:
mySum(4,2,1,3)
mySum(1)
mySum(1,23,4,423,234)
Sequences of named arguments
• **dct must be a dictionary whose keys are all strings; values of course are arbitrary
• Each item's key is the parameter name, the value is the argument
# ** collects keyword
# arguments into a dictionary
def foo(**args): print args
foo(homer='donut', lisa='tofu')
{'homer': 'donut', 'lisa': 'tofu'}
Optional arguments are everywhere
# three ways to call the range function
# up to
range(5)
[0, 1, 2, 3, 4]
# from, up to
range(-5, 5)
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
# from, up to , step
range(-5, 5, 2)
[-5, -3, -1, 1, 3]
Arbitrary arguments example
• We can envision a max function pretty easily
# idea: max2(1,5,3,1)
# >>> 5
# idea: max2('a', 'b', 'c', 'd', 'e')
# >>> e
def max2(*args):
    for arg in args…
Arbitrary arguments example
def max1(*args):
    best = args[0]
    for arg in args[1:]:
        if arg > best:
            best = arg
    return best
def max2(*args):
    # sorted() is ascending, so the largest value is last
    return sorted(args)[-1]
Argument matching rules
• General rule: more complicated to the right
• For both calling and definitional code:
• All positional arguments must appear first
– Followed by all keyword arguments
• Followed by the * form
– And finally **
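A short sketch of the ordering rule, written for Python 3. The function describe and its parameter names are invented for illustration. It simply returns what each kind of parameter received.

```python
def describe(name, greeting="hello", *extras, **options):
    # name: positional, greeting: keyword with a default,
    # extras: leftover positionals, options: leftover keywords.
    return name, greeting, extras, options


plain = describe("homer")
full = describe("homer", "hi", "a", "b", mood="hungry")

# The same * and ** forms unpack a sequence and a dictionary at the call site.
unpacked = describe(*["lisa", "hey"], **{"mood": "calm"})
```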
Functions as arguments
• Of course we can pass functions as arguments as well
def myCompare(x, y):
    …
sorted([5, 3, 1, 9], cmp=myCompare)
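The cmp= keyword shown on the slide is Python 2 only. In Python 3, sorted() no longer accepts it, and functools.cmp_to_key adapts an old-style comparison function. The sketch below assumes Python 3, and the body of my_compare (a descending order) is made up, since the slide leaves it elided.

```python
from functools import cmp_to_key


def my_compare(x, y):
    # Negative if x should sort first, positive if y should, zero if equal.
    # This hypothetical body sorts in descending order.
    return y - x


result = sorted([5, 3, 1, 9], key=cmp_to_key(my_compare))
```

For a simple case like this, sorted([5, 3, 1, 9], reverse=True) would be the more idiomatic choice. The comparison function is shown only because it mirrors the slide.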
Lambdas
• The closer you can get to mathematics the
more elegant your programs become
• In addition to the def statement, Python
provides an expression which in-lines a
function – similar to LISP
• Instead of assigning a name to a function,
lambda just returns the function itself –
anonymous functions
When should you use Lambda
• Lambda is designed for handling simple functions
– Conciseness: lambdas can live places def’s cannot
(inside a list literal, or a function call itself)
– Elegance
• Limitations
– Not as general as a def – limited to a single expression,
there is only so much you can squeeze in without using
blocks of code
• Use def for larger tasks
• Do not sacrifice readability
– More important that your work is a) correct and b) efficient w.r.t. people hours
Quick examples
• Arguments work just as they do in a def – including defaults, *, and **
• The lambda expression returns a function, so you can assign a name to it if you wish
foo = (lambda a, b="simpson": a + " " + b)
foo("lisa")
lisa simpson
foo("bart")
bart simpson
More examples
# Embedding lambdas in a list
myList = [(lambda x: x**2), (lambda x: x**3)]
for func in myList:
    print func(2)
4
8
# Embedding lambdas in a dictionary
donuts = {'homer' : (lambda x: x * 4), 'lisa' : (lambda x: x * 0)}
donuts['homer'](2)
8
donuts['lisa'](2)
0
Multiple arguments
(lambda x, y: x + " likes " + y)('homer', 'donuts')
'homer likes donuts'
State
def remember(x):
    return (lambda y: x + y)
foo = remember(5)
print foo
<function <lambda> at 0x01514970>
foo(2)
7
Maps
• One of the most
common tasks with lists
is to apply an operation
to each element in the
sequence
# w/o maps
donuts = [1,2,3,4]
myDonuts = []
for d in donuts:
myDonuts.append(d * 2)
print myDonuts
[2, 4, 6, 8]
# w maps
def more(d): return d * 2
myDonuts = map(more, donuts)
print myDonuts
[2, 4, 6, 8]
Map using Lambdas
donuts = [1,2,3,4]
def more(d): return d * 3
myDonuts = map(more, donuts)
print myDonuts
[3, 6, 9, 12]
myDonuts = map((lambda d: d * 3), donuts)
print myDonuts
[3, 6, 9, 12]
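The output above is what Python 2 prints, where map returns a list. In Python 3, map returns a lazy iterator instead, so a sketch of the same example (assuming Python 3) needs an explicit list() call.

```python
donuts = [1, 2, 3, 4]

lazy = map(lambda d: d * 3, donuts)   # nothing is computed yet
tripled = list(lazy)                  # consume the iterator into a list
leftover = list(lazy)                 # the iterator is now exhausted

# The equivalent list comprehension builds the list directly.
also_tripled = [d * 3 for d in donuts]
```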
More maps
# map is smart
# understands functions requiring multiple arguments
# operates over sequences in parallel
pow(2, 3)
8
map(pow, [2, 4, 6], [1, 2, 3])
[2, 16, 216]
• map((lambda x,y: x + " likes " + y),\
Functional programming tools:
Filter and reduce
• Theme of functional programming
– apply functions to sequences
• Relatives of map:
– filter and reduce
• Filter:
– filters out items relative to a test function
• Reduce:
– Applies functions to pairs of items and running
results
Filter
range(-5, 5)
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
def isEven(x): return x % 2 == 0
filter(isEven, range(-5, 5))
[-4, -2, 0, 2, 4]
filter((lambda x: x % 2 == 0), range(-5, 5))
[-4, -2, 0, 2, 4]
Reduce
• A bit more complicated
• By default the first item of the sequence is used to initialize the tally
def reduce(fn, seq):
    tally = seq[0]
    # start from the second item so the first is not counted twice
    for next in seq[1:]:
        tally = fn(tally, next)
    return tally
reduce((lambda x, y: x + y), [1,2,3,4])
10
import operator
reduce(operator.add, [1, 2, 3])
6
• FYI: more functional tools are available
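In Python 3, reduce is no longer a built-in and lives in functools. The sketch below assumes Python 3 and also shows the optional third argument, which supplies the starting tally instead of the first item of the sequence.

```python
from functools import reduce
import operator

total = reduce(operator.add, [1, 2, 3, 4])

# An explicit initial value seeds the tally.
product = reduce(operator.mul, [1, 2, 3, 4], 1)

# With an initial value, an empty sequence is fine; without one it raises TypeError.
empty_total = reduce(operator.add, [], 0)
```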
List comprehensions revisited: combining filter and map
# Say we wanted to collect the squares of the even numbers below 11
# Using a list comprehension
[x ** 2 for x in range(11) if x % 2 == 0]
[0, 4, 16, 36, 64, 100]
# Using map and filter
map((lambda x: x ** 2), filter((lambda x: x % 2 == 0), range(11)))
[0, 4, 16, 36, 64, 100]
# Easier way, this uses the optional stepping argument in range
[x ** 2 for x in range(0,11,2)]
[0, 4, 16, 36, 64, 100]
Reading files with list comprehensions
# old way
lines = open('simpsons.csv').readlines()
['homer,donut\n', 'lisa,brocolli\n']
for line in lines:
    line = line.strip()…
# with a comprehension
[line.strip() for line in open('simpsons.csv').readlines()]
['homer,donut', 'lisa,brocolli']
# with a lambda
map((lambda line: line.strip()), open('simpsons.csv').readlines())
Generators and Iterators
• Generators are like normal functions in most
respects but they automatically implement the
iteration protocol to return a sequence of values
over time
• Consider a generator when
– you need to compute a series of values lazily
• Generators
– Save you the work of saving state
– Automatically save theirs when yield is called
• Easy
– Just use “yield” instead of “return”
Quick example
def genSquares(n):
    for i in range(n):
        yield i ** 2
print genSquares(5)
<generator object at 0x01524BE8>
for i in genSquares(5):
    print i, "then",
0 then 1 then 4 then 9 then 16 then
Error handling preview
def gen():
    i = 0
    while i < 5:
        i += 1
        yield i ** 2
x = gen()
x.next()
>>> 1
x.next()
>>> 4
…
Traceback (most recent call last):
  File "<pyshell#110>", line 1, in <module>
    x.next()
StopIteration
try:
    x.next()
except StopIteration:
    print "done"
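The x.next() method used above is Python 2 syntax. In Python 3 the same protocol is driven with the built-in next() function. Here is a sketch of the same generator, assuming Python 3, that collects all five values and then shows the StopIteration signal being caught.

```python
def gen():
    i = 0
    while i < 5:
        i += 1
        yield i ** 2  # execution pauses here; i is remembered between calls


x = gen()
values = [next(x) for _ in range(5)]  # pull exactly five values

try:
    next(x)            # the generator has nothing left to yield
    finished = False
except StopIteration:
    finished = True
```

A for loop over gen() catches StopIteration for you, which is why explicit handling is rarely needed in everyday code.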
5 Minute Exercise
• Begin writing a generator to produce primes
• Start with 0
• When you find a prime, yield (return) that value
• Write code to call your generator
def genPrimes():
    …. yield prime
def main():
    g = genPrimes()
    while True:
        print g.next()
Regular Expressions
• A regular expression (re) is a string that
represents a pattern.
• Idea is to check any string with the pattern to
see if it matches, and if so – where
• REs may be compiled or used on the fly
• You may use REs to match, search, substitute,
or split strings
• Very powerful – a bit of a bear syntactically
Quick examples: Match vs. Search
import re
p = re.compile('[a-z]+')
m = p.match('donut')
print m.group(), m.start(), m.end()
donut 0 5
m = p.search('12 donuts are better than 1')
print m.group(), m.span()
donuts (3, 9)
m = p.match('12 donuts are better than 1')
if m:
    print m.group()
else:
    print "no match"
no match
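The difference is that match only tries the pattern at the very start of the string, while search scans forward for the first position where it fits. A compact sketch of the same example, assuming Python 3:

```python
import re

p = re.compile(r'[a-z]+')
text = '12 donuts are better than 1'

at_start = p.match(text)    # the string starts with a digit, so no match
anywhere = p.search(text)   # finds the first run of lowercase letters
whole = p.match('donut')    # this string does start with lowercase letters
```

Both calls return None on failure, so testing the result before calling group() avoids an AttributeError.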
Quick examples: Multiple hits
import re
p = re.compile('\d+\sdonuts')
print p.findall('homer has 4 donuts, bart has 2 donuts')
['4 donuts', '2 donuts']
import re
p = re.compile('\d+\sdonuts')
iterator = p.finditer('99 donuts on the shelf, 98 donuts on the shelf...')
for match in iterator:
    print match.group(), match.span()
99 donuts (0, 9)
98 donuts (24, 33)
Re Patterns 1
Pattern    Matches
.          Matches any character
^          Matches the start of the string
$          Matches the end of the string
*          Matches zero or more cases of the previous RE (greedy – match as many as possible)
+          Matches one or more cases of the previous RE (greedy)
?          Matches zero or one case of the previous RE
*?, +?     Non greedy versions (match as few as possible)
Re Patterns 2
Pattern    Matches
\d, \D     Matches one digit [0-9] or non-digit [^0-9]
\s, \S     Matches whitespace [ \t\n\r\f\v] or non-whitespace
\w, \W     Matches one alphanumeric char or one non-alphanumeric char (understands Unicode and various locales if set)
\b, \B     \b matches an empty string, but only at the start or end of a word; \B matches only where \b does not
\Z         Matches an empty string at the end of a whole string
\\         Matches one backslash
{m,n}      Matches m to n cases of the previous RE
[…]        Matches any one of a set of characters
|          Matches either the preceding or following expression
(…)        Matches the RE within the parentheses
Gotchas
• RE punctuation is backwards
– “.” matches any character when unescaped, or
an actual “.” when in the form “\.”
– “+” and “*” carry regular expression meaning
unless escaped
Quick examples: .* vs. .+, and \b
.* vs .+
• The pattern 'Homer.*Simpson' will match:
– HomerSimpson
– Homer Simpson
– Homer Jay Simpson
• The pattern 'Homer.+Simpson' will match:
– Homer Simpson
– Homer Jay Simpson
\b
• The pattern r'\bHomer\b' will find a hit searching:
– Homer
– Homer Simpson
• The pattern r'\bHomer' will find a hit searching:
– HomerJaySimpson
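These claims are easy to check directly. The sketch below assumes Python 3 and simply runs each pattern over the candidate strings listed above. The variable names are made up.

```python
import re

names = ['HomerSimpson', 'Homer Simpson', 'Homer Jay Simpson']

# .* allows zero characters between the words; .+ needs at least one.
star_hits = [n for n in names if re.search(r'Homer.*Simpson', n)]
plus_hits = [n for n in names if re.search(r'Homer.+Simpson', n)]

candidates = ['Homer', 'Homer Simpson', 'HomerJaySimpson']

# \b on both sides requires Homer to stand alone as a word.
whole_word = [n for n in candidates if re.search(r'\bHomer\b', n)]

# \b only on the left requires Homer to start a word.
word_start = [n for n in candidates if re.search(r'\bHomer', n)]
```

The patterns are written as raw strings (r'...') so that \b reaches the re module as a word boundary and is not read by Python as a backspace character.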
Sets of chars: []
• Sets of characters are denoted by listing the characters within brackets
– [abc] will match one of a, b, or c
• Ranges are supported
– [0-9] will match one digit
• You may include special sets within brackets
– Such as \s for a whitespace character or \d for a digit
p = re.compile('[HJ]')
iterator = p.finditer("HomerJaySimpson")
for match in iterator:
    print match.group(), match.span()
H (0, 1)
J (5, 6)
Alternatives: |
• A vertical bar matches a pattern on either side
import re
p = re.compile('Homer|Simpson')
iterator = p.finditer("HomerJaySimpson")
for match in iterator:
    print match.group(), match.span()
Homer (0, 5)
Simpson (8, 15)
RE Substitution
import re
line = 'Hello World!'
r = re.compile('world', re.IGNORECASE)
m = r.search(line)
print m.group()
>>> World
print r.sub('Mars!!', line, 1)
>>> Hello Mars!!!
RE Splitting
import re
line = 'lots 42 of random 12 digits 77'
r = re.compile('\d+')
l = r.split(line)
print l
>>> ['lots ', ' of random ', ' digits ', '']
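For reference, here are the substitution and splitting examples again as a sketch that assumes Python 3. The findall call at the end is an addition for contrast: split keeps the text between the numbers, findall keeps the numbers themselves.

```python
import re

line = 'Hello World!'
r = re.compile('world', re.IGNORECASE)
found = r.search(line).group()             # the text as it appears in line
replaced = r.sub('Mars!!', line, count=1)  # replace at most one occurrence

digits = re.compile(r'\d+')
text = 'lots 42 of random 12 digits 77'
pieces = digits.split(text)     # text between the matches
numbers = digits.findall(text)  # the matches themselves
```

The trailing empty string in pieces appears because the input ends with a match, leaving nothing after it.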
Groups: ()
• Frequently you need to obtain more information than just
whether the RE matched or not.
• Regular expressions are often used to dissect strings by writing
a RE divided into several subgroups which match different
components of interest.
p = re.compile('(homer\s(jay))\ssimpson')
m = p.match('homer jay simpson')
print m.group(0)
print m.group(1)
print m.group(2)
homer jay simpson
homer jay
jay
Putting it all together (and optional flags)
import re
r = re.compile('simpson', re.IGNORECASE)
print r.search("HomerJaySimpson").group()
Simpson
r = re.compile('([A-Z][a-z]+).*?(\d+$)', re.MULTILINE)
iterator = r.finditer('Homer is 42\nMaggie is 6\nBart is 12')
for match in iterator:
    print match.group(1), "was born", match.group(2), "years ago"
Homer was born 42 years ago
Maggie was born 6 years ago
Bart was born 12 years ago
Discussion: Who can explain how this RE works?
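One way to approach the discussion question is to spell the pattern out with re.VERBOSE, which lets each piece carry a comment. This is a sketch assuming Python 3, with the stray extra bracket in the slide's character class removed. The variable names are made up.

```python
import re

r = re.compile(r"""
    ([A-Z][a-z]+)   # group 1: a capitalized word, e.g. a name
    .*?             # as few characters as possible (never a newline)
    (\d+$)          # group 2: digits that run up to the end of a line
    """, re.MULTILINE | re.VERBOSE)

text = 'Homer is 42\nMaggie is 6\nBart is 12'
pairs = [(m.group(1), m.group(2)) for m in r.finditer(text)]
```

re.MULTILINE is what lets $ match at the end of every line and not only at the end of the whole string, so each line yields its own match.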
Finding tags within HTML
import re
line = '<tag>my eyes! the goggles do nothing.</tag>'
r = re.compile('<tag>(.*)</tag>')
m = r.search(line)
print m.group(1)
>>> my eyes! the goggles do nothing.
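The greedy .* works here because the line holds a single tag pair. When several pairs share a line, the non-greedy form from the pattern table is usually what you want. Here is a sketch assuming Python 3, with a made-up input string:

```python
import re

page = '<tag>homer</tag> and <tag>lisa</tag>'  # hypothetical input

# Greedy: .* runs to the LAST </tag> on the line.
greedy = re.search(r'<tag>(.*)</tag>', page).group(1)

# Non-greedy: .*? stops at the FIRST </tag> after each opening tag.
lazy = re.findall(r'<tag>(.*?)</tag>', page)
```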
5 Minute Exercise
• Download the Columbia homepage to disk
• Open it with python
• Use regular expressions to begin extracting the news
import re
line = '<tag>my eyes! the goggles do nothing.</tag>'
r = re.compile('<tag>(.*)</tag>')
m = r.search(line)
print m.group(1)
>>> my eyes! the goggles do nothing.