Transcript pptx - EECS

Short Python tutorial
Shibamouli Lahiri
About Python
1991
February: the first Python versions (0.9.0 and 0.9.1) are released
by Guido van Rossum.
2000
October 16: Python 2.0 is released with Unicode support
and garbage collection.
2008
December 3: Python 3.0 is released.
Now
Python 2.7.9 and Python 3.4.2 are the stable releases,
but they are mutually incompatible.
CAEN offers: Python 2.6.6
Python is a scripting language.
Van Rossum designed Python at CWI, Netherlands, as a successor to the ABC
programming language, and to interface with the Amoeba operating system.
Python is an interpreted language. That means there is no explicitly
separate compilation step. Rather, the interpreter reads a Python program,
converts it into an internal form (bytecode), and executes it immediately.
The name “Python” is inspired by Monty Python’s Flying Circus.
Slide 1
Variables
A variable is a name of a place where some information is stored. For
example:
>>> yearOfBirth = 1976
>>> currentYear = 2015
>>> age = currentYear - yearOfBirth
>>> print age
The variables in this example program are all integers. Python supports the following
variable types (among others):
• Numbers: int, long, float, complex
• Booleans: True, False
• Strings: "Michigan", 'Wolverines'
• Lists: ['this', 'is', 'a', 'list']
• Tuples: ('this', 'is', 'a', 'tuple')
• Dictionaries: {"Happy" : 1, "New" : 2, "Year" : 3}
In Python, variables have implicit types rather than explicit types. That means a
variable is not tied to one type: you can assign a value of any type to a variable
that previously held a value of another type.
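A minimal sketch of implicit typing, assuming nothing beyond built-in types. The names value and kinds are made up for this illustration; the same name is rebound to an int, a string, and a list in turn.

```python
# The same name can hold values of different types over time.
kinds = []

value = 1976                        # an integer
kinds.append(type(value).__name__)

value = "Michigan"                  # now a string
kinds.append(type(value).__name__)

value = [1, 2, 3]                   # now a list
kinds.append(type(value).__name__)

# kinds is now ['int', 'str', 'list']
```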
Slide 2
Operations on numbers
Python contains the following arithmetic operators:
• +: sum
• -: subtraction
• *: product
• /: division (in Python 2, dividing two integers yields an integer: 7/2 is 3)
• %: modulo division
• **: exponent
• //: floor division
Apart from these operators, Python contains some built-in arithmetic
functions. Some of these are mentioned in the following list:
• abs(x): absolute value
• int(x): integer part of a number; cast a string to an int
• float(x): cast a number/string to a float
• round(x): round the floating point number
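A short sketch of the operators and functions above. The variable names are illustrative only. The division example uses a float operand on purpose, so the result does not depend on whether Python 2 or Python 3 runs it.

```python
# Arithmetic operators
quotient = 7 // 2          # floor division -> 3
remainder = 7 % 2          # modulo division -> 1
power = 2 ** 10            # exponent -> 1024
true_quotient = 7 / 2.0    # a float operand gives floating-point division -> 3.5

# Built-in arithmetic functions
magnitude = abs(-3)        # absolute value -> 3
from_string = int("42")    # string cast to int -> 42
truncated = int(3.9)       # integer part of a number -> 3
as_float = float("1.5")    # string cast to float -> 1.5
rounded = round(2.7)       # rounds to 3 (returned as the float 3.0 in Python 2)
```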
Slide 3
Input and output
>>> # age calculator
>>> yearOfBirth = raw_input ("Please enter your birth year : ")
>>> yearOfBirth = int(yearOfBirth)
>>> print "Your age is ", 2015 - yearOfBirth
>>> # count the number of lines in a file
>>> myfile = "myfile.txt"  # hypothetical file name; use your own
>>> INPUTFILE = open(myfile, "r")
>>> count = 0
>>> line = INPUTFILE.readline()
>>> while line:
...     count += 1
...     line = INPUTFILE.readline()
...
>>> INPUTFILE.close()
>>> print count, "lines in file", myfile
>>> # open for writing (this erases the existing contents of the file)
>>> OUTPUTFILE = open(myfile, "w")
Slide 4
Conditional structures
>>> # determine whether number is odd or even
>>> number = raw_input ("Enter number: ")
>>> number = int (number)
>>> if (number - 2*int(number/2) == 0):
...     print number, "is even"
... elif (abs(number - 2*int(number/2)) == 1):
...     print number, "is odd"
... else:
...     print "Something strange has happened!"
...
>>>
Note the indentation! Python is extremely picky about indentation.
I usually put four spaces as indentation – throughout my program. Don’t use tabs.
And a comment: DO write comments in your Python programs!
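The same odd/even test can also be sketched with the modulo operator % instead of the subtraction used on the slide. The function name parity is an assumption made for this example; note the four-space indentation.

```python
def parity(number):
    """Return "even" or "odd" for an integer."""
    # number % 2 is 0 for even integers and 1 for odd ones,
    # including negative numbers.
    if number % 2 == 0:
        return "even"
    else:
        return "odd"

results = [parity(10), parity(7), parity(-3)]
# results is now ['even', 'odd', 'odd']
```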
Slide 5
Numeric test operators
An overview of the numeric test operators:
• ==: equal
• !=: not equal
• <>: not equal (Python 2 only; removed in Python 3)
• <: less than
• <=: less than or equal to
• >: greater than
• >=: greater than or equal to
All these operators can be used for comparing two variables in an if condition.
Truth expressions
Python has three logical operators:
• and
• or
• not
Slide 6
Iterative structures
# print numbers 1-10 in three different ways
i = 1
while (i <= 10):
    print i
    i += 1
for i in (1,2,3,4,5,6,7,8,9,10):
    print i
for i in range(1, 11):
    print i
Stop a loop, or force continuation:
break
continue
Exercise:
Read ten numbers and print the largest, the smallest and a count representing how
many of them are divisible by three.
largest = None
if (largest is None or number > largest): largest = number
if (number-3*int(number/3) == 0): count3 += 1
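One possible solution sketch for the exercise, built from the hints above. To keep it easy to try out, it takes the numbers as a list instead of reading them from the keyboard; the function name summarize and the sample numbers are assumptions for this illustration.

```python
def summarize(numbers):
    """Return (largest, smallest, how many values are divisible by three)."""
    largest = None
    smallest = None
    count3 = 0
    for number in numbers:
        if largest is None or number > largest:
            largest = number
        if smallest is None or number < smallest:
            smallest = number
        if number % 3 == 0:
            count3 += 1
    return largest, smallest, count3

# ten sample numbers
result = summarize([4, 9, -3, 12, 7, 0, 5, 8, 1, 6])
# result is (12, -3, 5)
```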
Slide 7
Basic string operations
- strings are stored in the same type of variables we used for storing numbers
- string values can be specified between double and single quotes
Comparison operators for strings are the same as the ones for numbers.
Strings also support the following operations:
- +: concatenation
- +=: concatenation and assignment
- *: repetition
- []: indexing (a single character)
- [:]: range slicing
Examples:
'Michigan ' + 'Wolverines' # yields 'Michigan Wolverines'
'over and ' * 7 # yields 'over and over and over and over and over and over and over and '
'Michigan'[4] # yields 'i'
'Michigan'[0:4] # yields 'Mich'
Slide 8
Basic string operations (Contd.)
• STRING.upper() and STRING.lower() uppercase and lowercase a
string, respectively.
string = "Michigan"
string.upper() # yields "MICHIGAN"
string.lower() # yields "michigan"
• STRING.strip() removes leading and trailing whitespace characters from a
string.
string = " imitation game "
string.strip() # yields "imitation game"
• STRING.count(s) counts how many times a substring s appears in the
string.
string= "anna"
string.count("a") # yields 2
string.count("nn") # yields 1
• len(STRING) returns the length of a string.
mylen = len("EECS 498") # yields 8
Slide 9
String substitution and string
matching
The replace function modifies sequences of characters (substitute)
The in operator checks for matching (match)
- 'Mich' in 'Michigan' is True.
- 'Michigan'.replace('igan', '') yields 'Mich'.
By default replace() replaces all occurrences of the search pattern.
- It accepts a third argument to specify how many occurrences to replace.
- 'I like IR'.replace('I', 'U') yields 'U like UR'.
- 'I like IR'.replace('I', 'U', 1) yields 'U like IR'.
Immutability. Strings are immutable.
For example:
mystring = "Michigan"
mystring[3] = "n" # doesn't work: raises a TypeError
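A small sketch of what immutability means in practice: the assignment fails, and the usual workaround is to build a new string from slices. The names failed and changed are made up for this example.

```python
mystring = "Michigan"

# Assigning to a single position is not allowed for strings.
try:
    mystring[3] = "n"
    failed = False
except TypeError:
    failed = True

# Build a new string instead: everything before index 3, the new
# character, and everything after index 3.
changed = mystring[:3] + "n" + mystring[4:]
# changed is 'Micnigan'; mystring is still 'Michigan'
```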
Slide 10
Examples
# text and feature are string variables defined elsewhere
# replace first occurrence of "bug"
text = text.replace("bug", feature, 1)
# replace all occurrences of "bug"
text = text.replace("bug", feature)
# convert to lower case
text = text.lower()
# delete vowels
text = ''.join([c for c in text if c.lower() not in 'aeiou'])
# replace every non-digit character with x
text = ''.join([c if c in '0123456789' else 'x' for c in text])
# replace every capital letter with CAPS
text = ''.join(['CAPS' if c.isupper() else c for c in text])
Simple example:
Print all lines from a file that include a given sequence of characters
[emulate grep behavior]
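A minimal sketch of the grep exercise using the in operator. The function name grep_lines and the sample lines are assumptions for this illustration; an open file handle can be passed in place of the list, since iterating over a file yields its lines.

```python
def grep_lines(lines, pattern):
    """Return the lines that contain pattern, without trailing newlines."""
    matches = []
    for line in lines:
        if pattern in line:
            matches.append(line.rstrip("\n"))
    return matches

sample = ["Michigan Wolverines\n", "Ohio State\n", "Lake Michigan\n"]
found = grep_lines(sample, "Michigan")
# found is ['Michigan Wolverines', 'Lake Michigan']
```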
Slide 11
Regular expressions aka regex
(re module)
• \b: word boundaries
• \d: digits
• \n: newline
• \r: carriage return
• \s: white space characters
• \t: tab
• \w: alphanumeric characters
• ^: beginning of string
• $: end of string (breaks at newlines!)
• .: any character
• [bdkp]: characters b, d, k and p
• [a-f]: characters a to f
• [^a-f]: all characters except a to f
• abc|def: string abc or string def
• *: zero or more times
• +: one or more times
• ?: zero or one time
• {p,q}: at least p times and at most q times
• {p,}: at least p times
• {p}: exactly p times
• *?: zero or more times (lazy evaluation)
• +?: one or more times (lazy evaluation)
Examples:
1. Clean an HTML formatted text
2. Grab URLs from a Web page using re.findall()
3. Substitute patterns in a file with other patterns using re.sub()
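A rough sketch of the first two examples on a tiny, made-up HTML snippet. The patterns are deliberately simple and are not a robust HTML parser. The example also shows why the lazy quantifier matters here.

```python
import re

# hypothetical page content for illustration
html = ('<p>Visit <a href="http://example.com/a">A</a> and '
        '<a href="http://example.com/b">B</a>.</p>')

# Grab URLs: the lazy .*? stops at the first closing quote.
urls = re.findall(r'href="(.*?)"', html)
# urls is ['http://example.com/a', 'http://example.com/b']

# Clean the HTML: remove every tag, again with a lazy match.
text = re.sub(r'<.*?>', '', html)
# text is 'Visit A and B.'

# A greedy .* runs from the first '<' to the last '>' and removes everything.
greedy = re.sub(r'<.*>', '', html)
# greedy is ''
```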
Slide 12
Split function re.split()
import re
string = "Jan Piet\nMarie \tDirk"
mylist = re.split(r'\s+', string) # yields [ "Jan","Piet","Marie","Dirk" ]
string = " Jan Piet\nMarie \tDirk\n" # watch out, empty string at the beginning and end!!!
mylist = re.split(r'\s+', string) # yields [ "", "Jan","Piet","Marie","Dirk", "" ]
string = "Jan:Piet;Marie---Dirk" # use any regular expression...
mylist = re.split(r'[:;]|---', string) # yields [ "Jan","Piet","Marie","Dirk" ]
string = "Jan Piet" # use list() cast to split on letters
letters = list(string) # yields [ "J","a","n"," ","P","i","e","t" ]
Example:
1. Tokenize a text: separate simple punctuation (, . ; ! ? ( ) )
2. Add all the digits in a number
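One way the two examples could be sketched. The function names tokenize and digit_sum are assumptions, and digit_sum expects a non-negative integer.

```python
import re

def tokenize(text):
    """Separate simple punctuation from words and return a list of tokens."""
    # put a space on both sides of each punctuation mark ...
    spaced = re.sub(r'([,.;!?()])', r' \1 ', text)
    # ... then split on whitespace (split() drops empty strings)
    return spaced.split()

def digit_sum(number):
    """Add all the digits in a non-negative integer."""
    return sum(int(digit) for digit in str(number))

tokens = tokenize("Hello, world! (Really?)")
# tokens is ['Hello', ',', 'world', '!', '(', 'Really', '?', ')']
total = digit_sum(498)
# total is 21
```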
Slide 13
Lists and tuples
Lists are mutable. Tuples are immutable.
Following examples are valid for both lists [] and tuples ():
a = [] # empty list
b = [1,2,3] # three numbers
c = ["Jan","Piet","Marie"] # three strings
d = ["Dirk",1.92,46,"20-03-1977"] # a mixed list
Variables and sublists in a list
a=1
b = [a,a+1,a+2] # variable interpolation
c = ["Jan",["Piet","Marie"]] # sublist
d = ["Dirk",1.92,46,[],"20-03-1977"] # empty sublist in a list
e = b + c # same as [1,2,3,"Jan",["Piet","Marie"]]
==============================================================
Practical construction function for lists only:
• range(x, y)
x = range(1, 7) # same as [1, 2, 3, 4, 5, 6]
y = range(1.2, 5.2) # doesn’t work on floating points
z = range(2, 6) + [8] + range(11, 14) # same as [2,3,4,5,8,11,12,13]
w = range(1, 10, 2) # same as [1, 3, 5, 7, 9]
Slide 14
More about lists
array = ["an","bert","cindy","dirk"]
length = len(array) # length now has the value 4
print length # prints 4
print array[-1] # prints "dirk"
print len(array) # prints 4
(a, b) = ("one","two")
(array[0], array[1]) = (array[1], array[0]) # swap the first two
Pay attention to the fact that assignment is by reference, NOT by value.
array = ["an","bert","cindy","dirk"]
copyarray = array # creates a new reference “copyarray”
copyarray[2] = "ZZZ" # changes BOTH array as well as copyarray
Extract unique elements of a list:
array = ["an","bert","cindy","dirk","an","bert","cindy","dirk"]
array = list(set(array)) # yields "an","bert","cindy","dirk" once each, in arbitrary order
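A brief sketch of two follow-ups to the points above: making a real copy of a list with a slice, and extracting unique elements while keeping their original order. The names alias, clone and unique_in_order are made up for this example.

```python
array = ["an", "bert", "cindy", "dirk"]
alias = array       # a second name for the SAME list
clone = array[:]    # slice copy: a new list with the same elements

alias[2] = "ZZZ"
# array is ['an', 'bert', 'ZZZ', 'dirk'], clone still has 'cindy'

def unique_in_order(items):
    """Return the unique elements of items, in order of first appearance."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

unique = unique_in_order(["an", "bert", "an", "cindy", "bert"])
# unique is ['an', 'bert', 'cindy']
```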
Slide 15
Manipulating lists and their elements
• List1 += List2
appends List2 to the end of List1.
array = ["an","bert","cindy","dirk"]
brray = ["evelien","frank"]
array += brray # array is ["an","bert","cindy","dirk","evelien","frank"]
brray += ["gerben"] # brray is ["evelien","frank","gerben"]
• LIST.pop() removes the last item of the list and returns it. If the
list is empty, it raises an IndexError.
array = ["an","bert","cindy","dirk"]
item = array.pop() # item is "dirk" and list is [ "an","bert","cindy" ]
• LIST.append() appends a single element to the end of a list.
array = ["an","bert","cindy"]
array.append("dirk") # yields ["an","bert","cindy","dirk"]
• LIST.extend() extends a list by appending the elements of a second list.
array = ["an","bert","cindy","dirk"]
brray = ["evelien","frank"]
array.extend(brray) # array is ["an","bert","cindy","dirk","evelien","frank"]
Slide 16
Working with lists
Convert lists to strings
array = ["an","bert","cindy","dirk"]
print "The array contains", array[0], array[1], array[2], array[3]
function STRING.join(LIST).
string = ":".join(array) # string now has the value "an:bert:cindy:dirk"
string = "+" + "+".join(array) # string now has the value "+an+bert+cindy+dirk"
Check if an element is in a list
array = ["an","bert","cindy","dirk"]
"bert" in array # returns True (in operator)
Iteration over lists
for i in range(0, len(array)):
    item = array[i]
    item = item.upper()
    print item,
for item in array:
    item = item.upper()
    print item, # prints an uppercased version of each item
Slide 17
List enumeration
array = ["an","bert","cindy","dirk"]
for item in array:
    item = item.upper()
    print item, # prints an uppercased version of each item
What if we wanted the index of each item, along with its value?
Solution: Use enumerate(). It returns an iterator object (wrapper) on top of the list.
array = ["an","bert","cindy","dirk"]
for item_index, item in enumerate(array):
    item = item.upper()
    print item_index, item # prints the index, and the item (uppercased)
You can start the index from any integer you want (default is zero).
array = ["an","bert","cindy","dirk"]
for item_index, item in enumerate(array, 1): # now the index starts from one
    item = item.upper()
    print item_index, item
Slide 18
Filter, map, list comprehension
• filter (lambda x: CONDITION, LIST )
returns a list of all items from list that satisfy some condition.
For example:
large = filter(lambda x: x > 10, [1,2,4,8,16,25]) # returns [16,25]
i_names = filter(lambda x: 'i' in x, array) # returns ["cindy","dirk"]
Example:
Print all lines from a file that include a given sequence of characters
[emulate grep behavior]
• map (OPERATION, LIST) and LIST COMPREHENSIONS
perform an arbitrary operation on each element of a list.
For example:
rounded = map(round, [1.67, 2.99, 0.83, 26.75]) # returns [2.0, 3.0, 1.0, 27.0]
strings = map(str, [8, 7, 6, 2, 0]) # yields ['8', '7', '6', '2', '0']
more = [x + 3 for x in [1,2,4,8,16,25]] # returns [4,5,7,11,19,28]
initials = [x[0] for x in array] # returns ["a","b","c","d"]
even = [x for x in [1,2,3,4] if x % 2 == 0] # returns [2, 4]
odd = [x for x in [1,2,3,4] if x % 2 == 1] # returns [1, 3]
tmp = ["even" if x % 2 == 0 else "odd" for x in [1,2,3,4]] # yields ["odd", "even", "odd", "even"]
Slide 19
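A short sketch contrasting the two places an if can appear in a list comprehension, next to the filter() and map() forms from the previous slide. The list() calls are an addition so the example behaves the same under Python 3, where filter() and map() return iterators; the variable names are illustrative.

```python
numbers = [1, 2, 4, 8, 16, 25]

# An "if" AFTER the for clause filters: only matching items are kept.
large = [x for x in numbers if x > 10]
# large is [16, 25]

# The same selection with filter().
large_filtered = list(filter(lambda x: x > 10, numbers))
# large_filtered is [16, 25]

# An "if ... else" BEFORE the for clause transforms every item.
labels = ["even" if x % 2 == 0 else "odd" for x in [1, 2, 3, 4]]
# labels is ['odd', 'even', 'odd', 'even']

# map() and the equivalent comprehension.
strings = list(map(str, [8, 7, 6]))
same_strings = [str(x) for x in [8, 7, 6]]
# both are ['8', '7', '6']
```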
any and all
• any(CONDITION(x) for x in LIST)
if any element of the list satisfies the condition, it returns True.
For example:
any (x > 0 for x in [-5, -2, 0, 1]) # returns True
any (x > 0 for x in [-15, -12, -20, -31]) # returns False
• all(CONDITION(x) for x in LIST)
if all elements of the list satisfy the condition, it returns True.
For example:
all (x < 0 for x in [-5, -2, 0, 1]) # returns False
all (x < 0 for x in [-15, -12, -20, -31]) # returns True
Slide 20
Built-in list functions
• sum (LIST) and LIST.count()
returns the sum of a list of numbers, and how many times an element
appears in a list.
For example:
mysum = sum([1,2,4,8,16,25]) # returns 56
mycount = [1,2,4,8,4,0,2].count(4) # returns 2
• max (LIST) and min(LIST)
returns maximum and minimum of a list of numbers.
For example:
mymax = max([1.67, 2.99, 0.83, 26.75]) # returns 26.75
mymin = min([8, 7, 6, 2, 0]) # yields 0
• LIST.reverse() and LIST.sort()
reverses a list, and sorts a list – respectively.
For example:
mylist = [1.67, 2.99, 0.83, 26.75]
mylist.reverse() # mylist is now [26.75, 0.83, 2.99, 1.67]
mylist.sort() # mylist is now [0.83, 1.67, 2.99, 26.75]
Slide 21
Dictionaries (Associative Arrays)
- Associates keys with values
- Allows for almost instantaneous lookup of a value that is associated
with some particular key
- If a key does not exist in the dictionary, accessing it with dictionary[key]
raises a KeyError.
- The in operator returns True if a key exists in the dictionary.
- if (key in dictionary) evaluates to False if the key doesn't exist. Every key
that does exist has an associated value (possibly None).
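A minimal sketch of the lookup behavior described above, using a small made-up birthdays dictionary. The get() call with a default is one common way to avoid the KeyError.

```python
birthdays = {"An": "25-02-1975", "Bert": "12-10-1953"}

has_an = "An" in birthdays       # True: the key exists
has_dirk = "Dirk" in birthdays   # False: the key does not exist

# Accessing a missing key raises a KeyError.
try:
    birthdays["Dirk"]
    missing = False
except KeyError:
    missing = True

# get() returns a default value instead of raising an error.
fallback = birthdays.get("Dirk", "unknown")
# fallback is 'unknown'
```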
Slide 22
Dictionaries (cont’d)
Examples
wordfrequency = {} # initializes an empty dictionary
wordfrequency["the"] = 12731 # creates key "the", value 12731
phonenumber["An De Wilde"] = "+31-20-6777871"
index[word] = nwords
occurrences[a] = occurrences.get(a, 0) + 1 # if this is the first reference,
# the value associated with a will
# be increased from 0 to 1
birthdays = {"An":"25-02-1975", "Bert":"12-10-1953", "Cindy":"23-05-1969",
"Dirk":"01-04-1961"} # fill the dictionary
mylist = birthdays.items() # make a list of the key/value pairs
copy_of_bdays = birthdays # copies a dictionary by reference, so be careful !!
Slide 23
Operations on Dictionaries
- DICTIONARY.keys() returns a list with only the keys in the dictionary.
- DICTIONARY.values() returns a list with only the values in the
dictionary, in the same order as the keys returned by keys().
- DICTIONARY.items() returns a list of key-value pairs in the dictionary,
in the same order as the keys returned by keys().
- dict([(key1,val1), (key2,val2), (key3,val3), …]) returns a dictionary
constructed from a list of key-value pairs.
- Note that the keys/values are returned in arbitrary order, so if you want to
preserve the original order, use a list.
sortedlist = []
for key in sorted(dictionary.keys()):
    sortedlist.append((key, dictionary[key]))
    print "Key", key, "has value", dictionary[key]
Reverse the direction of the mapping, i.e. construct a dictionary with keys and
values swapped:
backwards = dict(zip(forward.values(), forward.keys()))
(if forward has two identical values associated with different keys, those will end up
as only a single element in backwards)
Slide 24
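A small sketch of the collision mentioned above, with a made-up forward dictionary in which two keys share the value 1. The grouped variant, which keeps a list of keys per value, is one possible workaround and not part of the original slides.

```python
forward = {"an": 1, "bert": 2, "cindy": 1}

# Swap keys and values: the two keys with value 1 collapse into one entry.
backwards = dict(zip(forward.values(), forward.keys()))
# backwards has only two keys (1 and 2); backwards[1] is one of 'an' or 'cindy'

# Keep every key by collecting them in a list per value.
grouped = {}
for key, value in forward.items():
    grouped.setdefault(value, []).append(key)
# grouped[2] is ['bert']; grouped[1] holds both 'an' and 'cindy'
```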
Multidimensional data structures
- Python does not really have multi-dimensional data structures, but offers a
nice way of emulating them using references
matrix[i][j] = x
lexicon1["word"][1] = partofspeech
lexicon2["word"]["noun"] = frequency
List of lists/tuples
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] # an array of references to anonymous arrays
matrix = [(1, 2, 3), (4, 5, 6), (7, 8, 9)] # list of tuples
Zipping
list_of_strings = ["an","bert","cindy","dirk"]
list_of_numbers = [1, 2, 3, 4]
newlist = zip(list_of_numbers , list_of_strings)
# yields [(1, 'an'), (2, 'bert'), (3, 'cindy'), (4, 'dirk')] – a list of tuples.
newdict = dict(zip(list_of_numbers , list_of_strings))
# yields a dictionary! {1: 'an', 2: 'bert', 3: 'cindy', 4: 'dirk'}
Slide 25
Multidimensional structures
Dictionary of lists
lexicon1 = { # a dictionary from strings to anonymous arrays
    "the" : [ "Det", 12731 ],
    "man" : [ "Noun", 658 ],
    "with" : [ "Prep", 3482 ]
}
Dictionary of dictionaries
lexicon2 = { # a dictionary from strings to anonymous dictionaries of strings to numbers
    "the" : { "Det" : 12731 },
    "man" : { "Noun" : 658 , "Verb" : 12 },
    "with" : { "Prep" : 3482 }
}
Slide 26
Programming Example
A program that reads lines of text, gives a unique index number to each word and
counts the word frequencies
# read all lines in the input
import re
nwords = 0
index = {}
frequency = {}
line = raw_input()
while line:
    # cut off leading and trailing whitespace
    line = line.strip()
    if line == "exit": break
    if line == "":
        # there are no words?
        # (re.split on an empty string yields [''], so test the line itself)
        line = raw_input()
        continue
    # and put the words in an array
    words = re.split(r'\s+', line)
    # process each word...
    for word in words:
        # if it's unknown assign a new index
        if word not in index:
            index[word] = nwords
            nwords += 1
        # always update the frequency
        frequency[word] = frequency.get(word, 0) + 1
    line = raw_input()
# now we print the words sorted
for word in sorted(index.keys()):
    print word, "has frequency", frequency[word], "and index", index[word]
Slide 27
A note on sorting
If we would like to have the words sorted by their frequency instead of
alphabetically, we need a construct that imposes a different sort order.
The sorted() function can use any sort order that is provided as a key function.
- the usual alphabetical/numerical sort order:
sorted(list)
- for a reverse sort, set reverse argument to True:
sorted(list, reverse = True)
- sort the keys of a dictionary by their value instead of by their own identity:
sorted(dictionary, key = dictionary.get)
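The three variants above, sketched on a small made-up frequency dictionary (the counts are borrowed from the lexicon example on an earlier slide).

```python
frequency = {"the": 12731, "man": 658, "with": 3482}

# usual alphabetical order of the keys
by_name = sorted(frequency)
# by_name is ['man', 'the', 'with']

# keys ordered by their value, smallest first
by_frequency = sorted(frequency, key=frequency.get)
# by_frequency is ['man', 'with', 'the']

# most frequent word first: combine key and reverse
most_frequent_first = sorted(frequency, key=frequency.get, reverse=True)
# most_frequent_first is ['the', 'with', 'man']
```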
Slide 28
Basics about Functions
def askForInput():
    print "Please enter something: ",
# function call
askForInput()
Tip: put related functions in a file (with the extension .py) and include the file
with the command import:
# we will use this file
from myFile import *
# you can import other modules too.
import os
import glob
import re
etc.
# imports can be combined in one line.
import glob, os, re, string, sys
Slide 29
Variables Scope
A variable a is used both in the function and in the main part of the program.
a=0
print a
def changeA():
    a=1
    print a
changeA()
print a
The value of a is printed three times. Can you guess what values are printed?
- Inside changeA(), a is a local variable.
Slide 30
Variables Scope
Global variables must be declared explicitly.
a=0
print a
def changeA():
    global a ## explicit declaration
    a=1
    print a
changeA()
print a
What values are printed now?
Slide 31
Communication between functions
and programs
- Functions may use any data types as input arguments, in any order.
- Functions may return any data type (including lists, tuples, and dictionaries)
# the return statement from a function
return [1,2]
# or simply (1,2)
# read the return values from the function
[a,b] = function()
- Read the main program arguments using sys.argv (similar to C++)
Example: Function to compute the average of a list of numbers.
def computeAverage(listOfNumbers):
    if len(listOfNumbers) == 0:
        return 0
    mysum = 0
    for element in listOfNumbers:
        mysum += element
    return mysum * 1.0 / len(listOfNumbers)
    # why do I have 1.0?
    # could you make the function shorter?
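One possible shorter version of the averaging function, plus a sketch of how program arguments could be fed into it. The names compute_average and average_from_args and the script name average.py are assumptions for this illustration. The float() call plays the same role as the 1.0 above: it prevents integer division in Python 2.

```python
def compute_average(numbers):
    """Return the mean of a list of numbers, or 0 for an empty list."""
    if not numbers:
        return 0
    return sum(numbers) / float(len(numbers))

def average_from_args(argv):
    """Average the numeric program arguments; argv[0] is the script name."""
    return compute_average([float(arg) for arg in argv[1:]])

average = compute_average([1, 2, 3, 4])
# average is 2.5

# argv has the same layout as sys.argv: script name first, then arguments
example = average_from_args(["average.py", "1", "2", "4"])
```

In a real script you would write import sys and call average_from_args(sys.argv).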
Slide 32
More about file management
• INFILE = open("myfile"): reading
• INFILE = open("myfile", "r"): reading
• OUTFILE = open("myfile", "w"): writing
• OUTFILE = open("myfile", "a"): appending
• os.listdir("mydirectory"): list contents of a directory
• glob.glob("mydirectory/*"): list contents of a directory (add a "/*" at the end)
Operations on an open file handle
• a = INFILE.readline(): read a line from INFILE into a
• a = INFILE.readlines(): read all lines from INFILE into a as a list
• a = INFILE.read(length): read length characters from INFILE into a
• a = INFILE.read(): read the whole content of INFILE into a as a single long
string
• OUTFILE.write("text") : write some text in OUTFILE
• print >> OUTFILE, "text": another way to write text
• OUTFILE.writelines([line1, "\n", line2, "\n", line3, "\n", …]): writing a list of
lines
Close files
• FILE.close(): close a file
Tip: When a file is very large (hundreds of MBs or more), use readline() rather
than read() or readlines().
Slide 33
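A sketch of line-by-line reading that never holds the whole file in memory. Iterating over the file handle reads one line at a time, much like repeated readline() calls. The function name count_lines and the temporary file contents are assumptions for this illustration.

```python
import os
import tempfile

def count_lines(path):
    """Count the lines in a file without loading it all into memory."""
    count = 0
    infile = open(path, "r")
    try:
        for line in infile:    # one line at a time
            count += 1
    finally:
        infile.close()         # close the file even if an error occurs
    return count

# Demonstration on a small temporary file.
handle, path = tempfile.mkstemp()
os.close(handle)
outfile = open(path, "w")
outfile.writelines(["first\n", "second\n", "third\n"])
outfile.close()

n_lines = count_lines(path)
# n_lines is 3
os.remove(path)
```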
Other file management commands
• os.remove("myfile"): delete file myfile
• os.rename("file1","file2"): change name of file file1 to file2
• os.makedirs("mydir"): create directory mydir
• os.rmdir("mydir"): delete directory mydir if empty
• os.chdir("mydir"): change the current directory to mydir
• os.system("command"): execute command command
• os.path.exists("mydir"): check if directory mydir exists
• os.path.exists("myfile"): check if file myfile exists
• shutil.rmtree("mydir"): delete directory mydir, along with everything inside
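A short sketch that strings several of these commands together inside a temporary directory, so nothing outside it is touched. The directory and file names are made up for this example.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                 # scratch directory for the demo
mydir = os.path.join(base, "mydir")
os.makedirs(mydir)                        # create directory mydir

file1 = os.path.join(mydir, "file1")
file2 = os.path.join(mydir, "file2")
outfile = open(file1, "w")
outfile.write("text\n")
outfile.close()

os.rename(file1, file2)                   # file1 is now called file2
listing = os.listdir(mydir)               # ['file2']

os.remove(file2)                          # delete the file
os.rmdir(mydir)                           # works because mydir is now empty
exists_after = os.path.exists(mydir)      # False

shutil.rmtree(base)                       # remove the scratch directory itself
```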
Slide 34