Introduction to Python
Session 1: The Basics and a Little More
Jeremy Chen
Objectives
• Learn basic Python
• Learn to use Python for Data Analysis
National University of Singapore Business School
This Session’s Agenda
• Getting Started
• Data types
• Using Modules and Packages
• Functions and Flow Control
• Objects
Why Use Python
• Writing readable code is easy
– Natural syntax to commands
– Indentation-consciousness forces readability
– “Everything I like about Perl, and everything I like about MATLAB.” - Someone
• Modules for everything
– The drudgery (csv, JSON, …)
– Image Manipulation and Plotting
– Scientific Computing
– More: https://wiki.python.org/moin/UsefulModules
About Me
• Credibility Destroyer:
– I’m not really a Python user.
– I use/have used Python for:
• Scientific computing (Optimization, Statistics)
• Server maintenance (Database/file system clean up/data acquisition)
• … may be using Django (a Python Web Framework) to build an interesting web app… once the design and architecture are figured out.
• … but I suppose I’ve done my fair bit of data wrangling and analysis
Setting Up
• Vanilla Python: http://www.python.org/getit/
– Windows/Mac: Pick a binary
– Linux: (You should know what to do)
– We will use a 2.7.x build. Install this now.
• Some existing third-party software is not yet compatible
with Python 3; if you need to use such software, you can
download Python 2.7.x instead.
• I use Python 2.6.6 “on server” and Python 2.7.5 elsewhere.
• Distributions with Almost Everything You Need:
– Enthought Canopy
– Python(x,y)
– WinPython
Start downloading one of these now.
Starting Up
• Start Interpreter: IDLE or /usr/bin/python
• Basics:
– CTRL-D or exit() to exit
– Comments begin with #
>>> x = y = z = 1 # Multiple Assignments
>>> x += 1 # This is not C: x++ doesn’t work
>>> x
2
>>> some_list = [1,2.0,"hi"] # Can contain multiple "types"
>>> some_list[1] # Zero-based indexing: Stuff starts at 0
2.0
>>> some_list[1:2] # List of (1+1)-th to 2-nd items: Weird?
[2.0]
Basic Data Types
• Strings
>>> some_string = "this is a string"
>>> some_string[5:] # Element 5 to end
'is a string'
>>> some_string[:5] # Element 0 to 5-1
'this '
• Integers
>>> a = 1; b = 2 # Another "multiple assignment"
>>> a/b # "Truncation" is about to happen
0
• Floats
>>> fl = 1.0; b = 2; fl/b # Another multiple assignment and...
0.5
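The truncation above happens because both operands are integers in Python 2. A minimal sketch of two ways to make the intent explicit, chosen so that they behave the same on Python 2.7 and Python 3 (the variable names are illustrative only):

```python
a = 1
b = 2

# Floor division: "//" always rounds down, whatever the Python version
floor_quotient = a // b

# Converting one operand to float forces a floating-point division
float_quotient = float(a) / b

# divmod gives the floor quotient and the remainder in one call
quotient, remainder = divmod(7, 2)

print("floor: {0}, float: {1}".format(floor_quotient, float_quotient))
print("7 = 2*{0} + {1}".format(quotient, remainder))
```

On Python 2, putting "from __future__ import division" at the top of a file makes "/" behave as it does in Python 3.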
Container Data Types
• Lists
>>> some_list = [1,2.0,"hi"] # Can contain multiple "types"
>>> x,y,z = [1,2,3]; y # Assignment
2
>>> some_list.append(5); some_list # Append to end
[1, 2.0, 'hi', 5]
>>> el = some_list.pop(); el; some_list # Extract last element
5
[1, 2.0, 'hi']
>>> el = some_list.pop(1); el; some_list # Get second this time
2.0
[1, 'hi']
>>> some_list[0]=range(5); some_list # Change the first element
[[0, 1, 2, 3, 4], 'hi']
>>> some_list[0][3]="too much"; some_list # Be slightly abusive
[[0, 1, 2, 'too much', 4], 'hi']
>>> del some_list[0][2]; some_list # Delete 3rd element (index 2) of list in 1st element
[[0, 1, 'too much', 4], 'hi']
Collections
• Lists
>>> anotherlist = [1,2,3,4]; anotherlist # Concatenating lists
[1, 2, 3, 4]
>>> anotherlist += range(5,7); anotherlist # Adding to end; range behaves like that
[1, 2, 3, 4, 5, 6]
>>> anotherlist[-3] # Slicing: Get 3rd last element
4
>>> anotherlist[0:4] # Slicing: Get 0th to 3rd element (not 4th)
[1, 2, 3, 4]
>>> anotherlist[:-3] # Slicing: Get elements until before 3rd last element
[1, 2, 3]
>>> anotherlist * 2 # What happens?
[1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
>>> anotherlist += 1 # What happens?
Traceback (most recent call last):
  File "<pyshell#78>", line 1, in <module>
    anotherlist += 1
TypeError: 'int' object is not iterable
Collections
• Sets (A collection of unique items)
>>> one_to_three = {1,2,3}; one_to_three
set([1, 2, 3])
>>> {1,2,3,1} # Sets are collections of unique items
set([1, 2, 3])
>>> one_to_ten = set(range(1,11)); one_to_ten # Note the thing about range
set([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> five_to_eleven = set(range(5,12))
>>> len(one_to_three), len(one_to_ten) # Cardinality
(3, 10)
>>> 3 in one_to_three, 3 in five_to_eleven # Membership
(True, False)
>>> one_to_three.issubset(one_to_ten) # Containment
True
>>> one_to_three.union(five_to_eleven) # Union
set([1, 2, 3, 5, 6, 7, 8, 9, 10, 11])
>>> one_to_ten.intersection(five_to_eleven) # Intersection
set([5, 6, 7, 8, 9, 10])
>>> # There is also issuperset, difference, symmetric_difference
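The three methods named in the last comment can be tried on the same sets. This is a small sketch, written as a script rather than an interpreter session; the result names (is_superset, only_in_ten, in_exactly_one) are made up for illustration, and sorted() is used only to get a predictable display order.

```python
one_to_three = set([1, 2, 3])
one_to_ten = set(range(1, 11))
five_to_eleven = set(range(5, 12))

# issuperset: does the first set contain every item of the second?
is_superset = one_to_ten.issuperset(one_to_three)

# difference: items in the first set that are not in the second
only_in_ten = one_to_ten.difference(five_to_eleven)

# symmetric_difference: items in exactly one of the two sets
in_exactly_one = one_to_ten.symmetric_difference(five_to_eleven)

print(is_superset)
print(sorted(only_in_ten))
print(sorted(in_exactly_one))
```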
Collections
• Dictionaries (An indexed container)
>>> daysOfWeek = {1: "Mon", 2: "Tue", 3: "Wed", 4: "Thu", 5: "Fri"}
>>> daysOfWeek[5] # No more zero-based indexing
'Fri'
>>> daysOfWeek[6]="Sat"; daysOfWeek[7]="Sun" # Adding Entries
>>> daysOfWeek
{1: 'Mon', 2: 'Tue', 3: 'Wed', 4: 'Thu', 5: 'Fri', 6: 'Sat', 7: 'Sun'}
>>> daysOfWeekInv = {"Mon":1, "Tue":2, "Wed":3, "Thu":4, "Fri":5}
>>> daysOfWeekInv["Mon"]
1
>>> daysOfWeekInv.keys() # Get list of keys; Your mileage may vary
['Fri', 'Thu', 'Wed', 'Mon', 'Tue']
>>> daysOfWeek[8]="ExtraDay"; daysOfWeek
{1: 'Mon', 2: 'Tue', 3: 'Wed', 4: 'Thu', 5: 'Fri', 6: 'Sat', 7: 'Sun',
8: 'ExtraDay'}
>>> del daysOfWeek[8]; daysOfWeek # Delete key-value pair
{1: 'Mon', 2: 'Tue', 3: 'Wed', 4: 'Thu', 5: 'Fri', 6: 'Sat', 7: 'Sun'}
Collections (Not really…)
• Tuples
>>> tup = (1,2.0,"hi"); tup # Looks just like a list
(1, 2.0, 'hi')
>>> tup[1] = 0 # But tuples are immutable (You can't change them)
Traceback (most recent call last):
  File "<pyshell#54>", line 1, in <module>
    tup[1] = 0
TypeError: 'tuple' object does not support item assignment
• Tuples vs. Lists
– Tuples are not constant lists
– Lists are meant to be homogeneous sequences
– Tuples are meant to be heterogeneous data structures
• e.g.: thisCustomer = (<customerId>,<address>,<DOB>,...)
• Lightweight classes
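One way to see the "lightweight class" idea is collections.namedtuple from the standard library, which gives each tuple position a name. This is only a sketch: the Customer type, its field names and the sample values are invented for illustration and are not from the slides.

```python
from collections import namedtuple
import datetime

# A tuple type whose positions also have names (hypothetical fields)
Customer = namedtuple("Customer", ["customer_id", "address", "dob"])

this_customer = Customer(customer_id=42,
                         address="1 Example Road",
                         dob=datetime.date(1980, 1, 31))

# Still a tuple: it can be indexed and unpacked
customer_id, address, dob = this_customer

# Still immutable: assigning to a field raises AttributeError
try:
    this_customer.address = "2 Example Road"
    changed = True
except AttributeError:
    changed = False

print(this_customer.address)
print(changed)
```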
Using Modules and Packages
• Modules/Packages are useful collections of code
(functions, classes, constants) that one may use.
>>> import math
>>> math.sqrt(4)
2.0
>>> import math as m
>>> m.e
2.718281828459045
>>> from math import pi as PIE # Omit "as PIE" and pi will be "pi"
>>> PIE
3.141592653589793
>>> import os # This is useful
>>> os.getcwd()
'C:\\Python27\\lib\\site-packages\\xy'
>>> os.chdir(r'C:\Python27') # Raw Strings don't need to be escaped. (And can’t end in "\".)
>>> os.getcwd()
'C:\\Python27'
• Difference: Modules are single .py files (with
stuff) while packages come in directories.
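The difference can be checked from inside Python, because an imported package carries a __path__ attribute while a plain module does not. A small sketch using two standard-library examples (json is a package, string is a single-file module); the helper name is_package is made up for illustration.

```python
import json          # a package: a directory with an __init__.py and submodules
import json.decoder  # one of the submodules inside the json package
import string        # a module: a single string.py file

def is_package(mod):
    """Return True if the imported object is a package (it has __path__)."""
    return hasattr(mod, "__path__")

print(is_package(json))
print(is_package(string))
```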
Comprehensions
• List comprehensions
>>> squares = [x**2 for x in range(10)]; squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> import math; [math.sqrt(x) for x in squares]
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
>>> import random; [random.randint(0,9) for x in range(10)]
[5, 8, 3, 6, 1, 1, 6, 6, 0, 2]
• Set comprehensions
>>> {x for x in range(10) if x >= 5}
set([8, 9, 5, 6, 7])
>>> set(x for x in range(10) if x >= 5) # Sad pandas using Python 2.6.x
set([8, 9, 5, 6, 7])
• Dictionary comprehensions
>>> {x:x**2 for x in range(5)}
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
>>> dict((x,x**2) for x in range(5)) # Sad pandas using Python 2.6.x
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
Optional Exercise
• Compute daysOfWeekInv from daysOfWeek. (See slide introducing dictionaries.)
• Form a set of all the weekend and Tuesday dates from 1 Dec 2012 to 1 Mar 2013, excluding January dates.
– import datetime; startdate = datetime.date(2012,12,1)
– one_day = datetime.timedelta(days=1)
– (startdate + 2*one_day).isoweekday() # Should be 1 (Monday); 7 for Sunday
• Form a list of multiples of 3 above 30 but below 100 in descending order.
• Do the same for a list of multiples of x above L but below H.
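For the last two items, one possible approach (certainly not the only one) is a list comprehension over a descending range. The sketch assumes x is a positive integer, that L and H are integers, and that "above" and "below" are strict; the function name multiples_between is made up.

```python
def multiples_between(x, low, high):
    """Multiples of x strictly above low and strictly below high, descending."""
    # Walk down from high-1 to low+1 and keep what x divides exactly
    return [m for m in range(high - 1, low, -1) if m % x == 0]

multiples_of_3 = multiples_between(3, 30, 100)
print(multiples_of_3)
```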
Flow Control
• The if statement (Note the indents)
>>> x = int(raw_input("Please enter an integer: "))
Please enter an integer: 8
>>> if x > 10: # This block executes if the condition is True
        print "x > 10"
    elif x == 8: # Optional case block
        print "x == 8"
    else: # Optional catch-all block
        print "x <= 10 and x is not 8"
x == 8
Flow Control
• The for loop “loops” over an “iterator”
>>> for n in range(2, 10, 2):
        print n,
2 4 6 8
• The break statement and else block
>>> for n in range(15, 22):
        for c in range(2, n):
            if n % c == 0: # remainder from division (commonly known as mod)
                print "%d is not prime; " % n,
                break
        else: # Evaluates if for loop doesn’t break
            print "%d is prime" % n # Note the different print statements
15 is not prime; 16 is not prime; 17 is prime
18 is not prime; 19 is prime
20 is not prime; 21 is not prime;
Flow Control
• The continue statement
>>> for n in range(15, 22):
        if n % 3 == 0: # remainder from division (commonly known as mod)
            continue
        print "%d is not a multiple of 3" % n
16 is not a multiple of 3
17 is not a multiple of 3
19 is not a multiple of 3
20 is not a multiple of 3
Functions
• An example:
>>> def fn1(x): return x * x
>>> def fn2(x,y):
        z = x + y
        return z
>>> fn2(1,fn1(2))
5
• Function declarations:
– Start with def…
– … are followed by a function name
– … then arguments in parentheses
• Output is passed back with return
• Indentation defines the function body
Functions
• Default arguments and named arguments
>>> def fn(x,y=1):
        z = x + 2*y
        return z
>>> fn(1) # Default argument used
3
>>> fn(1,5)
11
>>> fn(y=5,x=1) # Named arguments used
11
>>> fn = lambda x,y,z : x+y+z # Lambda Expressions
>>> fn(1,2,3)
6
• Warning: If a default argument is a mutable
object like a list, changing it results in a
different default argument in the next call.
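The warning is easiest to see in a small example. The sketch below uses made-up function names (append_shared, append_fresh). The first function shows the surprise, and the second shows the common workaround of using None as the default and creating the list inside the function.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when "def" runs,
    # so every call without "bucket" reuses the same list
    bucket.append(item)
    return bucket

def append_fresh(item, bucket=None):
    # None is immutable; a new list is made on each call that needs one
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

first_shared = list(append_shared(1))   # copy the result for comparison
second_shared = list(append_shared(2))  # the default already holds 1

first_fresh = append_fresh(1)
second_fresh = append_fresh(2)

print("shared: {0} then {1}".format(first_shared, second_shared))
print("fresh: {0} then {1}".format(first_fresh, second_fresh))
```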
Generator Functions
• Generator functions create iterators
>>> def gen(start=1,max=10,step=1):
        x = start
        while (x <= max):
            yield x
            x += step
>>> print list(gen(2,10,2))
[2, 4, 6, 8, 10]
>>> y = 0
>>> for k in gen(1,10,4):
        y += 1
        print (y, k)
(1, 1)
(2, 5)
(3, 9)
• yield returns an item and computation
continues if another item is requested.
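Requesting items one at a time with the built-in next() makes this pause-and-resume behaviour visible. A minimal sketch: it repeats the generator from above, with the max parameter renamed to stop (a naming choice made here, to avoid shadowing the built-in max), and a made-up flag named exhausted.

```python
def gen(start=1, stop=10, step=1):
    x = start
    while x <= stop:
        yield x       # Hand back x and pause here
        x += step     # Resumes from this line on the next request

it = gen(1, 10, 4)    # Nothing in the function body has run yet

a = next(it)          # Runs up to the first yield
b = next(it)          # Resumes, adds the step, yields again
c = next(it)

# Once the while loop ends, the iterator signals that it is finished
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

print("{0} {1} {2} exhausted={3}".format(a, b, c, exhausted))
```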
Classes
• Like “mutable tuples with behavior” (or not)
• Contain data that transform in well-defined ways
class SimpleFactorizer: # Edit in IDLE; Enter as xxx.py, then Run (F5)
    def __init__(self): # Constructor
        self.__last_integer = 2 # Initialization of data
        self.__primes = [2] # Initialization of data
        # __x variables are Python standard practice
        # (“culture”) for labeling “private” data
    def prime_list(self):
        return list(self.__primes) # duplicate list
    def compute_primes_to(self, u):
        for c in range(self.__last_integer+1, u+1):
            if self.get_prime_factor(c) == 1:
                self.__primes.append(c)
        self.__last_integer = u
    # Continued on next slide
Objects
# ... continued from last slide
    def get_prime_factor(self, v):
        factor = 1
        for c in self.__primes:
            if v % c == 0:
                factor = c
                break
        return factor
    def get_prime_factors(self, v):
        factors = []
        remainder = int(v) # Cast to integer
        if remainder > self.__last_integer:
            self.compute_primes_to(remainder)
        while remainder > 1:
            thisFactor = self.get_prime_factor(remainder)
            factors += [thisFactor]
            remainder //= thisFactor # Integer division keeps remainder an int
        return factors
    # Continued on next slide...
Objects
# ... continued from last slide
# Test it out
df = SimpleFactorizer()
print df.get_prime_factors(2*2*3*5*7)
print df.get_prime_factors(2*2*3*5*7*13)
print df.prime_list()[:min(50, len(df.prime_list()))] # Print first 50
# Actually [:50] works even if list length < 50
• Output:
[2, 2, 3, 5, 7]
[2, 2, 3, 5, 7, 13]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73,
79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163,
167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229]
• This can be done in the interpreter too.
Some Things I Missed Out
• Better String formatting. All you need is:
>>> "Position 0: {0}; Position 1: {1}; Position 0 again: {0}".format('a', 1, 7)
'Position 0: a; Position 1: 1; Position 0 again: a'
>>> r"C:\BlahBlah\output_p{param}_s{num_samples}".format(param=2, num_samples=10000)
'C:\\BlahBlah\\output_p2_s10000'
>>> "% Affected (Q={param:.3}): {outcome:.1%}".format(param=1.234567, outcome=0.23454)
'% Affected (Q=1.23): 23.5%'
• Inheritance, Polymorphism
– Standard Object Oriented Programming
• Handling “unplanned events” with exceptions
– “It is easier to ask for forgiveness than permission.”
• Testing
– (This is not a software engineering course.)
– For more info: doctest, unittest
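The "forgiveness rather than permission" style mentioned above can be sketched with a dictionary lookup: rather than checking first whether a key exists, just try the lookup and handle the KeyError if it happens. The function name day_number is made up for illustration; the dictionary mirrors daysOfWeekInv from the dictionaries slide.

```python
days_of_week_inv = {"Mon": 1, "Tue": 2, "Wed": 3, "Thu": 4, "Fri": 5}

def day_number(name):
    """Return the day number for name, or None if name is not a known key."""
    try:
        return days_of_week_inv[name]  # Just try it ...
    except KeyError:                   # ... and deal with the failure if it occurs
        return None

print(day_number("Wed"))
print(day_number("Sat"))
```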
☺