in self - University of Northern Colorado

Download Report

Transcript in self - University of Northern Colorado

Object-Oriented Programming in Python
Goldwasser and Letscher
Chapter 12
More Python Containers
Terry Scott
University of Northern Colorado
2007 Prentice Hall
Introduction: What is Covered in
Chapter 12
•
•
•
•
•
•
•
Lists and Tuples.
Dictionaries.
Containers of Containers.
Sets.
Arrays.
Python’s internal use of dictionaries.
Case Study: a simple search engine.
2
Aspects of Containers
•
•
•
•
order: ordered sequence (list, tuple, array).
mutability: list is mutable. tuple is immutable.
associativity: dict (dictionary)
heterogeneity: most Python containers allow
different types. Some containers in other languages
and one that will be introduced in this chapter only
allow one type for the individual components. This is
called homogeneous
• Storage – Python uses referential containers
meaning that rather than the elements in the actual
container, there are references (addresses) to the
actual items.
3
Summary of Aspects of Python
Containers
list tuple dict set
Ordered
X
Mutable
X
X
Compact storage
X
X
Associative
Heterogeneous
frozenset Array
X
X
X
X
X
X
X
X
X
4
lists and tuples
• Use indexes that are sequential.
• Can add or subtract values to make index values
start at 0 as these containers require (example
months 1 – 12 : subtract 1).
• Limitations:
– Lack of permanency: employee id’s – what happens
when an employee leaves.
– Using Social Security Numbers for employees. SSN's
are not sequential.
– Maybe the index values are not numeric.
5
Dictionaries
• Can use non-numeric index values.
– director['Star Wars']  'George Lucas'
– director['The Godfather']  'Francis Ford Coppola'
– director['American Graffiti']  'George Lucas'
• Index values are called keys.
• Keys must be immutable (int, str, tuple)
• Dictionaries point to items such as 'George
Lucas' and are called values.
• No limits on values.
6
Python’s Dictionary Class
director = { } # can also use director = dict()
director['Star Wars'] = 'George Lucas'
director['The Godfather']='Francis Ford Coppola'
director[‘American Graffiti'] = 'George Lucas'
director['Princess Bride'] = 'Rob Reiner'
#can also do the following
director ={'Star Wars': 'George Lucas',
'The Godfather': 'Francis Ford Coppola‘,
'American Graffiti' : 'George Lucas' ,
'Princess Bride' : 'Rob Reiner' }
7
Dictionary Behaviors
Syntax
d[k]
Semantics
Returns value at key k, error if k not found.
d[k] = value
Assigns value to d at key value k.
k in d
True if k is a member of d else False.
len(d)
Returns the number of items in d.
d.clear()
Make d empty.
d.pop(k)
Remove key k and return value at k to
caller.
8
Dictionary Behaviors
Syntax
Semantics
d.popitem()
Removes and returns arbitary key value
pair.
d.keys( )
Returns a list of all keys in d.
d.values( )
Returns a list of all values in d.
d.Items()
Returns a list of tuples.
for k in d
Iterate over all keys short for: for k in
d.keys()
9
Iterating Through Entries of a
Dictionary
#display a dictionary in sorted order on keys
titles = director.keys()
titles.sort()
for movie in titles:
print movie, 'was direct by', director[movie]
#can streamline this syntax
for movie in sorted(director):
print movie, 'was directed by', director[movie]
#can iterate over both keys and values
for movie,person in director.items():
print movie, 'was directed by', person
10
Containers of Containers
• list, tuple and dict can have values of any type
including containers.
• Modeling a two dimensional table (a list of
lists):
game = [['X','X','O'], ['O','O','X'], ['X','O','X ']]
• bottomLeft = game[2][0]
X
X
O
O
O
X
X
O
X
11
Tic-tac-toe Representation
12
Modeling Many-to-Many
Relationships
• Dictionary is a many-to-one relationship.
Many keys may map to the same value.
• Modeling many-to-many relationship. For
example a single movie may have many
actors.
cast['The Princess Bride '] = ('Cary Elwes',
'Robin Wright Penn', 'Chris Sarandon',
'Mandy Patinkin', 'Andre the Giant',. . .)
13
Modeling Many-to-Many
Relationships (continued)
>>> #Tuple used since immutable and cast for
>>> #movies is also.
>>> 'Andre the Giant' in cast
False
#must refer to specific key value in cast
>>> 'Andre the Giant' in cast.values()
False
#must refer to results not keys
>>> 'Andre the Giant' in cast['The Princess Bride']
True
14
Reverse Dictionary
• What if we want to know the keys that are
associated with an item?
• Can do this one at a time or build a
complete reverse dictionary.
original = {'A':1, 'B':3, 'C':3, 'D':4, 'E': 1, 'F': 3}
reverse = {1: ['A', 'E'], 3: ['C', 'B', 'F'], 4: ['D'] }
15
Reverse Dictionary Code
#can build a reverse dictionary
def buildReverse(dictionary):
reverse = { }
for key,value in dictionary.items():
if value in reverse:
reverse[value].append(key)
else:
reverse[value] = [key]
return reverse
16
Sets and Frozensets
• Originally programmers created a mathematical
set class using a list or dictionary.
• Python now contains a set class that is mutable
and a frozenset class that is immutable.
• Elements within the set are in arbitrary order.
• Elements added to set must be immutable.
• Elements of the set only occur once.
• set constructor can be:
– myset = set( )
– mysetb = set(container) #container can be any
#immutable container
17
Set Accessor Methods: Standard
Container Methods
Syntax
Semantics
len(s)
Returns the cardinality of set s.
v in s
Returns True if v is in set s, else
False.
Obviously does opposite of command
above.
Iterates over all values in set s in
arbitrary order.
v not in s
for v in s
18
Set Accessor Methods: Comparing
Two Sets
Syntax
Semantics
s == t
Returns True if s and t have identical
elements, else False.
Returns True if s is a proper subset of t,
else False.
Returns True if set s is a subset of t, else
False.
s<t
s <= t
s.issubset(t)
s >= t
Returns True if t is a proper subset of s,
else False.
s >= t
Returns True if set t is a subset of s, else
s.issuperset(t) False.
19
Set Accessor Methods: Creating a
Third Set on Existing Sets
Syntax
Semantics
s|t
s.union(t)
s&t
s.Intersection(t)
Returns a new set of elements
in both sets but with no repeats.
s–t
s.difference(t)
Returns a new set of elements
that are in set s but not in set t.
Returns a new set of elements
that are in both sets s and t.
s^t
Returns a new set of elements
s.symmetric_difference(t) that are in either set s or t but
not both. s XOR t.
20
Set Mutator Methods
Syntax
Semantics
s.add(v)
Adds value v to set s. No effect if v is already in s.
s.discard(v) Removes v from set s. No effect if v is not in s.
s.remove(v) Removes v from set s if present. Else it raises
KeyError.
s.pop()
Removes and returns arbitrary value from set s.
s.clear()
Removes all entries from the set s.
21
Set Mutator Methods (continued)
Syntax
Semantics
s |= t, s.update(t)
Alters set s making it the
union of sets s and t.
s &= t, s.intersection_update(t)
Alters set s by making it
the intersection of sets s
and t.
Alters set s making it s – t.
s -= t
s.difference_update(t)
s ^= t
Alters set s making it the
s.symmetric_difference_update(t) XOR of s and t.
22
Set Operation Examples
>>> set ([3, 2]) < set([1,2,3])
True
>>> set ([3, 2]) < set([2,3])
False
>>> set ([7]) < set([1,2,3])
False
>>> set ([3, 2]) <= set([2,3])
True
23
Frozensets
• A frozenset is immutable.
• A frozenset can perform all accessor methods
previously listed for a set but none of the mutator
methods.
• All elements of a set or frozenset must be
immutable. Therefore a set can not consist of a
set of sets but it could be a set of frozensets.
• Dictionary keys recall can only be immutable
therefore a frozenset can be a key of a
dictionary.
24
Sets and Frozensets
• Two sets operated on results in a set.
• Two frozensets operated on results in a
frozenset.
• A set and a frozenset results in the type
that occurs first in the operation.
25
Illustrating Set and Frozenset
Operations
>>> colors = set(['red', 'green', 'blue'])
>>> stoplight = frozenset(['green', 'yellow', 'red'])
>>> print colors & stoplight
set(['green', 'red'])
>>> print stoplight & colors
frozenset(['green', 'red'])
26
Lists Versus Arrays
• A list consists of a list of references to
items.
• Advantage is that a list can consist of
different types. This is called a
heterogeneous type.
• Disadvantage is slightly slower access
since it requires two accesses to retrieve
an item (once to get the reference and
once to retrieve the value from the
reference).
27
Lists Versus Arrays (continued)
• In contrast the array type consists of the
actual items.
• This is necessarily a homogenous data
structure.
• It of course has slightly faster access than
a list since only one access is required to
retrieve a value.
28
Illustrating Differences In How
Information is Stored in Arrays
and Lists
29
Using Arrays
•
•
•
•
array('i') - stores integers.
array('f') – stores floats.
Other codes are possible.
Array is not part of built-in types in Python so
must import it using: from array import array.
• Can also initialize array at same time as
creating.
>>> from array import array
>>> yearArray = array('i', [1776, 1789, 1917,1979])
30
Python's Internal Use Of
Dictionaries
• Python uses it's dictionary to keep track of
namespaces.
• Namespaces consist of identifiers in a
particular scope. Scope is where a
variable is known.
• Python has a top-level scope called global.
• The namespace inside of a function is
called a local scope.
31
Python's Namespaces
• globals() – function that displays identifiers
in global namespace.
• locals() – function that displays identifiers
in local namespace.
• When in a function the local namespace
can be accessed. If an identifier is given
that does not exist in the local namespace
then the global scope is accessed.
32
Code Showing Global and Local
Namespaces
def demo(param):
x = len(param)
print 'local dictionary is', locals()
print 'global dictionary is', globals()
y = 'hello'
demo(y)
#output
local dictionary is ('x': 10, 'param': 'hello')
global dictionary is {(will list built-ins and) 'y': 'hello',
'demo': <function demo at 0x37d270>}
33
Global and Local Scopes
• Referring to the previous slide notice that y does
not exist in the local scope but we could print it
out within the function.
• Using variables in the global scope (called a
global variable) is considered to be almost
always bad programming style.
• Assigning to a global variable that is immutable
inside a function will not change its value in the
global scope.
• Assigning to a global variable that is mutable
inside a function will change its value in the
global scope since it uses a reference value.
34
Built-in Module
• Python has a special module __builtins__
• Some words are reserved such as for. These have a
predefined meaning and can not be used as an identifier
for other situations.
• Other words have a predefined meaning but can be used
as an identifier in other situations.
• These other words such as abs or True are contained in
the __builtins__ module.
>>> True = 5
>>> #No longer can True be used in the usual way
>>> #for the current Python section.
• The above is correct because the definition 'True = 5' will
be found before checking for the definition 'True' in the
__builtins__ module.
35
Importing Modules
• One form for importing is:
>>> from math import *
• This brings in all identifiers from math. This is
said to import the namespace.
• This may be a problem when some identifier
in the imported namespace covers over a
wanted identifier.
36
Importing Modules (continue)
• It is better to explicitly import the wanted
methods shown below
>>> from math import sqrt, pi
• Another way to do this is
>>> import math
• To use the sqrt in this case is to do the following
>>> math.sqrt(5)
37
Object-Oriented Name Resolution
• Except for primitive types such as int, str
the state of individual object is stored in an
instance-level dictionary.
• vars() retrieves identifiers and their values.
>>> myTV = Television()
>>> print vars(myTV)
{'_prevChan': 2, '_volume': 5, '_channel': 2,
'_powerOn': False, '_muted': False}
38
Object-Oriented Name Resolution
(continued)
• Notice it does not contain class level identifiers such as
volumeUp( )
• This can be accessed using:
>>> vars(Television)
• When searching for an identifier Python first looks in the
instance namespace, then to the class namespace then
if inheritance is involved it moves up the inheritance
chain.
• This explains why a child class overriding a method in a
base class works, since the child class namespace is
looked at first before the parent class namespace.
39
A Simple Search Engine
• Searching for words in a corpus (multiple
documents).
• Similar to generating an index of a book.
• Can use split( ) to delineate words by white-space.
• Still may have extra characters. Want (Text) to
reduce to text and "document's" to reduce to
document's.
– Strip off non-alphabetic characters at start.
– Strip off non-alphabetic characters at end.
– Make lower case.
40
Code to Reduce to Actual Word
def ourStrip(w):
first = 0 #find first desirable letter
while first < len(w) and not w[first].isalpha():
first += 1
last = len(w) – 1 #find last desirable letter
while last > first and not w[last].isalpha():
last -= 1
return w[first:last+1].lower() #lower case letters
41
Code to Create an Index for a
Single Document
class TextIndex: #manage index of words
def __init__(self, contents, sourceLabel): #make new index
self._lines = contents.split('\n')
self._label= sourceLabel
self._wordAt = { } #build a new dictionary: has line numbers
for linenum in range(len(self._lines)): #where word occurs
words = self._lines[linenum].split()
for w in words: # step through words
w = ourStrip(w) #use our strip method
if w:
#w is not null
if w not in self._wordAt: #word not in dictionary
self._wordAt[w] = [ linenum ] #add to dictionary
elif self._wordAt[w][-1] != linenum: #is linenum there
self._wordAt[w].append(linenum) #if not add line 42
Code to Create anIndex for a
Single Document (continued)
def getLabel(self): #return label for document
return self._label
def getWords(self): #return unordered words list
return self._wordAt.keys()
def getContext(self, word, maxOccur = 10):#returns words
word = ourStrip(word) #occurrences in context
output = [ ]
if word in self._wordAt:
startContext = max(lineNum – 1, 0)
stopContext = min(lineNum + 2, len(self._lines))
output.append('-' * 40)
output.extend(self._lines[startContext:stopContext])
return'\n'.join(output)
43
Code for Search Engine Corpus
class Engine: #supports word occurrences in a collection
def __init__(self): #of text documents.
self.corpus = { } #maps label to index
self._hasWord = { } #maps words to a label
def addDocument(self, contents, sourceLabel):
if sourceLabel not in self._corpus:
newIndex = TextIndex(contents, sourceLabel)
self._corpus[sourceLabel] = newIndex
for word in newIndex.getWords():
if word in self._hasWord:
self._hasWord[word].add(sourceLabel)
else:
self._hasWord[word] = set([sourceLabel])
44
Code for Search Engine Corpus
(continued)
def lookup(self, term):
"""Return set of labels for documents containing search
term"""
term = ourStrip(term)
if term in self._hasWord:
return set (self._hasWord[term])
else:
return set()
def getContext(term, docLabel, maxOccur = 10):
#search for word in a document returning its context
return self._corpus[docLabel].getContext(term,maxOccur )
45
Code for Search Engine Corpus
(continued)
def makeReport(self, term,maxDocuments=10,maxContext=3):
output = [ ] #produce formatted report of occurrences
sources = self.lookup(term) #of term in documents.
num = min(len(sources), maxDocuments)
label = list(sources)[:num]
for docLabel in labels:
output.append('Document: ' + docLabel)
context=self._corpus[docLabel].getContext(term,max
Context)
output.append(context)
output.append(' =' * 40)
return '\n'.join(output)
46
Unit Test for Simple Search Engine
• See book page 429-430 for unit test for
the simple search engine.
47