Python Code Examples
Word Spotting
import sys
fname1 = r"c:\Python Course\ex1.txt"  # raw string keeps the backslashes literal
for line in open(fname1, 'r').readlines():
    for word in line.split():
        if word.endswith('ing'):
            print word
Creating a Dictionary of First Names
def createNameDict():
    dictNameFile = open('project/dictionaries/names.txt', 'r')
    dictContent = dictNameFile.read()    # read the whole file
    dictWords = dictContent.split(",")   # split it into a list of words
    nameDict = {}                        # initialize a dictionary
    for word in dictWords:
        nameDict[word.strip()] = " "     # enter each word into the dictionary
    return nameDict
Computing Accuracy Results I
# anfiles.py
# Program to analyze the results of speaker identification.
# Illustrates Python dictionaries
import string, glob, sys

def main():
    # read correct file and test file
    fname1 = sys.argv[1]
    fname2 = sys.argv[2]
    text1 = open(fname1, 'r').read()
    text1 = string.lower(text1)
    words1 = string.split(text1)
    correct_len = len(words1)
    text2 = open(fname2, 'r').read()
    text2 = string.lower(text2)
    words2 = string.split(text2)
Computing Accuracy Results II
    # (body of main(), continued)
    # construct a dictionary of correct results
    correct = {}
    for w in words1:
        correct[w] = 1
    for i in range(correct_len):
        in_count = 0
        portion2 = words2[:i+1]
        for w in portion2:
            if correct.get(w, 0) > 0:
                in_count += 1
        accuracy = float(in_count) / float(len(portion2))
        print "%5d, %5d,%.2f" % (len(portion2), in_count, accuracy)

if __name__ == '__main__': main()
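The same idea can be sketched as a small function that works on word lists instead of files. This is a Python 3 variation, not the original program: the name running_accuracy and the sample words are made up for illustration, and it keeps a running hit count rather than recounting each prefix.

```python
def running_accuracy(correct_words, test_words):
    """Return (prefix_length, hits, accuracy) for each prefix of test_words."""
    correct = set(w.lower() for w in correct_words)  # lookup table of correct words
    rows = []
    hits = 0
    # Like the original loop, look at no more prefixes than there are correct words.
    for n, word in enumerate(test_words[:len(correct_words)], start=1):
        if word.lower() in correct:
            hits += 1
        rows.append((n, hits, hits / n))
    return rows


rows = running_accuracy(["alice", "bob", "carol"], ["Alice", "dave", "bob"])
for n, hits, acc in rows:
    print("%5d, %5d,%.2f" % (n, hits, acc))
```

Each row has the same three columns the original script prints: words examined so far, how many of them appear in the correct file, and the ratio of the two.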
Word Histograms
import re

pattern = re.compile(r'[a-zA-Z]+')

def countwords(text):
    counts = {}
    try:
        iterator = pattern.finditer(text)
        for match in iterator:
            word = match.group()
            try:
                counts[word] = counts[word] + 1
            except KeyError:
                counts[word] = 1
    except re.error:
        pass  # triggers when first index goes to -1, terminates loop.
    # build (count, word) pairs and sort them, most frequent first
    items = []
    for word in counts.keys():
        items.append((counts[word], word))
    items.sort()
    items.reverse()
    return items

# if run as a script, count words in stdin.
if __name__ == "__main__":
    import sys
    x = countwords(sys.stdin.read())
    s = map(str, x)
    t = "\n".join(s)
    print t
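For comparison, here is a shorter sketch of the same histogram using collections.Counter from the standard library. It is written for Python 3, and the function name count_words and the sample sentence are illustrative rather than taken from the slides.

```python
import re
from collections import Counter

WORD = re.compile(r"[a-zA-Z]+")


def count_words(text):
    """Return (count, word) pairs, most frequent first."""
    counts = Counter(WORD.findall(text))
    # Sorting the tuples in reverse matches sort() followed by reverse().
    return sorted(((n, w) for w, n in counts.items()), reverse=True)


histogram = count_words("the cat and the hat and the bat")
for count, word in histogram:
    print(count, word)
```

Counter takes over the try/except KeyError bookkeeping, so the function body only has to describe the ordering.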
Extracting People Names and Company Names
import re, sys

def createNameDict():
    dictNameFile = open('names.txt', 'r')
    dictContent = dictNameFile.read()    # read the whole file
    dictWords = dictContent.split(",")   # split it into a list of words
    nameDict = {}                        # initialize a dictionary
    for word in dictWords:
        nameDict[word.strip()] = " "     # enter each word into the dictionary
    return nameDict

def main():
    # read file
    fname1 = sys.argv[1]
    text1 = open(fname1, 'r').read()
    namesDic = createNameDict()
    CompanySuffix = re.compile(r'corp | ltd | inc | corporation | gmbh | ag | sa ',
                               re.IGNORECASE)
    # capitalized words followed by a company suffix and an optional period
    pattern = re.compile(r'([A-Z]\w+[ .,-]+)+'
                         r'(corp|CORP|Corp|ltd|Ltd|LTD|inc|Inc|INC|corporation|Corporation|CORPORATION|gmbh|GMBH|ag|AG|sa|SA)'
                         r'(\.?)')
    # two to four capitalized words in a row
    pattern1 = re.compile(r'([A-Z]\w+[\s.-]*){2,4}')
    # Companies
    capitalWords = re.finditer(pattern, text1)
    for match in capitalWords:
        CapSeq = match.group()
        print CapSeq
    # People
    capitalWords1 = re.finditer(pattern1, text1)
    for match in capitalWords1:
        wordList = match.group().split()
        # check name in names dictionary
        if wordList[0].strip() in namesDic:
            print match.group()

if __name__ == '__main__': main()
NLTK
NLTK defines a basic infrastructure that can be used to build NLP programs in Python. It provides:
- Basic classes for representing data relevant to natural language processing.
- Standard interfaces for performing tasks, such as tokenization, tagging, and parsing.
- Standard implementations for each task, which can be combined to solve complex problems.
- Extensive documentation, including tutorials and reference documentation.
RE Show
>>> from nltk.util import re_show
>>> string = """
... It's probably worth paying a premium for funds that invest in markets
... that are partially closed to foreign investors, such as South Korea, ...
... """
>>> re_show('t...', string)
I{t's }probably wor{th p}aying a premium for funds {that} inves{t in} markets
{that} are par{tial}ly closed {to f}oreign inves{tors}, such as Sou{th K}orea, ...
>>>
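To see what this display is doing, the brace marking can be approximated with the standard re module alone. The helper name mark_matches below is hypothetical and this is not NLTK's implementation, only a minimal Python 3 sketch of the idea.

```python
import re


def mark_matches(pattern, text):
    """Return text with every non-overlapping match of pattern wrapped in braces."""
    return re.sub(pattern, lambda m: "{" + m.group(0) + "}", text)


marked = mark_matches(r"t...", "funds that invest in markets")
print(marked)  # funds {that} inves{t in} markets
```

The pattern t... matches a "t" plus the next three characters, which is why the final "ts" in "markets" is left unmarked: fewer than three characters follow the "t".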
Classes in Python
Defining Classes
>>> class SimpleClass:
...     def __init__(self, initial_value):
...         self.data = initial_value
...     def set(self, value):
...         self.data = value
...     def get(self):
...         print self.data
...
>>> x = SimpleClass(4)
Inheritance
B is a subclass of A
>>> class B(A):
...     def __init__(self):
SimpleTokenizer implements the interface of TokenizerI
>>> class SimpleTokenizer(TokenizerI):
...     def tokenize(self, str):
...         words = str.split()
...         return [Token(words[i], Location(i))
...                 for i in range(len(words))]
Inheritance Example
from math import floor, sqrt

class point:
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y

class cartesian(point):
    def distanceToOrigin(self):
        return floor(sqrt(self.x**2 + self.y**2))

class manhattan(point):
    def distanceToOrigin(self):
        # Manhattan distance uses absolute values of the coordinates
        return abs(self.x) + abs(self.y)
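A short usage sketch may help here. The classes are repeated so the block runs on its own under Python 3, and the sample coordinates are arbitrary. It assumes the Manhattan distance is meant to use absolute values.

```python
from math import floor, sqrt


class point:
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y


class cartesian(point):
    def distanceToOrigin(self):
        return floor(sqrt(self.x ** 2 + self.y ** 2))


class manhattan(point):
    def distanceToOrigin(self):
        return abs(self.x) + abs(self.y)


# Both subclasses inherit __init__ from point; each supplies its own distance.
c = cartesian(3, 4)
m = manhattan(3, -4)
default = cartesian()            # x and y fall back to 0
print(c.distanceToOrigin(), m.distanceToOrigin(), default.distanceToOrigin())
```

Neither subclass defines a constructor, so cartesian(3, 4) and manhattan(3, -4) both run point.__init__, and the call to distanceToOrigin is resolved by the object's own class.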
Sets
Sets in Python
The sets module provides classes for constructing
and manipulating unordered collections of unique
elements. Common uses include:
membership testing,
removing duplicates from a sequence,
and computing standard math operations on sets such
as intersection, union, difference, and symmetric
difference.
Like other collections, sets support x in set, len(set),
and for x in set. Being an unordered collection, sets
do not record element position or order of insertion.
Accordingly, sets do not support indexing, slicing, or
other sequence-like behavior.
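The properties above can be tried out in a few lines. This sketch assumes the built-in set type of later Python versions rather than the sets.Set class these slides describe. The sample words are arbitrary.

```python
words = ["spam", "eggs", "spam", "bacon", "eggs"]

unique = set(words)              # removing duplicates from a sequence
has_spam = "spam" in unique      # membership testing
size = len(unique)               # number of distinct elements
ordered = sorted(unique)         # iteration works; sort for a stable order

try:
    unique[0]                    # sets do not support indexing
    indexable = True
except TypeError:
    indexable = False

print(size, has_spam, ordered, indexable)
```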
Some Details about Implementation
Most set applications use the Set class which
provides every set method except for __hash__().
For advanced applications requiring a hash method,
the ImmutableSet class adds a __hash__() method
but omits methods which alter the contents of the set.
The set classes are implemented using dictionaries.
As a result, sets cannot contain mutable elements
such as lists or dictionaries.
However, they can contain immutable collections
such as tuples or instances of ImmutableSet.
For convenience in implementing sets of sets, inner
sets are automatically converted to immutable form,
for example, Set([Set(['dog'])]) is transformed to
Set([ImmutableSet(['dog'])]).
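The hashing rule can be checked directly. The sketch below assumes the built-in set and frozenset types, where frozenset plays the role of ImmutableSet. Unlike the sets module described above, the built-in type does not convert inner sets automatically.

```python
inner = frozenset(["dog"])
outer = {inner}                  # a set that contains an immutable set
pairs = {("a", 1), ("b", 2)}     # tuples are immutable, so they are allowed

try:
    {["dog"]}                    # a list is mutable and cannot be hashed
    list_allowed = True
except TypeError:
    list_allowed = False

try:
    {set(["dog"])}               # built-in set: no automatic conversion
    auto_converted = True
except TypeError:
    auto_converted = False

print(list_allowed, auto_converted, inner in outer)
```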
Set Operations
Operation                    Equivalent  Result
len(s)                                   cardinality of set s
x in s                                   test x for membership in s
x not in s                               test x for non-membership in s
s.issubset(t)                s <= t      test whether every element in s is in t
s.issuperset(t)              s >= t      test whether every element in t is in s
s.union(t)                   s | t       new set with elements from both s and t
s.intersection(t)            s & t       new set with elements common to s and t
s.difference(t)              s - t       new set with elements in s but not in t
s.symmetric_difference(t)    s ^ t       new set with elements in either s or t but not both
s.copy()                                 new set with a shallow copy of s
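The method and operator columns can be compared side by side. This sketch uses the built-in set type as an assumption, and the sample values are arbitrary.

```python
s = {1, 2, 3}
t = {3, 4}

union = s | t                    # s.union(t)
common = s & t                   # s.intersection(t)
only_s = s - t                   # s.difference(t)
either = s ^ t                   # s.symmetric_difference(t)
is_subset = {3} <= s             # {3}.issubset(s)
is_superset = s >= {1, 2}        # s.issuperset({1, 2})

# With the built-in type, methods accept any iterable; operators need sets.
from_list = s.union([3, 4])
try:
    s | [3, 4]
    operator_takes_list = True
except TypeError:
    operator_takes_list = False

print(union, common, only_s, either)
```

None of these operations modify s or t. Each one returns a new set.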
Operations not for ImmutableSet
Operation                           Equivalent  Result
s.union_update(t)                   s |= t      return set s with elements added from t
s.intersection_update(t)            s &= t      return set s keeping only elements also found in t
s.difference_update(t)              s -= t      return set s after removing elements found in t
s.symmetric_difference_update(t)    s ^= t      return set s with elements from s or t but not both
s.add(x)                                        add element x to set s
s.remove(x)                                     remove x from set s; raises KeyError if not present
s.discard(x)                                    remove x from set s if present
s.pop()                                         remove and return an arbitrary element from s; raises KeyError if empty
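A quick walk through the in-place operations, again assuming the built-in set type. One difference from the table is that the built-in type spells union_update as update.

```python
s = {1, 2, 3}
s |= {4}                 # like s.update({4}): s is now {1, 2, 3, 4}
s &= {2, 3, 4}           # keep only shared elements: {2, 3, 4}
s -= {4}                 # drop elements found in the other set: {2, 3}
s ^= {3, 5}              # keep elements in exactly one of the two: {2, 5}
s.add(7)                 # {2, 5, 7}
s.discard(99)            # absent element: silently ignored

try:
    s.remove(99)         # absent element: raises KeyError
    remove_failed = False
except KeyError:
    remove_failed = True

before_pop = set(s)      # snapshot, because pop() removes an arbitrary element
popped = s.pop()
print(before_pop, popped, s)
```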
Set Examples
>>> from sets import Set
>>> engineers = Set(['John', 'Jane', 'Jack', 'Janice'])
>>> programmers = Set(['Jack', 'Sam', 'Susan', 'Janice'])
>>> managers = Set(['Jane', 'Jack', 'Susan', 'Zack'])
>>> employees = engineers | programmers | managers           # union
>>> engineering_management = engineers & managers            # intersection
>>> fulltime_management = managers - engineers - programmers # difference
>>> engineers.add('Marvin')                                   # add element
>>> print engineers
Set(['Jane', 'Marvin', 'Janice', 'John', 'Jack'])
>>> employees.issuperset(engineers)                           # superset test
False
>>> employees.union_update(engineers)                         # update from another set
>>> employees.issuperset(engineers)
True
>>> for group in [engineers, programmers, managers, employees]:
...     group.discard('Susan')                                # unconditionally remove element
...     print group
...
Set(['Jane', 'Marvin', 'Janice', 'John', 'Jack'])
Set(['Janice', 'Jack', 'Sam'])
Set(['Jane', 'Zack', 'Jack'])
Set(['Jack', 'Sam', 'Jane', 'Marvin', 'Janice', 'John', 'Zack'])
Google API
Get it from
http://sourceforge.net/projects/pygoogle/
A Python wrapper for the Google web API.
Allows you to do Google searches, retrieve
pages from the Google cache, and ask
Google for spelling suggestions.
Utilizing the Google API - I
import sys
import string
import codecs
import google

print '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">'
print '<head>'
print ' <title>Google with Python</title>'
print '</head>'
print '<body>'
print '<h1>Google with Python</h1>'
google.LICENSE_KEY = '[YOUR GOOGLE LICENSE KEY]'
# wrap stdout so that output is written as UTF-8
sys.stdout = codecs.lookup('utf-8')[-1](sys.stdout)
query = "Your Query"
data = google.doGoogleSearch(query)
Utilizing the Google API - II
print '<p><strong>1-10 of "' + query + '" total results for '
print str(data.meta.estimatedTotalResultsCount) + '</strong></p>'
for result in data.results:
    # convert the markup in each result to XHTML tags
    title = result.title
    title = title.replace('<b>', '<strong>')
    title = title.replace('</b>', '</strong>')
    snippet = result.snippet
    snippet = snippet.replace('<b>', '<strong>')
    snippet = snippet.replace('</b>', '</strong>')
    snippet = snippet.replace('<br>', '<br />')
    print '<h2><a href="' + result.URL + '">' + title + '</a></h2>'
    print '<p>' + snippet + '</p>'
print '</body>'
print '</html>'
Yahoo API
http://pysearch.sourceforge.net/
http://python.codezoo.com/pub/component/4193?category=198
This project implements a Python API for the
Yahoo Search Webservices API. pYsearch is
an OO abstraction of the web services, with
emphasis on ease of use and extensibility.
URLLIB
This module provides a high-level interface
for fetching data across the World Wide Web.
In particular, the urlopen() function is similar
to the built-in function open(), but accepts
Universal Resource Locators (URLs) instead
of filenames.
Some restrictions apply -- it can only open
URLs for reading, and no seek operations are
available.
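The file-like behaviour can be tried without a network connection. The sketch below is for Python 3, where urlopen lives in urllib.request, and it uses a data: URL purely so the example is self-contained. Any http URL would be read the same way.

```python
from urllib.request import urlopen

# A data: URL carries its own content, so no server is contacted.
with urlopen("data:text/plain;charset=utf-8,hello%20world") as response:
    body = response.read()       # read() returns bytes, as with a binary file

text = body.decode("utf-8")
print(text)
```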
Urllib Syntax
import urllib

# some_url stands for the URL to fetch

# Use http://www.someproxy.com:3128 for http proxying
proxies = {'http': 'http://www.someproxy.com:3128'}
filehandle = urllib.urlopen(some_url, proxies=proxies)

# Don't use any proxies
filehandle = urllib.urlopen(some_url, proxies={})
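In Python 3 the proxies argument is gone from urlopen, and the usual route is a ProxyHandler passed to build_opener. The sketch below only builds the openers and sends no request. The proxy address is the same placeholder used above.

```python
from urllib.request import ProxyHandler, build_opener

# Route http requests through the placeholder proxy.
proxy_handler = ProxyHandler({"http": "http://www.someproxy.com:3128"})
via_proxy = build_opener(proxy_handler)

# An empty mapping disables proxies, including any detected from the environment.
no_proxy_handler = ProxyHandler({})
direct = build_opener(no_proxy_handler)

# via_proxy.open(some_url) or direct.open(some_url) would then perform the fetch.
print(proxy_handler.proxies, no_proxy_handler.proxies)
```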
URLLIB Examples
Here is an example session that uses the "GET" method to retrieve a URL containing parameters:
>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
>>> print f.read()
The following example uses the "POST" method instead:
>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
>>> print f.read()
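The difference between the two sessions comes down to where the encoded parameters go. This Python 3 sketch builds both requests with urllib.parse and urllib.request but does not send them. A list of pairs is used instead of a dict only to make the parameter order explicit.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = urlencode([("spam", 1), ("eggs", 2), ("bacon", 0)])

# GET: the parameters become part of the URL.
get_request = Request("http://www.musi-cal.com/cgi-bin/query?%s" % params)

# POST: the parameters travel in the request body, which must be bytes.
post_request = Request("http://www.musi-cal.com/cgi-bin/query",
                       data=params.encode("ascii"))

print(params)
print(get_request.get_method(), post_request.get_method())
```

Supplying data is what switches the request from GET to POST, which mirrors the second positional argument to urlopen in the session above.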
What is a Proxy
A proxy server is a computer that offers a computer
network service to allow clients to make indirect
network connections to other network services.
A client connects to the proxy server, then requests a
connection, file, or other resource available on a
different server.
The proxy provides the resource either by connecting
to the specified server or by serving it from a cache.
In some cases, the proxy may alter the client's
request or the server's response for various
purposes.
A proxy server can also serve as a firewall.
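The "connect to the server or serve from a cache" behaviour can be illustrated with a toy model. Nothing here touches the network: CachingProxy and fake_origin are hypothetical names, and a real proxy would also handle expiry, headers, and errors.

```python
class CachingProxy:
    """Toy model of a proxy that remembers responses it has already fetched."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable that contacts the origin server
        self._cache = {}

    def get(self, url):
        if url not in self._cache:
            # Cache miss: forward the request to the origin server.
            self._cache[url] = self._fetch(url)
        return self._cache[url]


origin_hits = []


def fake_origin(url):
    """Stand-in for the real server; records each time it is contacted."""
    origin_hits.append(url)
    return "content of " + url


proxy = CachingProxy(fake_origin)
first = proxy.get("http://example.com/a")
second = proxy.get("http://example.com/a")   # served from the cache
print(first == second, len(origin_hits))
```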