PythonRegExpr

CSC1018F:
Regular Expressions
Dive Into Python, Ch. 7
Number Systems
Lecture Outline
Recap of OO Python [week 3]
Regular Expressions
Standard
Verbose
Number Systems
Binary, decimal, hexadecimal
Recap of OO Python
Object Orientation:
Module importing
Defining, initializing and instantiating
Classes
Class attributes
Class methods
Exceptions
File Handling:
Opening, reading, writing and closing
Intro to Regular Expressions
Regular expressions are a powerful means for
parsing text to identify complex patterns of
characters
Standard string methods (find, replace, split) can be
insufficient in complex cases
But regular expressions can be complicated and
difficult to read so avoid them if string methods will
do the job
Read regular expressions from left to right
Usage:
import re # regular expression functionality is in the re module
re.sub(regexpr, repstr, inputstr) # typical search & replace
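A minimal sketch of a case where a plain string method falls short and re.sub does the job. The address value is illustrative only.

```python
import re

addr = "100 BROAD ROAD"

# str.replace substitutes every occurrence, including the "ROAD" inside "BROAD".
naive = addr.replace("ROAD", "RD.")

# The pattern only matches "ROAD" as a whole word at the very end of the string.
fixed = re.sub(r"\bROAD$", "RD.", addr)

print(naive)  # 100 BRD. RD.
print(fixed)  # 100 BROAD RD.
```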
Format of Regular Expressions
Syntax:
$ - end of string marker
^ - start of string marker
\b - word boundary marker (to avoid backslash escapes
use a raw string - r"stringcontents")
? - makes the preceding character optional (matches it zero or one times)
(A|B|C) - indicates mutually exclusive options A, B and C
Examples:
re.sub(r"\bROAD$", "RD.", addr)
addr: 60 BROAD ROAD → 60 BROAD RD.
re.search(r"^(a|b|c) -", question)
question: a - how are you? → <SRE_Match object …>
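The markers above can be tried out one at a time. This is a small sketch; the helper name matches and the test strings are made up for illustration.

```python
import re

def matches(pattern, text):
    """Return True when the pattern is found somewhere in text."""
    return re.search(pattern, text) is not None

# ^ and $ anchor the pattern to the start and end of the string.
print(matches(r"^ROAD$", "ROAD"))          # True
print(matches(r"^ROAD$", "BROAD ROAD"))    # False

# \b only matches at a word boundary, so "ROAD" inside "BROAD" is skipped.
print(matches(r"\bROAD", "BROAD"))         # False

# ? makes the preceding character optional.
print(matches(r"^colou?r$", "color"))      # True
print(matches(r"^colou?r$", "colour"))     # True

# (a|b|c) accepts exactly one of the listed alternatives.
print(matches(r"^(a|b|c) -", "a - how are you?"))  # True
print(matches(r"^(a|b|c) -", "d - fine thanks"))   # False
```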
Further Syntax
P{n,m} syntax:
Deals with repeating patterns
Read as pattern P appears at least n times but no more
than m times
More syntax:
\d - any numeric digit
\D - any character except a numeric digit
+ - 1 or more
* - 0 or more
( ) - to indicate groups
Examples:
>>> phPat = re.compile(r"^(\d{3})\D*(\d{7})$")
>>> phPat.search("021 6504058").groups()
('021', '6504058')
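A short sketch that runs the phone-number pattern against a few more inputs and shows a bounded repeat on its own. The helper name split_phone and the extra test strings are assumptions for illustration.

```python
import re

# Three digits, any number of non-digits, then seven digits.
ph_pat = re.compile(r"^(\d{3})\D*(\d{7})$")

def split_phone(text):
    """Return the two captured groups, or None when the text does not match."""
    m = ph_pat.search(text)
    return m.groups() if m else None

print(split_phone("021 6504058"))   # ('021', '6504058')
print(split_phone("021-6504058"))   # ('021', '6504058')
print(split_phone("0216504058"))    # ('021', '6504058')
print(split_phone("021 650"))       # None

# {2,4} means at least two and at most four repeats of the preceding item.
print(re.search(r"^\d{2,4}$", "123") is not None)    # True
print(re.search(r"^\d{2,4}$", "12345") is not None)  # False
```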
Verbose Regular Expressions
So far only compact regular expressions
To aid readability we would like to include
comments and spaces
Pass re.VERBOSE as the last (flags) argument to re
functions
Whitespace is ignored
Comments ( # commentstr) are ignored
Example:
pattern = """
    ^    # beginning of string
    $    # end of string
"""
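As a fuller illustration, the phone-number pattern from the earlier example could be written in verbose form roughly as follows. The comments describing each part are an interpretation, not something the slides spell out.

```python
import re

ph_pat = re.compile(r"""
    ^           # beginning of string
    (\d{3})     # first group: exactly three digits
    \D*         # separator: any number of non-digit characters
    (\d{7})     # second group: exactly seven digits
    $           # end of string
    """, re.VERBOSE)

print(ph_pat.search("021 6504058").groups())  # ('021', '6504058')
```

Because whitespace is ignored in verbose mode, a literal space in the pattern has to be escaped or placed in a character class.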
Case Study
Counting 1-10 in Roman numerals
Additive and subtractive combination of I (=1), V (=5), X (=10)
Can have at most 3 of a particular numeral in a row
>>> roman = r"^(I?X|IV|V?I{0,3})$"
>>> re.search(roman, "X")
<_sre.SRE_Match object at 0x1e55be0>
>>> re.search(roman, "VIII")
<_sre.SRE_Match object at 0x1e55ba0>
>>> re.search(roman, "")
<_sre.SRE_Match object at 0x1e55ce0>
>>> re.search(roman, "IIII") == None
True
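To check the pattern more thoroughly, it can be run over all ten valid numerals and a handful of invalid ones. This is a sketch; the function name and the list of invalid strings are chosen for illustration.

```python
import re

ROMAN_1_TO_10 = re.compile(r"^(I?X|IV|V?I{0,3})$")

def is_roman_1_to_10(text):
    """Return True when text matches the case-study pattern."""
    return ROMAN_1_TO_10.search(text) is not None

VALID = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X"]
INVALID = ["IIII", "VV", "IIX", "VX", "XI"]

for numeral in VALID + INVALID:
    print(numeral, is_roman_1_to_10(numeral))
```

As the session above shows, the empty string also matches, because every part of the last alternative is optional. A caller that needs a real numeral would have to reject empty input separately.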
Number Systems
Decimal (base 10)
Digits (0-9)
Each place represents a power of ten
172 = 2*10^0 + 7*10^1 + 1*10^2 = 172
Binary (base 2)
Digits (0,1)
Each place represents a power of two
10011 = 1*2^0 + 1*2^1 + 0*2^2 + 0*2^3 + 1*2^4 = 19
Hexadecimal (base 16)
Digits (0-9, A-F)
A-F represent 10-15
Each place represents a power of sixteen
E.g., F7A = 10*16^0 + 7*16^1 + 15*16^2 = 3962
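The place-value sums above can be reproduced with a few lines of Python. The function name positional_value is hypothetical; the built-in int(text, base) does the same job and is shown for comparison.

```python
SYMBOLS = "0123456789ABCDEF"

def positional_value(digits, base):
    """Sum digit * base**place over all places, with the rightmost place = 0."""
    total = 0
    for place, ch in enumerate(reversed(digits.upper())):
        value = SYMBOLS.index(ch)  # raises ValueError for unknown symbols
        if value >= base:
            raise ValueError("digit %r is not valid in base %d" % (ch, base))
        total += value * base ** place
    return total

print(positional_value("172", 10))   # 172
print(positional_value("10011", 2))  # 19
print(positional_value("F7A", 16))   # 3962
print(int("F7A", 16))                # 3962
```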
Conversion
Decimal to others
Repeatedly divide number by base and populate places
from right to left with the remainder
E.g. Dec2Bin: 50 / 2 = 25 [% = 0], 25 / 2 = 12 [% = 1], 12 / 2 = 6 [% = 0],
6 / 2 = 3 [% = 0], 3 / 2 = 1 [% = 1], 1 / 2 = 0 [% = 1], giving 110010
Bin2Hex:
Collect binary digits into groups of four and convert
E.g., 111000011111 = 1110 0001 1111 = E1F
Hex2Bin
Hexadecimal digits convert into groups of four binary digits
E.g., A7C = 1010 0111 1100 = 101001111100
Hex is used because:
It is easy to convert to and from binary
Offers a more compact representation
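The conversion rules on this slide can be sketched as three small functions. The function names are made up for illustration; Python's built-ins bin, hex and int already cover the same ground.

```python
DIGITS = "0123456789ABCDEF"

def dec_to_base(number, base):
    """Repeatedly divide by base, filling places right to left with the remainders."""
    if number == 0:
        return "0"
    places = []
    while number > 0:
        number, remainder = divmod(number, base)
        places.append(DIGITS[remainder])
    return "".join(reversed(places))

def bin_to_hex(bits):
    """Pad to a multiple of four bits, then turn each group of four into one hex digit."""
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(DIGITS[int(group, 2)] for group in groups)

def hex_to_bin(hex_digits):
    """Expand each hex digit into its group of four binary digits."""
    return "".join(format(int(ch, 16), "04b") for ch in hex_digits)

print(dec_to_base(50, 2))          # 110010
print(bin_to_hex("111000011111"))  # E1F
print(hex_to_bin("A7C"))           # 101001111100
```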
Revision Exercise
Create a function which will take a date string in
any one of the following formats:
dd/mm/yyyy or dd/mm/yy
Other separators (e.g., '\', ' ', '-') are also allowed
Single figure entries may have the form x or 0x, e.g. 3/4/5 or
03/04/05
dd month yy or yyyy where month may be written in full
(December) or abbreviated (Dec. or Dec)
And return it in the format:
dd month(in full) yyyy, e.g. 13 March 2006
Implement this using regular expressions and also
implement range checking on dates
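One possible starting point for the exercise is sketched below. It is not a model answer. The function names are hypothetical, and it makes several assumptions that the exercise leaves open: one- and two-digit years are taken to be in the 2000s, abbreviated months are exactly three letters, the day is printed without a leading zero, and invalid input returns None.

```python
import calendar
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# dd/mm/yy or dd/mm/yyyy, with /, \, space or - as the separator.
NUMERIC = re.compile(r"^(\d{1,2})[/\\ -](\d{1,2})[/\\ -](\d{4}|\d{1,2})$")
# dd month yy or yyyy, month in full or as three letters with an optional dot.
NAMED = re.compile(r"^(\d{1,2}) ([A-Za-z]{3,9})\.? (\d{4}|\d{1,2})$")

def month_number(name):
    """Return 1-12 for a full or three-letter month name, else None."""
    name = name.capitalize()
    for number, full in enumerate(MONTHS, start=1):
        if name == full or name == full[:3]:
            return number
    return None

def days_in_month(year, month):
    """Return the number of days in the given month, allowing for leap years."""
    if month == 2:
        return 29 if calendar.isleap(year) else 28
    return 30 if month in (4, 6, 9, 11) else 31

def normalise_date(text):
    """Return the date as 'dd month yyyy', or None if it is not recognised or out of range."""
    text = text.strip()
    m = NUMERIC.search(text)
    if m:
        day, month, year = int(m.group(1)), int(m.group(2)), m.group(3)
    else:
        m = NAMED.search(text)
        if m is None:
            return None
        day, month, year = int(m.group(1)), month_number(m.group(2)), m.group(3)
    # Assumption: short years such as 5 or 06 mean 2005 and 2006.
    year = int(year) if len(year) == 4 else 2000 + int(year)
    # Range checking on month and day.
    if month is None or not 1 <= month <= 12:
        return None
    if not 1 <= day <= days_in_month(year, month):
        return None
    return "%d %s %d" % (day, MONTHS[month - 1], year)

print(normalise_date("13/3/06"))       # 13 March 2006
print(normalise_date("13 Dec. 2006"))  # 13 December 2006
print(normalise_date("31/02/2006"))    # None
```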