Regular Expression

Download Report

Transcript Regular Expression

Regular Expressions
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0
Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-ncsa/3.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View,
California, 94041, USA.
Outline and Objectives
Outline
• Motivation
• Regular expressions
– Regular Ops.
– Formal Definition
• Examples
• Applications
Learning Objectives
• Evaluate regular
expressions.
• Construct regular
expressions for simple
languages.
• Describe regular
operations.
Motivation
Task: Remove comments and white space from
a text file. Assume comments start with two
forward slashes (ie, //)
Python:
Perl:
line = re.sub(“//.*$”, “”, line)
$line =~ s/\/\/.*$//;
Regular Expressions
Evaluating Expression
Arithmetic Expression
Regular Expression
(5 + 3) * 2
01*0
• This evaluates to a number:
16
• Makes use of standard
mathematical operations
• This evaluates to a
language: A string that
starts and ends with ‘0’ and
has a continuous string of
zero or more ‘1’s
• Makes use of regular
operations
Regular Operations
Regular operations operate on languages…
Let X and Y be regular languages, then following
are regular operations. The result of each op.
can be shown to be a regular language.
Concatenation: X•Y = { xy | x ε X AND y ε Y }
Union: X U Y = { x | x ε X OR y ε Y }
Star: X* = {x1x2x3..xi | i ≥ 0 and xi ε X }
(Sipser, 44)
Regular Languages
R is a regular expression if R is any of the
following…
•
•
•
•
•
x for some x in an alphabet (set of valid chars.)
ε or ∅ ε is a language with empty string; ∅ is an empty language
(R1 U R2), where R1 and R2 are regular languages
(R1 • R2), where R1 and R2 are regular languages
(R1)*, where R1 is a regular language
(Sipser, 64)
Examples
Assume the valid set of characters is 0 & 1. Describe or explicitly
state the language represented by each expression.
(0 U 1)*0 =
(1 U ε)0 =
(0 U 1)1(0 U 1) =
{ s | s ends in a 0}
{ 10, 0}
{ s | s contains 3 digits w/ the
1 in the middle }
Examples
Assume the valid set of characters is 0 & 1. Provide a regular
expression the evaluates to the language described.
- A language that starts with 0 = 0(0 U 1)(0 U 1)1
and ends with 1 and is 4
characters long.
- A language that accepts any = (0 U 1)(0 U 1)*
string containing a
combination of 0’s or 1’s that
has at least one character.
- A string that has an even
= (00)*
pair of 0’s and no 1’s.
Link to FSM
Regular expressions are equivalent to FSM. The
idea is to construct a FSM that will accept any
string from the language the RE represents.
a
ε
(Sipser, 66)
Link to FSM
ab
aUb
(ab)*
Applications
You’ll typically see regular expressions used for
pattern matching.
When working with text strings, there are
number of language specific extensions that
make matching strings easier
Symbol
Meaning
Symbol
Meaning
^
Start of line
[]
Group of characters
$
End of line
[^ ]
Group of characters to exclude
+
One or more
.
Any valid character
Examples
Provide a regular expression to match the following.
- White space at the
beginning of a line
- The word ‘add’
- Any number
- Any number with
punctuation
- Whole line
^[ \t]*
add[ \t\n]
[0-9]+
[0-9.,]+
^.*$
You may need to
escape the . to
match a period
instead of any
character.
References
Sipser M: Introduction to the theory of
computation. 2nd ed. Boston: Thomson
Course Technology; 2006.