ppt - Dr. Wissam Fawaz


Regular Expressions
Tokenizing strings

When you read a sentence, your mind breaks it into tokens

individual words and punctuation marks that convey meaning.

String method split breaks a String into its

component tokens and returns an array of Strings.
Tokens are separated by delimiters

Typically white-space characters


such as space, tab, newline and carriage return.
Other characters can also be used as delimiters to separate tokens.
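
As a minimal sketch of the two delimiter cases described above, the class below splits once on white space and once on commas. The class name SplitSketch and its helper methods are hypothetical names chosen for illustration; only String.split itself comes from the slides.

```java
import java.util.Arrays;

class SplitSketch {
    // Splits on runs of white space (space, tab, newline, carriage return).
    static String[] splitOnWhitespace(String sentence) {
        return sentence.trim().split("\\s+");
    }

    // Splits on a comma followed by optional white space.
    static String[] splitOnCommas(String list) {
        return list.split(",\\s*");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
                splitOnWhitespace("Tokens are\tseparated by\r\ndelimiters")));
        System.out.println(Arrays.toString(splitOnCommas("red, green,blue")));
    }
}
```

The argument to split is itself a regular expression, which is why "\\s+" can treat any run of white-space characters as a single delimiter.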
Regular expressions

A regular expression

a specially formatted String describing a search pattern

useful for validating input

One application is to construct a compiler

Large and complex regular expressions are used to this end

If the program code does not match the regular expression

=> compiler knows that there is a syntax error
Regular Expressions (cont’d)

String method matches receives a String

specifying the regular expression

It matches the contents of the String object on which it is called

against the regular expression

and returns a boolean indicating whether

the match succeeded.
A regular expression consists of

literal characters and special symbols.
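
A small sketch of how String method matches behaves: the whole String must match the pattern, not just a part of it. The class name MatchesSketch and the method isAllDigits are hypothetical.

```java
class MatchesSketch {
    // True only if the ENTIRE string consists of one or more digits.
    static boolean isAllDigits(String s) {
        return s.matches("[0-9]+");
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("2024"));   // whole string is digits
        System.out.println(isAllDigits("20a24"));  // contains a letter
        System.out.println(isAllDigits(""));       // "+" needs at least one digit
    }
}
```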
Character classes

A character class

Is an escape sequence representing a group of chars

Matches a single character in the search object
Construct        Description
[abc]            a, b, or c (simple class)
[^abc]           Any character except a, b, or c (negation)
[a-zA-Z]         a through z, or A through Z, inclusive (range)
[a-d[m-p]]       a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]     d, e, or f (intersection)
[a-z&&[^bc]]     a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]    a through z, and not m through p: [a-lq-z] (subtraction)
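
The union, intersection and subtraction constructs can be checked one character at a time. The following is a sketch only; CharClassSketch and matchesClass are hypothetical names.

```java
class CharClassSketch {
    // Tests a single character against a character class.
    static boolean matchesClass(String charClass, char c) {
        return String.valueOf(c).matches(charClass);
    }

    public static void main(String[] args) {
        System.out.println(matchesClass("[a-d[m-p]]", 'n'));    // union: n is in m-p
        System.out.println(matchesClass("[a-z&&[def]]", 'e'));  // intersection
        System.out.println(matchesClass("[a-z&&[^bc]]", 'b'));  // subtraction removes b
        System.out.println(matchesClass("[a-z&&[^m-p]]", 'q')); // q is outside m-p
        System.out.println(matchesClass("[^abc]", 'a'));        // negation excludes a
    }
}
```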
Common Matching Symbols
Regular Expression   Description
.                    Matches any character
^regex               regex must match at the beginning of the line
regex$               regex must match at the end of the line
[abc]                Set definition; matches the letter a, b, or c
[abc][vz]            Set definition; matches a, b, or c followed by either v or z
[^abc]               A "^" as the first character inside [] negates the set; matches any character except a, b, or c
[a-d1-7]             Ranges; matches one letter between a and d or one digit from 1 to 7, so it does not match the two-character string d1
X|Z                  Finds X or Z
XZ                   Finds X directly followed by Z
$                    Checks whether a line end follows
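
A few of the symbols above, sketched in Java. Because String method matches always tests the whole String, the anchors "^" and "$" are easier to observe with a search; the helper occursIn below is a hypothetical name and uses Matcher method find for that purpose.

```java
import java.util.regex.Pattern;

class MatchingSymbolsSketch {
    // True if the pattern occurs anywhere in the input (a search, not a full match).
    static boolean occursIn(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(occursIn("^Hi", "Hi John"));    // Hi is at the beginning
        System.out.println(occursIn("^John", "Hi John"));  // John is not at the beginning
        System.out.println(occursIn("John$", "Hi John"));  // John is at the end
        System.out.println(occursIn("X|Z", "abZ"));        // alternation finds Z
        System.out.println("av".matches("[abc][vz]"));     // two sets, two characters
        System.out.println("d1".matches("[a-d1-7]"));      // one set matches one character only
    }
}
```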
Ranges

Ranges of characters are determined

by the letters’ integer values

Ex: "[A-Za-z]" matches all uppercase and lowercase letters.

The range "[A-z]" matches all letters

and also matches those characters (such as [ and \)
with an integer value between uppercase A and lowercase z.
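
The difference between the two ranges can be seen by testing a character such as "[" (integer value 91, which lies between A at 65 and z at 122). RangeSketch and its two methods are hypothetical names used for this sketch.

```java
class RangeSketch {
    // Letters only: two separate ranges.
    static boolean inLetterRanges(char c) {
        return String.valueOf(c).matches("[A-Za-z]");
    }

    // One wide range: also covers the characters between 'Z' and 'a'.
    static boolean inWideRange(char c) {
        return String.valueOf(c).matches("[A-z]");
    }

    public static void main(String[] args) {
        System.out.println(inLetterRanges('['));  // not a letter
        System.out.println(inWideRange('['));     // accepted by the wide range
        System.out.println(inWideRange('_'));     // also between 'Z' and 'a'
    }
}
```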
Grouping

Parts of a regex can be grouped using "()"

In a replacement string, "$n" refers to the n-th group (for example, "$1" is the first group)

Example:

Removing whitespace between a word character and "." or ","
String pattern = "(\\w)(\\s+)([\\.,])";
System.out.println(str.replaceAll(pattern, "$1$3")); // str is the input String
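
A runnable version of the grouping example, as a sketch. The wrapper class GroupingSketch, the method name tightenPunctuation and the sample sentence are assumptions added for illustration; the pattern and the replacement string are the ones from the slide.

```java
class GroupingSketch {
    // Group 1 is the word character, group 2 the white space, group 3 the "." or ",".
    // The replacement keeps groups 1 and 3 and drops group 2.
    static String tightenPunctuation(String text) {
        return text.replaceAll("(\\w)(\\s+)([\\.,])", "$1$3");
    }

    public static void main(String[] args) {
        System.out.println(tightenPunctuation("Hello , world ."));
    }
}
```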
Negative look-ahead

It is used to exclude a pattern

It is defined via (?!pattern)

Example: a(?!b)

Matches a if a is not followed by b
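
The example a(?!b) can be tried out as follows. This is a sketch; LookaheadSketch and countMatches are hypothetical names, and the sample input is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadSketch {
    // Counts how many times the pattern is found in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The first "a" is followed by "b" and is skipped; the other two are matched.
        System.out.println(countMatches("a(?!b)", "ab ac a"));
        // The look-ahead does not consume the following character.
        System.out.println("ab ac a".replaceAll("a(?!b)", "X"));
    }
}
```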
Predefined character classes
Construct   Description
.           Any character (may or may not match line terminators)
\d          A digit: [0-9]
\D          A non-digit: [^0-9]
\s          A whitespace character: [ \t\n\x0B\f\r]
\S          A non-whitespace character: [^\s]
\w          A word character: [a-zA-Z_0-9]
\W          A non-word character: [^\w]
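
A short sketch of \d, \s and \W in use. In a Java String literal each backslash must be doubled, so \d is written "\\d". The class and method names below are hypothetical.

```java
class PredefinedClassSketch {
    // Replaces every digit with '#'.
    static String maskDigits(String s) {
        return s.replaceAll("\\d", "#");
    }

    // Collapses each run of white space into a single space.
    static String collapseWhitespace(String s) {
        return s.replaceAll("\\s+", " ");
    }

    // Removes every character that is not a letter, digit or underscore.
    static String stripNonWord(String s) {
        return s.replaceAll("\\W", "");
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("Room 42B"));
        System.out.println(collapseWhitespace("a \t\n b"));
        System.out.println(stripNonWord("user_name-01!"));
    }
}
```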
Matches Method: Examples

Validating a first name

firstName.matches("[A-Z][a-zA-Z]*");

Validating one word, or two words separated by a single whitespace character

"([a-zA-Z]+|[a-zA-Z]+\\s[a-zA-Z]+)"

The character "|" matches the expression

to its left or to its right.

"Hi (John|Jane)" matches both "Hi John" and "Hi Jane".

Validating a Zip code

"\\d{5}"
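
The three patterns above, wrapped in a small class so they can be run. This is a sketch under assumptions: the class ValidationSketch, the method names and the sample inputs are hypothetical; the patterns are taken from the slide.

```java
class ValidationSketch {
    // An uppercase letter followed by any number of letters.
    static boolean isValidFirstName(String s) {
        return s.matches("[A-Z][a-zA-Z]*");
    }

    // One word, or two words separated by exactly one whitespace character.
    static boolean isOneOrTwoWords(String s) {
        return s.matches("([a-zA-Z]+|[a-zA-Z]+\\s[a-zA-Z]+)");
    }

    // Exactly five digits.
    static boolean isValidZip(String s) {
        return s.matches("\\d{5}");
    }

    public static void main(String[] args) {
        System.out.println(isValidFirstName("Jane") + " " + isValidFirstName("jane"));
        System.out.println(isOneOrTwoWords("Van Halen") + " " + isOneOrTwoWords("Van  Halen"));
        System.out.println(isValidZip("12345") + " " + isValidZip("1234"));
    }
}
```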
Split Method: examples
public class RegexTestStrings {
    public static final String EXAMPLE_TEST =
        "This is my small example " + "string which I'm going to " +
        "use for pattern matching.";

    public static void main(String[] args) {
        System.out.println(EXAMPLE_TEST.matches("\\w.*"));
        String[] splitString = (EXAMPLE_TEST.split("\\s+"));
        System.out.println(splitString.length); // Should be 14
        for (String string : splitString) {
            System.out.println(string);
        }
        // Replace all whitespace with tabs
        System.out.println(EXAMPLE_TEST.replaceAll("\\s+", "\t"));
    }
}
RegEx examples




// Returns true if the string matches exactly "true"
public boolean isTrue(String s) {
    return s.matches("true");
}

// Returns true if the string matches exactly "true" or "True"
public boolean isTrueVersion2(String s) {
    return s.matches("[tT]rue");
}

// Returns true if the string matches exactly "true" or "True"
// or "yes" or "Yes"
public boolean isTrueOrYes(String s) {
    return s.matches("[tT]rue|[yY]es");
}

// Returns true if the string contains "true" anywhere
public boolean containsTrue(String s) {
    return s.matches(".*true.*");
}
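
One caveat about the ".*true.*" approach, shown as a sketch: by default "." does not match a newline, so a multi-line String that contains "true" on its second line is rejected by matches, while a search with Matcher method find accepts it. The class ContainsSketch and its method names are hypothetical.

```java
import java.util.regex.Pattern;

class ContainsSketch {
    // Same pattern as containsTrue: "." does not match a newline by default.
    static boolean containsTrueViaMatches(String s) {
        return s.matches(".*true.*");
    }

    // find() searches for the text anywhere in the input, across lines.
    static boolean containsTrueViaFind(String s) {
        return Pattern.compile("true").matcher(s).find();
    }

    public static void main(String[] args) {
        String oneLine = "it is true";
        String twoLines = "first line\nit is true";
        System.out.println(containsTrueViaMatches(oneLine) + " " + containsTrueViaFind(oneLine));
        System.out.println(containsTrueViaMatches(twoLines) + " " + containsTrueViaFind(twoLines));
    }
}
```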
RegEx examples (cont’d)

// Returns true if the string consists of exactly three letters
public boolean isThreeLetters(String s) {
    return s.matches("[a-zA-Z]{3}");
}

// Returns true if the string is non-empty and does not start with a digit
public boolean isNoNumberAtBeginning(String s) {
    return s.matches("^[^\\d].*");
}

// Returns true if the string consists of any number of word characters
// other than b (the empty string also matches)
public boolean isIntersection(String s) {
    return s.matches("([\\w&&[^b]])*");
}
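
The edge cases of these three patterns are easy to overlook, so here is a sketch that exercises them. The methods are static copies of the ones above, placed in a hypothetical class RegexEdgeCases so that they can be run directly.

```java
class RegexEdgeCases {
    static boolean isThreeLetters(String s) {
        return s.matches("[a-zA-Z]{3}");
    }

    static boolean isNoNumberAtBeginning(String s) {
        return s.matches("^[^\\d].*");
    }

    static boolean isIntersection(String s) {
        return s.matches("([\\w&&[^b]])*");
    }

    public static void main(String[] args) {
        System.out.println(isThreeLetters("abc") + " " + isThreeLetters("abcd"));
        // The empty string has no first character, so [^\d] cannot match.
        System.out.println(isNoNumberAtBeginning("a1") + " " + isNoNumberAtBeginning(""));
        // "*" allows zero repetitions, so the empty string is accepted.
        System.out.println(isIntersection("cat") + " " + isIntersection("cab")
                + " " + isIntersection(""));
    }
}
```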
Pattern and Matcher classes

Java provides package java.util.regex,

which helps developers manipulate regular expressions

Class Pattern represents a regular expression

Class Matcher

contains a search pattern and a CharSequence object

If a regular expression is to be used only once

Use static method matches of class Pattern, which

accepts a regular expression and a search object
and returns a boolean value
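
A sketch of the one-shot case using static method Pattern.matches. The class OneShotMatchSketch and the method isZipCode are hypothetical names; the five-digit pattern is reused from the Zip code example.

```java
import java.util.regex.Pattern;

class OneShotMatchSketch {
    // Compiles the pattern, matches the whole input against it, and returns the result.
    static boolean isZipCode(CharSequence input) {
        return Pattern.matches("\\d{5}", input);
    }

    public static void main(String[] args) {
        System.out.println(isZipCode("12345"));
        // Any CharSequence is accepted, not only String.
        System.out.println(isZipCode(new StringBuilder("12a45")));
    }
}
```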
Pattern and Matcher classes (cont’d)

If a regular expression is used more than once

Use static method compile of Pattern to

create a specific Pattern object based on a regular expression

Use the resulting Pattern object to

call the method matcher, which
receives a CharSequence to search and returns a Matcher

Finally, use the following methods of the obtained Matcher

find, group, lookingAt, replaceFirst, and replaceAll
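
The compile-once, reuse-many-times workflow can be sketched as follows. ReusablePatternSketch, WORD, words and firstWordReplaced are hypothetical names; the calls compile, matcher, find, group and replaceFirst are the ones listed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReusablePatternSketch {
    // Compiled once and shared by every call below.
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Collects every match using find and group.
    static List<String> words(CharSequence text) {
        Matcher m = WORD.matcher(text);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    // Replaces only the first match.
    static String firstWordReplaced(CharSequence text, String replacement) {
        return WORD.matcher(text).replaceFirst(replacement);
    }

    public static void main(String[] args) {
        System.out.println(words("to be, or not"));
        System.out.println(firstWordReplaced("to be", "TO"));
    }
}
```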
Methods of Matcher

The dot character "." in a regular expression

matches any single character except a newline character.

Matcher method find attempts to match

a piece of the search object to the search pattern.
Each call to this method starts at the point where the last call
ended, so multiple matches can be found.

Matcher method lookingAt performs the same way

except that it always starts from the beginning of the search object
and succeeds only if a match begins there.
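
The difference between find, lookingAt and matches, as a sketch with hypothetical class and method names. With lookingAt the match must begin at the start of the search object, but unlike matches it does not have to cover the whole input.

```java
import java.util.regex.Pattern;

class FindVsLookingAtSketch {
    // Searches for the pattern anywhere in the input.
    static boolean findsAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Requires the match to begin at the start of the input.
    static boolean matchesAtStart(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(findsAnywhere("\\d+", "abc123"));   // digits occur later on
        System.out.println(matchesAtStart("\\d+", "abc123"));  // input starts with a letter
        System.out.println(matchesAtStart("\\d+", "123abc"));  // input starts with digits
        System.out.println("123abc".matches("\\d+"));          // whole input is not digits
    }
}
```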
Pattern and Matcher example
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTestPatternMatcher {
    public static final String EXAMPLE_TEST =
        "This is my small example string which I'm going to " +
        "use for pattern matching.";

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("\\w+");
        Matcher matcher = pattern.matcher(EXAMPLE_TEST);
        // Print the position and text of every word found
        while (matcher.find()) {
            System.out.print("Start index: " + matcher.start());
            System.out.print(" End index: " + matcher.end());
            System.out.println(" " + matcher.group());
        }
        // Replace every run of whitespace with a tab
        Pattern replace = Pattern.compile("\\s+");
        Matcher matcher2 = replace.matcher(EXAMPLE_TEST);
        System.out.println(matcher2.replaceAll("\t"));
    }
}
Appendix
More examples of Regular Expressions in Java
Validating a username
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UsernameValidator {
    private static final String USERNAME_PATTERN = "^[a-z0-9_-]{3,15}$";

    private Pattern pattern;
    private Matcher matcher;

    public UsernameValidator() {
        pattern = Pattern.compile(USERNAME_PATTERN);
    }

    /**
     * Validate username with regular expression
     *
     * @param username username for validation
     * @return true valid username, false invalid username
     */
    public boolean validate(final String username) {
        matcher = pattern.matcher(username);
        return matcher.matches();
    }
}

Examples of usernames that don’t match

mk (too short, min 3 chars); w@lau ("@" not allowed)
Validating image file extension
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImageValidator {
    private static final String IMAGE_PATTERN =
        "([^\\s]+(\\.(?i)(jpg|png|gif|bmp))$)";

    private Pattern pattern;
    private Matcher matcher;

    public ImageValidator() {
        pattern = Pattern.compile(IMAGE_PATTERN);
    }

    /**
     * Validate image with regular expression
     *
     * @param image image for validation
     * @return true valid image, false invalid image
     */
    public boolean validate(final String image) {
        matcher = pattern.matcher(image);
        return matcher.matches();
    }
}
Time in 12 Hours Format validator
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Time12HoursValidator {
    private static final String TIME12HOURS_PATTERN =
        "(1[012]|[1-9]):[0-5][0-9](\\s)?(?i)(am|pm)";

    private Pattern pattern;
    private Matcher matcher;

    public Time12HoursValidator() {
        pattern = Pattern.compile(TIME12HOURS_PATTERN);
    }

    /**
     * Validate time in 12 hours format with regular expression
     *
     * @param time time for validation
     * @return true valid time format, false invalid time format
     */
    public boolean validate(final String time) {
        matcher = pattern.matcher(time);
        return matcher.matches();
    }
}
Validating date

Date format validation

(0?[1-9]|[12][0-9]|3[01])/(0?[1-9]|1[012])/((19|20)\\d\\d)

(                  # start of group #1
  0?[1-9]          # 01-09 or 1-9
  |                # ..or
  [12][0-9]        # 10-19 or 20-29
  |                # ..or
  3[01]            # 30, 31
)                  # end of group #1
/                  # followed by a "/"
(                  # start of group #2
  0?[1-9]          # 01-09 or 1-9
  |                # ..or
  1[012]           # 10, 11, 12
)                  # end of group #2
/                  # followed by a "/"
(                  # start of group #3
  (19|20)\\d\\d    # 19[0-9][0-9] or 20[0-9][0-9]
)                  # end of group #3
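
A sketch of a validator built around the date pattern above, in the same spirit as the other validators in this appendix. The class name DateFormatSketch and its methods are hypothetical. Note the assumption spelled out in the comments: the pattern checks the format only, so a date such as 31/02/2020 is accepted even though it does not exist in the calendar.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateFormatSketch {
    private static final Pattern DATE_PATTERN = Pattern.compile(
            "(0?[1-9]|[12][0-9]|3[01])/(0?[1-9]|1[012])/((19|20)\\d\\d)");

    // Checks the dd/mm/yyyy format only; it does not check days per month.
    static boolean validate(String date) {
        return DATE_PATTERN.matcher(date).matches();
    }

    // Returns the month part (group #2), or null if the format is invalid.
    static String monthOf(String date) {
        Matcher m = DATE_PATTERN.matcher(date);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(validate("31/12/2024"));
        System.out.println(validate("32/01/2020"));  // day out of range
        System.out.println(validate("31/02/2020"));  // format is valid, date is not
        System.out.println(monthOf("1/09/1999"));
    }
}
```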