Characters and Strings

Representation of single characters
• Data type char is the data type that
represents single characters, such as
letters, numerals, and punctuation marks
• A literal value of type char is written as a
single character enclosed within single
quotation marks
• Examples:
'a', 'F', '9', '&', ' ', ','
Character encoding
• ASCII stands for American Standard Code
for Information Interchange.
• ASCII is one of the document coding
schemes widely used today. This coding
scheme allows different computers to share
information easily.
• Most programming languages support
ASCII characters
ASCII Encoding
• ASCII works well for English-language
documents because all characters and
punctuation marks are included in the
ASCII codes.
• ASCII does not represent the full character
sets of other languages.
ASCII Encoding
• For example, in the ASCII code table the character 'O' is 79 (row value 70 + column value 9 = 79).
Limitations of ASCII
• ASCII characters are usually stored in 8 bits each
– Standard ASCII is a 7-bit code; the eighth bit is left unused (historically it often served as a parity bit, not a sign bit)
– This leaves 2^7 (128) unique combinations of
bits to represent characters
– The extended ASCII sets use all 8 bits to
represent a character, giving 2^8 (256) unique
combinations
Unicode Encoding
• The Unicode Worldwide Character Standard
(Unicode) supports the interchange, processing,
and display of the written texts of diverse
languages.
• Java uses the Unicode standard for representing
char constants.
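• Each Java char occupies 16 bits,
allowing for the possibility of 2^16 (65,536) unique
bit combinations
• At the time these slides were written, 34,168 distinct characters were defined,
covering most of the major world languages; later versions of Unicode define
many more, and characters beyond the 16-bit range are represented in Java by a
pair of char values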
ASCII/Unicode equivalence
• Unicode uses the same bit combinations
for the characters that exist in the ASCII
set
• Thus, an English alphabetic character has
the same numeric value in both ASCII and
Unicode
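The equivalence can be checked by casting a char to an int. The following is only a small sketch; the class name CharCodes and the helper codeOf are invented for this illustration.

```java
// Illustrative sketch; CharCodes and codeOf are hypothetical names.
class CharCodes {
    // Returns the numeric (Unicode) value of a character.
    static int codeOf(char ch) {
        return (int) ch;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('A'));   // 65, the same value as in ASCII
        System.out.println(codeOf('O'));   // 79, matching the ASCII table
        char eAcute = (char) 233;          // e with acute accent, outside 7-bit ASCII
        System.out.println(codeOf(eAcute) > 127); // true
    }
}
```

Characters in the ASCII range keep their ASCII codes, while characters outside it simply have larger values.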
Special characters
• Several keys on a standard keyboard don’t
translate directly into printable (or
displayable) characters
• For example, the Enter key moves the
cursor to a new line; we already know that
the character that corresponds to this
action can be represented as '\n'
Special characters
• Some other special characters used in
Java include:
– '\t': horizontal tab character
– '\u0007': alarm (bell) "character", which may cause the system
speaker to beep; Java has no '\a' escape, unlike C
– '\\': a single backslash
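A short sketch of escape sequences inside a string literal may help here. The class name EscapeDemo and the sample text are assumptions made up for this example.

```java
// Illustrative sketch; EscapeDemo and its sample text are hypothetical.
class EscapeDemo {
    // Builds a string that contains a tab, a newline, and one backslash.
    static String sample() {
        return "a\tb\nC:\\temp";
    }

    public static void main(String[] args) {
        String s = sample();
        System.out.println(s);            // tab between a and b, then a new line
        System.out.println(s.length());   // 11: each escape counts as one character
        char bell = (char) 7;             // the bell character has code 7
        System.out.println((int) bell);   // 7
    }
}
```

Each escape sequence is written with two keystrokes in the source but is stored as a single character in the string.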
Converting between char and int
We can convert between a numeric (int) value
and its corresponding ASCII character
equivalent by using type casting, as the
examples below illustrate:
int x = 99;
System.out.println(x);           // prints 99
System.out.println((char) x);    // prints c
char ch1 = 'X';
System.out.println(ch1);         // prints X
System.out.println((int) ch1);   // prints 88
Character comparison
• Values of type char can be compared just
like integers are compared, since they are
actually stored as binary whole numbers
• In the ASCII (and Unicode) set, uppercase
letters have lower numeric value than
lowercase letters
• So, for example, 'A' is less than 'a', and 'b'
is greater than 'Z'
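Because char values compare like integers, a range test can classify a character. This is a minimal sketch that assumes only the 26 unaccented English letters matter; CharCompare and isUppercaseAscii are hypothetical names.

```java
// Illustrative sketch; CharCompare and isUppercaseAscii are hypothetical names.
class CharCompare {
    // True if ch is one of the 26 uppercase English letters.
    static boolean isUppercaseAscii(char ch) {
        return ch >= 'A' && ch <= 'Z';
    }

    public static void main(String[] args) {
        System.out.println('A' < 'a');              // true: 65 < 97
        System.out.println('b' > 'Z');              // true: 98 > 90
        System.out.println(isUppercaseAscii('Q'));  // true
        System.out.println(isUppercaseAscii('q'));  // false
    }
}
```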
Strings
• A string is a sequence of characters that is
treated as a single value.
• Instances of the String class are used to
represent strings in Java.
• We access individual characters of a string
by calling the charAt method of the String
object.
Strings
• Each character in a string has an index we
use to access the character.
• Java uses zero-based indexing; the first
character’s index is 0, the second is 1, and
so on.
• To refer to the first character of the word
name, we say
name.charAt(0)
String indexing with charAt method
• An indexed expression is used to refer to
individual characters in a string.
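Zero-based indexing means the last valid index is length() - 1. The sketch below illustrates this; CharAtDemo, first, and last are names invented for the example.

```java
// Illustrative sketch; CharAtDemo, first, and last are hypothetical names.
class CharAtDemo {
    // The first character is at index 0.
    static char first(String s) {
        return s.charAt(0);
    }

    // The last character is at index length() - 1.
    static char last(String s) {
        return s.charAt(s.length() - 1);
    }

    public static void main(String[] args) {
        String name = "Kona";
        System.out.println(first(name));  // K
        System.out.println(last(name));   // a
        // Visit every character in order.
        for (int i = 0; i < name.length(); i++) {
            System.out.println(i + ": " + name.charAt(i));
        }
    }
}
```

Calling charAt with an index equal to or greater than length() throws a StringIndexOutOfBoundsException.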
Constructing strings
• Since String is a class, we can create an
instance of the class by using the new
operator.
– The statements we have used so far, such as
String name1 = "Kona";
– work as a shorthand for
String name1 = new String("Kona");
– But this shorthand works for the String class
only.
Example: Counting Vowels
Here's the code to count the number of vowels in the input string.
char   letter;
String name = JOptionPane.showInputDialog(null, "Your name:");
int    numberOfCharacters = name.length();
int    vowelCount = 0;
for (int i = 0; i < numberOfCharacters; i++) {
    letter = name.charAt(i);
    if (letter == 'a' || letter == 'A' || letter == 'e' || letter == 'E' ||
        letter == 'i' || letter == 'I' || letter == 'o' || letter == 'O' ||
        letter == 'u' || letter == 'U' ) {
        vowelCount++;
    }
}
System.out.print(name + ", your name has " + vowelCount + " vowels");
Example: Counting ‘Java’
Continue reading words and count how many times the word Java occurs in the input, ignoring the case.
int     javaCount = 0;
boolean repeat    = true;
String  word;
while ( repeat ) {
    word = JOptionPane.showInputDialog(null, "Next word:");
    // Notice how the comparison is done. We are not using the == operator.
    if ( word.equals("STOP") ) {
        repeat = false;
    } else if ( word.equalsIgnoreCase("Java") ) {
        javaCount++;
    }
}
Other Useful String Operators
Method        Meaning
compareTo     Compares the two strings.
              str1.compareTo( str2 )
substring     Extracts a substring from a string.
              str1.substring( 1, 4 )
trim          Removes the leading and trailing spaces.
              str1.trim( )
valueOf       Converts a given primitive data value to a string.
              String.valueOf( 123.4565 )
startsWith    Returns true if a string starts with a specified prefix string.
              str1.startsWith( str2 )
endsWith      Returns true if a string ends with a specified suffix string.
              str1.endsWith( str2 )
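The methods listed above can be tried on a sample string. This is only a sketch; the class name StringMethodsDemo, the helper tidy, and the sample text are assumptions made for the illustration.

```java
// Illustrative sketch; StringMethodsDemo, tidy, and the sample text are hypothetical.
class StringMethodsDemo {
    // Removes leading and trailing spaces.
    static String tidy(String s) {
        return s.trim();
    }

    public static void main(String[] args) {
        String str1 = tidy("  Hello, Java  ");
        System.out.println(str1);                             // Hello, Java
        System.out.println(str1.substring(1, 4));             // ell (indexes 1 to 3)
        System.out.println(String.valueOf(123.4565));         // 123.4565
        System.out.println(str1.startsWith("Hello"));         // true
        System.out.println(str1.endsWith("Java"));            // true
        System.out.println("apple".compareTo("banana") < 0);  // true
    }
}
```

Note that substring(1, 4) includes the character at index 1 and stops before index 4.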
Comparing Strings
• Comparing String objects is similar to
comparing other objects.
• The equality test (==) is true if the contents of
the variables are the same.
• For a reference data type, the equality test is
true if both variables refer to the same object,
because they both contain the same address.
Thus, the “contents of the variable” does not
mean “the sequence of characters in the String”
Comparing Strings
• We don’t usually use the == operator to
compare Strings
• The equals method is true if the String
objects to which the two variables refer
contain the same string value.
String s1 = new String("hello");
String s2 = new String("hello");
if (s1 == s2)
    System.out.println("They are equal");         // this won't print
if (s1.equals(s2))
    System.out.println("No, really, they are");   // this will print
The difference between the equality
test and the equals method
Comparing Strings
• String comparison may be done in several
ways.
– The methods equals and equalsIgnoreCase
compare string values; one is case-sensitive and one
is not.
– The method compareTo returns a value:
• Zero (0) if the strings are equal.
• A negative integer if the first string is less than the
second.
• A positive integer if the first string is greater than
the second.
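Only the sign of the compareTo result is meaningful, so code usually tests it against zero. The sketch below shows one way to do that; CompareToDemo and describe are hypothetical names.

```java
// Illustrative sketch; CompareToDemo and describe are hypothetical names.
class CompareToDemo {
    // Translates the sign of compareTo into a word.
    static String describe(String a, String b) {
        int result = a.compareTo(b);
        if (result == 0) {
            return "equal";
        }
        return result < 0 ? "less" : "greater";
    }

    public static void main(String[] args) {
        System.out.println(describe("apple", "banana")); // less
        System.out.println(describe("Java", "Java"));    // equal
        System.out.println(describe("java", "Java"));    // greater: 'j' > 'J'
    }
}
```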
Comparing Strings
• As long as a new String object is created
using the new operator, the rule for
comparing objects applies to comparing
strings.
String str = new String("Java");
• If the new operator is not used, identical string
literals share a single String object, so == comparisons
behave as if the strings were of the primitive
data type.
String str = "Java";
The difference between using and not
using the new operator for String
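The effect can be observed with the == operator. This is a minimal sketch; StringIdentityDemo and sameObject are names invented here, and equals remains the right way to compare string contents.

```java
// Illustrative sketch; StringIdentityDemo and sameObject are hypothetical names.
class StringIdentityDemo {
    // True only if both variables refer to the very same object.
    static boolean sameObject(String x, String y) {
        return x == y;
    }

    public static void main(String[] args) {
        String a = "Java";
        String b = "Java";
        String c = new String("Java");
        System.out.println(sameObject(a, b)); // true: both literals share one object
        System.out.println(sameObject(a, c)); // false: new always creates a new object
        System.out.println(a.equals(c));      // true: same sequence of characters
    }
}
```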
Pattern Matching
and Regular Expressions
• Pattern matching is a common function in
many applications.
• In Java 2 SDK 1.4, two new classes,
Pattern and Matcher, were added.
• The String class also includes several
new methods that support pattern
matching.
Pattern Example
• Suppose students are assigned a three-digit code:
– The first digit represents the major (5 indicates computer science);
– The second digit represents either in-state (1), out-of-state (2), or
foreign (3);
– The third digit indicates campus housing:
• On-campus dorms are numbered 1-7.
• Students living off-campus are represented by the digit 8.
The 3-digit pattern to represent computer science majors living on-campus is
5[123][1-7]
– The first character is 5
– The second character is 1, 2, or 3
– The third character is any digit between 1 and 7
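The student-code pattern can be tested with the matches method of the String class. This is a sketch only; StudentCodeDemo and isOnCampusCsMajor are hypothetical names.

```java
// Illustrative sketch; StudentCodeDemo and isOnCampusCsMajor are hypothetical names.
class StudentCodeDemo {
    // True if the whole code is 5, then 1-3, then 1-7.
    static boolean isOnCampusCsMajor(String code) {
        return code.matches("5[123][1-7]");
    }

    public static void main(String[] args) {
        System.out.println(isOnCampusCsMajor("517"));  // true
        System.out.println(isOnCampusCsMajor("528"));  // false: 8 means off-campus
        System.out.println(isOnCampusCsMajor("417"));  // false: major digit is not 5
        System.out.println(isOnCampusCsMajor("5171")); // false: matches tests the whole string
    }
}
```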
Pattern Matching
and Regular Expression
• The pattern is called a regular expression
that allows us to denote a large set of
“words” (any sequence of symbols)
succinctly.
• Brackets [ ] represent choices, so [abc]
means a, b, or c.
• For example, the definition for a valid Java
identifier may be stated as
[a-zA-Z][a-zA-Z0-9_$]*
Regular Expressions
• Rules
– The brackets [ ] represent choices
– The asterisk symbol * means zero or more
occurrences.
– The plus symbol + means one or more occurrences.
– The hat symbol ^ means negation.
– The hyphen - means ranges.
– The parentheses ( ) and the vertical bar | mean a
range of choices for multiple characters.
Regular Expression Examples
Expression               Description
[013]                    A single digit 0, 1, or 3.
[0-9][0-9]               Any two-digit number from 00 to 99.
[0-9&&[^4567]]           A single digit that is 0, 1, 2, 3, 8, or 9.
[a-z0-9]                 A single character that is either a lowercase
                         letter or a digit.
[a-zA-Z][a-zA-Z0-9_$]*   A valid Java identifier consisting of
                         alphanumeric characters, underscores, and
                         dollar signs, with the first character being
                         a letter.
[wb](ad|eed)             Matches wad, weed, bad, and beed.
(AZ|CA|CO)[0-9][0-9]     Matches AZxx, CAxx, and COxx, where x is
                         a single digit.
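Two of these expressions are tried out in the sketch below, using String.matches. The class and method names are invented for the illustration.

```java
// Illustrative sketch; RegexExamplesDemo and its method names are hypothetical.
class RegexExamplesDemo {
    // A digit from 0-9 that is not 4, 5, 6, or 7.
    static boolean isAllowedDigit(String s) {
        return s.matches("[0-9&&[^4567]]");
    }

    // One of wad, weed, bad, or beed.
    static boolean isWadWord(String s) {
        return s.matches("[wb](ad|eed)");
    }

    public static void main(String[] args) {
        System.out.println(isAllowedDigit("8"));  // true
        System.out.println(isAllowedDigit("5"));  // false
        System.out.println(isWadWord("weed"));    // true
        System.out.println(isWadWord("bead"));    // false
    }
}
```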
More Examples
Expression   Description
X{N}         Repeat X exactly N times, where X
             is a regular expression for a single
             character.
X{N,}        Repeat X at least N times.
X{N,M}       Repeat X at least N but no more
             than M times.
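A brief sketch of the three repetition forms follows. The sample patterns (a five-digit code and so on) and all names are assumptions chosen only for illustration.

```java
// Illustrative sketch; QuantifierDemo and its sample patterns are hypothetical.
class QuantifierDemo {
    // X{N}: exactly five digits.
    static boolean isFiveDigits(String s) {
        return s.matches("[0-9]{5}");
    }

    // X{N,}: at least two lowercase letters.
    static boolean atLeastTwoLetters(String s) {
        return s.matches("[a-z]{2,}");
    }

    // X{N,M}: between two and four digits.
    static boolean twoToFourDigits(String s) {
        return s.matches("[0-9]{2,4}");
    }

    public static void main(String[] args) {
        System.out.println(isFiveDigits("85281"));      // true
        System.out.println(isFiveDigits("8528"));       // false
        System.out.println(atLeastTwoLetters("a"));     // false
        System.out.println(atLeastTwoLetters("abc"));   // true
        System.out.println(twoToFourDigits("123"));     // true
        System.out.println(twoToFourDigits("12345"));   // false
    }
}
```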
Pattern Matching
and Regular Expression
• The matches method from the String
class is similar to the equals method.
• However, unlike equals, the argument to
matches can be a pattern.
Pattern Matching
and Regular Expression
• The period symbol (.) is used to match any
character except a line terminator (\n or \r).
String document;
document = ...;   // assign text to 'document'
if (document.matches(".*zen of objects.*")) {
    System.out.println("Found");
} else {
    System.out.println("Not found");
}
Pattern Matching
and Regular Expression
• Brackets ([ ]) are used for expressing a
range of choices for a given character.
• To express a range of choices for multiple
characters, use parentheses and the
vertical bar.
Pattern Matching
and Regular Expression
Expression           Description
[wb](ad|eed)         Matches wad, weed, bad, and beed.
(pro|anti)-OOP       Matches pro-OOP and anti-OOP.
(AZ|CA|CO)[0-9]{4}   Matches AZxxxx, CAxxxx, and COxxxx, where x is a
                     single digit.
Pattern Matching
and Regular Expression
• The replaceAll method is new to the
Version 1.4 String class.
• This method allows us to replace all
occurrences of a substring that matches a
given regular expression with a given
replacement string.
Pattern Matching
and Regular Expression
• For example, to replace all vowels in a string with
the @ symbol:
String originalText, modifiedText;
originalText = ...;   // assign string to 'originalText'
modifiedText = originalText.replaceAll("[aeiou]", "@");
• Note that this method does not change the
original text; it simply returns a modified text as a
separate string.
Pattern Matching
and Regular Expression
• To match a whole word, use the \b symbol to
designate the word boundary.
str.replaceAll("\\btemp\\b", "temporary");
• Two backslashes are necessary because we
must write the expression as a String
literal. The doubled backslash prevents the
compiler from interpreting the regular expression
backslash as the start of a string escape sequence
(a single \b in a string literal would mean the backspace character).
Pattern Matching
and Regular Expression
• The backslash is also used to search for a
character that has a special meaning in regular expressions. For example:
– To search for the plus symbol (+) in text, we use the
backslash as \+.
– To express it as a string, we write "\\+".
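Both uses of the doubled backslash can be seen in a small sketch with replaceAll. The class name, method names, and sample text are made up for the example.

```java
// Illustrative sketch; EscapeRegexDemo and its method names are hypothetical.
class EscapeRegexDemo {
    // Replaces the whole word temp, but not temp inside a longer word.
    static String expandTemp(String s) {
        return s.replaceAll("\\btemp\\b", "temporary");
    }

    // Replaces each literal plus sign with the word plus.
    static String spellOutPlus(String s) {
        return s.replaceAll("\\+", " plus ");
    }

    public static void main(String[] args) {
        System.out.println(expandTemp("temp file in temperature"));
        // temporary file in temperature
        System.out.println(spellOutPlus("1+2"));
        // 1 plus 2
    }
}
```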
The Pattern and Matcher Classes
• The matches and replaceAll methods of
the String class are shorthand for using
the Pattern and Matcher classes from the
java.util.regex package.
The Pattern and Matcher Classes
• If str and regex are String objects,
then both
str.matches(regex);
and
Pattern.matches(regex, str);
are equivalent to
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
matcher.matches();
The Pattern and Matcher Classes
• Creating Pattern and Matcher objects gives
us more options and efficiency.
• The compile method of the Pattern class
converts the stated regular expression to an
internal format to carry out the pattern-matching operation.
• This conversion is carried out every time the
matches method of the String or Pattern
class is executed, so compiling a pattern once and reusing it is
more efficient when the same pattern is applied many times.
The Pattern and Matcher Classes
/* Chapter 9 Sample Program: Checks whether the input string is
   a valid identifier. This version uses the Matcher and Pattern
   classes.
   File: Ch9MatchJavaIdentifier2.java */
import javax.swing.*;
import java.util.regex.*;

class Ch9MatchJavaIdentifier2 {
    private static final String STOP    = "STOP";
    private static final String VALID   = "Valid Java identifier";
    private static final String INVALID = "Not a valid Java identifier";
    private static final String VALID_IDENTIFIER_PATTERN =
                                        "[a-zA-Z][a-zA-Z0-9_$]*";

    public static void main (String[] args) {
        String  str, reply;
        Matcher matcher;
        // Compile the pattern once, outside the loop, and reuse it.
        Pattern pattern = Pattern.compile(VALID_IDENTIFIER_PATTERN);
        while (true) {
            str = JOptionPane.showInputDialog(null, "Identifier:");
            if (str.equals(STOP)) break;
            matcher = pattern.matcher(str);
            if (matcher.matches()) {
                reply = VALID;
            } else {
                reply = INVALID;
            }
            JOptionPane.showMessageDialog(null, str + ":\n" + reply);
        } // ends loop
    } // ends main
} // ends class
The Pattern and Matcher Classes
• The find method is another powerful
method of the Matcher class.
• The method searches for the next
sequence in a string that matches the
pattern, and returns true if the pattern is
found.
The Pattern and Matcher Classes
• When a matcher finds a matching
sequence of characters, we can query the
location of the sequence by using the start
and end methods.
The Pattern and Matcher Classes
• The start method returns the position in
the string where the first character of the
pattern is found.
• The end method returns the value 1 more
than the position in the string where the
last character of the pattern is found.
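The find, start, and end methods are typically used together in a loop. The following is a sketch under that assumption; FindDemo, locate, and the sample sentence are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch; FindDemo and locate are hypothetical names.
class FindDemo {
    // Lists every match as start-end, separated by spaces.
    static String locate(String text, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        StringBuilder positions = new StringBuilder();
        while (matcher.find()) {  // moves to the next matching sequence
            positions.append(matcher.start())
                     .append('-')
                     .append(matcher.end())
                     .append(' ');
        }
        return positions.toString().trim();
    }

    public static void main(String[] args) {
        // "Java" occupies indexes 7 to 10, "java" indexes 16 to 19.
        System.out.println(locate("I love Java and java", "[Jj]ava"));
        // 7-11 16-20
    }
}
```

Because end is one past the last matched character, text.substring(matcher.start(), matcher.end()) yields exactly the matched text.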
The String Class is Immutable
• In Java a String object is immutable
– This means once a String object is created, it cannot
be changed, such as replacing a character with
another character or removing a character
– The String methods we have used so far do not
change the original string. They create a new string
from the original. For example, substring creates a
new string from a given string.
• The String class is defined in this manner for
efficiency reasons.
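Immutability can be observed by calling methods on a string and then inspecting the original. This is a small sketch; ImmutableDemo and derive are hypothetical names.

```java
// Illustrative sketch; ImmutableDemo and derive are hypothetical names.
class ImmutableDemo {
    // Returns the original string followed by two strings derived from it.
    static String[] derive(String original) {
        String upper = original.toUpperCase();   // a new String object
        String part  = original.substring(1, 3); // another new String object
        return new String[] { original, upper, part };
    }

    public static void main(String[] args) {
        for (String s : derive("Java")) {
            System.out.println(s);
        }
        // Java   (the original is unchanged)
        // JAVA
        // av
    }
}
```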
Effect of Immutability
We can do this
because String
objects are
immutable.
The StringBuffer Class
• In many string processing applications, we
would like to change the contents of a string. In
other words, we want it to be mutable.
• Manipulating the content of a string, such as
replacing a character, appending a string with
another string, deleting a portion of a string, and
so on, may be accomplished by using the
StringBuffer class.
StringBuffer Example
StringBuffer word = new StringBuffer("Java");
word.setCharAt(0, 'D');
word.setCharAt(1, 'i');
Changing a string Java to Diva:
Before: word refers to a StringBuffer object containing Java
After word.setCharAt(0, 'D'); and word.setCharAt(1, 'i'); the same StringBuffer object contains Diva
Sample Processing
Replace all vowels in the sentence with ‘X’.
char         letter;
String       inSentence = JOptionPane.showInputDialog(null, "Sentence:");
StringBuffer tempStringBuffer = new StringBuffer(inSentence);
int          numberOfCharacters = tempStringBuffer.length();
for (int index = 0; index < numberOfCharacters; index++) {
    letter = tempStringBuffer.charAt(index);
    if ( letter == 'a' || letter == 'A' || letter == 'e' || letter == 'E' ||
         letter == 'i' || letter == 'I' || letter == 'o' || letter == 'O' ||
         letter == 'u' || letter == 'U' ) {
        tempStringBuffer.setCharAt(index, 'X');
    }
}
JOptionPane.showMessageDialog(null, tempStringBuffer );
The append and insert Methods
• We use the append method to append a
String or StringBuffer object to the end of a
StringBuffer object.
– The method can also take an argument of the
primitive data type.
– Any primitive data type argument is converted
to a string before it is appended to a
StringBuffer object.
• We can insert a string at a specified
position by using the insert method.
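A short sketch of append and insert on a StringBuffer follows. The class name AppendInsertDemo, the method build, and the sample text are invented for the illustration.

```java
// Illustrative sketch; AppendInsertDemo and build are hypothetical names.
class AppendInsertDemo {
    static String build() {
        StringBuffer sb = new StringBuffer("Java");
        sb.append(" is ");      // append a String
        sb.append(5);           // an int is converted to "5" first
        sb.append('!');         // a char works too
        sb.insert(0, "Hi: ");   // insert at position 0, the front
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build()); // Hi: Java is 5!
    }
}
```

Both methods modify the same StringBuffer object in place, in contrast to the String methods, which return new strings.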
The StringBuilder Class
• This class is new to Java 5.0 (SDK 1.5)
• The class was added to Java as a
faster alternative to the
StringBuffer class.
• StringBuffer and StringBuilder support
exactly the same set of methods, so they
are interchangeable.
The StringBuilder Class
• There are advanced cases where we must
use StringBuffer, but in all sample
applications in the book, StringBuilder can
be used.
• Since performance is not our main
concern and the StringBuffer class is
usable for all versions of Java, we will use
StringBuffer only in this book.