Java Regular Expressions

Java Regex, or Java regular expressions, is an API for defining patterns that are used to search or manipulate strings. A regular expression can be as simple as a single character or a combination of characters that forms a complex pattern. To work with regular expressions, import the package java.util.regex. The package provides the MatchResult interface and the following classes.

Matcher Class: The class implements the MatchResult interface. It is used to search an input sequence for a pattern.

Pattern Class: It represents the compiled form of the regular expression that has to be searched.

PatternSyntaxException Class: An unchecked exception that is thrown to indicate a syntax error in a regular expression pattern.
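
As a minimal sketch of how a syntax error in a pattern surfaces, the snippet below compiles two expressions and reports whether each one is accepted. The class name SyntaxCheckDemo and the helper isValidRegex are hypothetical names chosen for illustration only.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxCheckDemo {
    // Hypothetical helper: returns true if the regular expression compiles.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // Thrown by Pattern.compile() when the pattern is malformed.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("[a-z]+")); // true
        System.out.println(isValidRegex("[a-z"));   // false: unclosed character class
    }
}
```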

Matcher Class


The following table lists commonly used methods of the Matcher class:

Method Name                    Description
public int start()             Returns the start index of the previous match.
public int groupCount()        Returns the number of capturing groups in the matcher's pattern.
public String group()          Returns the input subsequence matched by the previous match.
public boolean find()          Searches for the next subsequence of the input that matches the pattern. Returns true if a match is found, else returns false.
public boolean find(int st)    Resets the matcher and searches for the next subsequence that matches the pattern, starting at index st. Returns true if a match is found, else returns false.
public boolean matches()       Attempts to match the entire input sequence against the pattern. Returns true only if the whole sequence matches, else returns false.
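
The Matcher methods listed above can be tried out with a short sketch like the following. The class name MatcherMethodsDemo, the helper findAll, and the sample inputs are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    // Hypothetical helper: collects "startIndex:matchedText" for every match.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> hits = new ArrayList<>();
        while (m.find()) {                 // true while another match exists
            hits.add(m.start() + ":" + m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d+", "ab12cd345")); // [2:12, 6:345]

        Matcher m = Pattern.compile("\\d+").matcher("ab12cd345");
        System.out.println(m.find(4));   // true: search restarts at index 4
        System.out.println(m.start());   // 6
        System.out.println(m.matches()); // false: the whole input is not digits

        Matcher pair = Pattern.compile("(\\d+)-(\\d+)").matcher("10-20");
        System.out.println(pair.matches());    // true
        System.out.println(pair.groupCount()); // 2 capturing groups
    }
}
```

Note that find() looks for a match anywhere in the input, while matches() succeeds only when the whole input matches.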

Pattern Class


The following table lists commonly used methods of the Pattern class:

Method Name                                                  Description
public Matcher matcher(CharSequence cs)                      Creates a matcher that will match the sequence cs against this pattern.
public static boolean matches(String re, CharSequence cs)    A static method that compiles the regular expression re and checks whether the entire sequence cs matches it.
public String pattern()                                      Returns the regular expression from which this pattern was compiled.
public String[] split(CharSequence cs, int limit)            Splits the input cs around matches of this pattern and returns the pieces as an array of strings. The second parameter limit controls how many times the pattern is applied and therefore the length of the resulting array.
public static Pattern compile(String rgx)                    Compiles the string rgx into a pattern and returns it.
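
A small sketch of the Pattern methods above, with particular attention to the limit parameter of split(). The class name PatternMethodsDemo, the helper splitOnComma, and the sample strings are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternMethodsDemo {
    // Hypothetical helper: splits the input on commas with the given limit.
    static String[] splitOnComma(String input, int limit) {
        return Pattern.compile(",").split(input, limit);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\d+");
        System.out.println(p.pattern());                           // \d+
        System.out.println(Pattern.matches("\\d+", "2024"));      // true
        System.out.println(Pattern.matches("\\d+", "year 2024")); // false: not the whole input

        // A positive limit caps the array length; the last element keeps the rest.
        System.out.println(Arrays.toString(splitOnComma("a,b,c,d", 2))); // [a, b,c,d]
        // A limit of 0 applies the pattern as many times as possible.
        System.out.println(Arrays.toString(splitOnComma("a,b,c,d", 0))); // [a, b, c, d]
    }
}
```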

Let’s understand the concept of regular expression through a Java program.

Java Program

Consider the following program that shows how to use a regular expression.

FileName: RegexExample.java

Output:


Explanation: The second parameter (Pattern.CASE_INSENSITIVE) of the compile() method is a flag that tells the pattern search to ignore case. By default, matching is case-sensitive. The second parameter is optional and can be omitted. The matcher() method returns a Matcher object. On this returned object, the find() method is invoked to check whether the regular expression pattern occurs in the sequence.
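
The flow described in the explanation can be sketched as follows. This is an illustrative sketch under assumptions: the class name CaseInsensitiveDemo, the helper containsIgnoringCase, and the sample text are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseInsensitiveDemo {
    // Hypothetical helper: compiles with CASE_INSENSITIVE and searches the input.
    static boolean containsIgnoringCase(String regex, String input) {
        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(input);
        return matcher.find();
    }

    public static void main(String[] args) {
        String text = "Welcome to JAVA regex"; // assumed sample input
        System.out.println(containsIgnoringCase("java", text));          // true
        // Without the flag, matching is case-sensitive.
        System.out.println(Pattern.compile("java").matcher(text).find()); // false
    }
}
```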

Metacharacters

The characters that have special meaning are known as metacharacters. The following table shows the commonly used metacharacters in a regular expression.

Metacharacter    Definition
|                Matches any one of the patterns separated by |. For example, fish|dog|cat
\d               Matches a digit
\s               Matches a whitespace character
\uxxxx           Matches the Unicode character with the hexadecimal value xxxx
^                Matches at the start of the string, e.g., ^World
$                Matches at the end of the string, e.g., World$
.                Matches any single character (by default, any character except a line terminator)
\b               Matches at a word boundary, i.e., at the start or the end of a word, e.g., \bWorld or World\b
\w               Matches any word character (a letter, a digit, or an underscore)
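
A minimal sketch that exercises the metacharacters from the table. The class name MetacharactersDemo, the helper found, and the sample text are assumptions for illustration. Note that each backslash has to be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

class MetacharactersDemo {
    // Hypothetical helper: true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "Hello World 42"; // assumed sample input
        System.out.println(found("fish|dog|World", text)); // true: one alternative matches
        System.out.println(found("\\d", text));            // true: a digit is present
        System.out.println(found("\\s", text));            // true: a space is present
        System.out.println(found("^World", text));         // false: the text starts with Hello
        System.out.println(found("42$", text));            // true: the text ends with 42
        System.out.println(found("W.rld", text));          // true: the dot stands for 'o'
        System.out.println(found("\\bWorld\\b", text));    // true: World is a whole word
        System.out.println(found("\\borld", text));        // false: no boundary inside World
        System.out.println(found("\\u0048ello", text));    // true: hex code 0048 is 'H'
    }
}
```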

Let’s use the metacharacters in a Java program.

FileName: RegexMetacharactersExample.java

Output:

Quantifiers

Quantifiers specify how many times a character, character class, or group must occur in the input to get a match.

Quantifier    Description
Y*            Matches 0 or more occurrences of Y
Y+            Matches at least one occurrence of Y
Y{n,}         Matches at least n occurrences of Y
Y{n}          Matches exactly n occurrences of Y
Y{n1,n2}      Matches at least n1 but not more than n2 occurrences of Y
Y?            Matches 0 or 1 occurrence of Y
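
To make the table concrete, the sketch below checks whole-string matches for each quantifier. The class name QuantifierTableDemo and the helper wholeMatch are hypothetical. The counted forms are written without spaces inside the braces.

```java
import java.util.regex.Pattern;

class QuantifierTableDemo {
    // Hypothetical helper: true only if the entire input matches the regex.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("Y*", ""));        // true: zero occurrences allowed
        System.out.println(wholeMatch("Y+", ""));        // false: at least one Y required
        System.out.println(wholeMatch("Y{2,}", "YYY"));  // true: three is at least two
        System.out.println(wholeMatch("Y{2}", "YYY"));   // false: exactly two required
        System.out.println(wholeMatch("Y{1,2}", "YY"));  // true: within the range 1 to 2
        System.out.println(wholeMatch("Y?", "YY"));      // false: at most one Y allowed
    }
}
```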

Java Program


The following program uses the quantifiers defined above.

FileName: QuantifiersExample.java

Output:

Explanation: The + quantifier is greedy: it consumes as many consecutive matching characters as it can in a single match. Hence, the run of 't' at indices 0 to 3 is taken as one match. Then comes 's', which is not part of the regular expression. After that, a single 't' occurs at index 5, which is represented in the output too.

The ? quantifier matches at most one character at a time. Therefore, in the output, we see every index from 0 to 5. Index 6 is shown because the ? quantifier also matches zero characters: after index 5 the input string ends, so the zero-character condition holds, and index 6 is displayed in the output.

For the * quantifier, index 6 is also shown because * likewise accepts the zero-character condition. However, the * quantifier consumes all consecutive matching characters in a single match. Therefore, we only see indices 0 and 6 in the output.

For the {n,} quantifier, each match consumes n or more consecutive matching characters at a time. Therefore, index 4 is seen in the output.

For the {n,m} quantifier (m must be greater than or equal to n), each match consumes between n and m consecutive matching characters. If a run contains more than m occurrences, the first m are taken as one match, and the remaining occurrences are considered in the next iteration. The same is evident from the output.

For the {n} quantifier, each match consumes exactly n occurrences, and the rest of the occurrences are considered in the next iteration. As n = 3 in our case, we see a gap of 3 in the output.
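
The greedy and zero-length behaviour discussed above can be observed with a sketch like the one below. It assumes the input string "ttttst" (inferred from the discussion of the + quantifier) and assumed values of n and m, so its spans may differ from the output the explanation refers to. The class name QuantifierSpansDemo and the helper spans are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierSpansDemo {
    // Hypothetical helper: lists each match as "start-end" (end is exclusive).
    // A span such as "4-4" is a zero-length match.
    static List<String> spans(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "ttttst"; // assumed input
        String[] regexes = {"t+", "t?", "t*", "t{2,}", "t{1,3}", "t{3}"};
        for (String regex : regexes) {
            System.out.println(regex + " -> " + spans(regex, input));
        }
        // t+     -> [0-4, 5-6]
        // t?     -> [0-1, 1-2, 2-3, 3-4, 4-4, 5-6, 6-6]
        // t*     -> [0-4, 4-4, 5-6, 6-6]
        // t{2,}  -> [0-4]
        // t{1,3} -> [0-3, 3-4, 5-6]
        // t{3}   -> [0-3]
    }
}
```

For this assumed input, both ? and * also report a zero-length match at index 4, where 's' sits, in addition to the one at the end of the string.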
