Java Regular Expressions

Java Regex (regular expressions) is an API for defining patterns that are used to search, match, and manipulate strings. A regular expression can be as simple as a single character, or it can combine many characters and metacharacters into a complex pattern. To work with regular expressions, import the classes you need from the java.util.regex package. The package provides the following classes and interfaces.

Matcher Class: An engine that performs match operations on a character sequence by interpreting a Pattern. The class implements the MatchResult interface.

Pattern Class: A compiled representation of a regular expression, that is, the pattern to be searched for.

PatternSyntaxException Class: An unchecked exception that is thrown to indicate a syntax error in a regular expression pattern.
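As a quick illustration of PatternSyntaxException, the sketch below tries to compile a few patterns and reports the ones that are rejected. Because PatternSyntaxException is unchecked, catching it is optional. The class name PatternSyntaxExceptionExample and the helper isValidRegex() are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class PatternSyntaxExceptionExample
{
    // hypothetical helper: true if the regex compiles, false if compile() rejects it
    public static boolean isValidRegex(String regex)
    {
        try
        {
            Pattern.compile(regex);
            return true;
        }
        catch (PatternSyntaxException e)
        {
            // the exception describes what went wrong and where
            System.out.println("Invalid pattern: " + e.getPattern());
            System.out.println("Reason: " + e.getDescription());
            System.out.println("Near index: " + e.getIndex());
            return false;
        }
    }

    // main method
    public static void main(String[] argvs)
    {
        System.out.println(isValidRegex("Tutorial & example")); // true
        System.out.println(isValidRegex("Tutorial("));           // false: unclosed group
        System.out.println(isValidRegex("[abc"));                // false: unclosed character class
    }
}
```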

Matcher Class

The following table lists commonly used methods of the Matcher class:

Method Name	Description
public int start()	Returns the start index of the previous match.
public int end()	Returns the index just after the last character of the previous match (the end index is exclusive).
public int groupCount()	Returns the number of capturing groups in the matcher's pattern.
public String group()	Returns the input subsequence matched by the previous match.
public boolean find()	Searches for the next subsequence that matches the pattern. Returns true if a match is found, else returns false.
public boolean find(int st)	Resets the matcher and searches for the next subsequence that matches the pattern, starting at index st. Returns true if a match is found, else returns false.
public boolean matches()	Tries to match the entire input sequence against the pattern. Returns true only if the whole input matches, else returns false.
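The following sketch exercises these Matcher methods on a small key=value string. The class name MatcherMethodsExample, the helper describeMatches(), and the sample input are illustrative assumptions. Because end() returns the index just past the match, each span is printed as start-end with an exclusive end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherMethodsExample
{
    // two capturing groups: a word before '=' and the digits after it
    public static final Pattern KEY_VALUE = Pattern.compile("(\\w+)=(\\d+)");

    // hypothetical helper: collects every match as "text@start-end" (end is exclusive)
    public static List<String> describeMatches(Pattern pt, String input)
    {
        List<String> result = new ArrayList<>();
        Matcher m = pt.matcher(input);
        while (m.find()) // true as long as another match exists
        {
            result.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return result;
    }

    // main method
    public static void main(String[] argvs)
    {
        String input = "a=1, bb=22, ccc=333";
        System.out.println(describeMatches(KEY_VALUE, input)); // [a=1@0-3, bb=22@5-10, ccc=333@12-19]

        Matcher m = KEY_VALUE.matcher(input);
        // groupCount() counts the capturing groups in the pattern, not the matches
        System.out.println(m.groupCount()); // 2

        // find(int) resets the matcher and starts searching at index 5
        if (m.find(5))
        {
            System.out.println(m.group(1) + " -> " + m.group(2)); // bb -> 22
        }

        // matches() succeeds only if the entire input matches
        System.out.println(KEY_VALUE.matcher("a=1").matches()); // true
        System.out.println(KEY_VALUE.matcher(input).matches()); // false
    }
}
```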

Pattern Class

The following table lists commonly used methods of the Pattern class:

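Method Name	Description
public Matcher matcher(CharSequence cs)	Creates a Matcher that will match the input sequence cs against this pattern.
public static boolean matches(String re, CharSequence cs)	A static method that compiles the regular expression re and tries to match the entire sequence cs against it. Returns true only if the whole input matches.
public String pattern()	Returns the regular expression from which this pattern was compiled.
public String[] split(CharSequence cs, int limit)	Splits the input cs around matches of this pattern and returns the pieces as an array of strings. The limit parameter controls how many times the pattern is applied: if limit is positive, the pattern is applied at most limit - 1 times, so the array has at most limit elements; if limit is zero, the pattern is applied as many times as possible and trailing empty strings are discarded; if limit is negative, the pattern is applied as many times as possible and trailing empty strings are kept.
public static Pattern compile(String rgx)	Compiles the regular expression rgx into a Pattern object and returns it.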
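Here is a short sketch of the Pattern methods listed above, with particular attention to how the limit argument of split() changes the result. The class name PatternMethodsExample and the helper splitOnComma() are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PatternMethodsExample
{
    public static final Pattern COMMA = Pattern.compile(",");

    // hypothetical helper: splits the input on commas with the given limit
    public static String[] splitOnComma(String input, int limit)
    {
        return COMMA.split(input, limit);
    }

    // main method
    public static void main(String[] argvs)
    {
        // pattern() returns the source regular expression
        System.out.println(COMMA.pattern()); // ,

        // limit > 0: the pattern is applied at most limit - 1 times
        System.out.println(Arrays.toString(splitOnComma("a,b,c,d", 2))); // [a, b,c,d]
        // limit == 0: applied as often as possible, trailing empty strings dropped
        System.out.println(Arrays.toString(splitOnComma("a,b,,", 0)));   // [a, b]
        // limit < 0: applied as often as possible, trailing empty strings kept
        System.out.println(Arrays.toString(splitOnComma("a,b,,", -1)));  // [a, b, , ]

        // the static matches() method tests the whole input
        System.out.println(Pattern.matches("\\d+", "2024"));     // true
        System.out.println(Pattern.matches("\\d+", "May 2024")); // false
        // find() on a Matcher looks for the pattern anywhere in the input
        System.out.println(Pattern.compile("\\d+").matcher("May 2024").find()); // true
    }
}
```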

Let’s understand the concept of regular expressions through a Java program.

Java Program

Consider the following program, which shows how to use regular expressions.

FileName: RegexExample.java

// importing the class Matcher
import java.util.regex.Matcher;
// importing the class Pattern
import java.util.regex.Pattern;

public class RegexExample
{
    // main method
    public static void main(String[] argvs)
    {
        // the pattern is Tutorial & example
        Pattern pt = Pattern.compile("Tutorial & example", Pattern.CASE_INSENSITIVE);
        // the input sequence in which the pattern is searched
        Matcher matcherObj = pt.matcher("Visit tutorial & example for learning about Java!");
        // invoking the find() method
        boolean isMatchFound = matcherObj.find();
        // checking whether a match is found or not
        if (isMatchFound)
        {
            System.out.println("Match found for the given pattern.");
        }
        else
        {
            System.out.println("Match is not found for the given pattern.");
        }
    }
}

Output:

Match found for the given pattern.

Explanation: The second argument of compile(), Pattern.CASE_INSENSITIVE, is a flag that tells the regex engine to ignore case while searching. The one-argument version, compile(String rgx), performs case-sensitive matching, so the flag can be left out when case matters. The matcher() method returns a Matcher object, and find() is invoked on it to check whether the pattern occurs anywhere in the input sequence. Since "tutorial & example" matches the pattern when case is ignored, find() returns true and the first message is printed.
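To see the effect of the flag, the sketch below searches the same input with and without Pattern.CASE_INSENSITIVE. It also shows the embedded flag (?i), which turns on case-insensitive matching from inside the regex itself, and how flags are combined with the bitwise | operator. The class name CaseFlagExample and the helper containsMatch() are hypothetical; passing 0 as the flags argument behaves like the one-argument compile().

```java
import java.util.regex.Pattern;

public class CaseFlagExample
{
    // hypothetical helper: true if regex occurs anywhere in input, using the given flags
    public static boolean containsMatch(String regex, String input, int flags)
    {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    // main method
    public static void main(String[] argvs)
    {
        String input = "Visit tutorial & example for learning about Java!";

        // 0 means no flags, so the search is case-sensitive
        System.out.println(containsMatch("Tutorial & example", input, 0)); // false
        System.out.println(containsMatch("Tutorial & example", input, Pattern.CASE_INSENSITIVE)); // true

        // the embedded flag (?i) enables case-insensitive matching inside the regex
        System.out.println(containsMatch("(?i)Tutorial & example", input, 0)); // true

        // flags are bit masks and can be combined with |
        int flags = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE;
        // with MULTILINE, ^ also matches right after a line terminator
        System.out.println(containsMatch("^java", "Learn\njava", flags)); // true
    }
}
```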

Metacharacters

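Characters that have a special meaning in a regular expression are known as metacharacters. The following table shows commonly used metacharacters in a regular expression. In a Java string literal, each backslash must be doubled, so \d is written as "\\d".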

Metacharacter	Description
|	Matches any one of the alternatives separated by |, e.g., fish|dog|cat
\d	Matches a digit (0-9)
\s	Matches a whitespace character
\uxxxx	Matches the Unicode character whose hexadecimal code is xxxx
^	Matches at the beginning of the input (or of a line, when the MULTILINE flag is set), e.g., ^World
$	Matches at the end of the input (or of a line, when the MULTILINE flag is set), e.g., World$
.	Matches any single character (by default, except a line terminator)
\b	Matches at a word boundary, that is, at the start or the end of a word, e.g., \bWorld or World\b
\w	Matches a word character: a letter, a digit, or an underscore ([a-zA-Z_0-9])

Let’s use the metacharacters in a Java program.

FileName: RegexMetacharactersExample.java

// import statements
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMetacharactersExample
{
    // main method
    public static void main(String[] argvs)
    {
        // regular expression using \b: "own" followed by a word boundary,
        // which matches every word that ends with "own"
        String rx = "own\\b";
        String input = "crown own town brown owner grown flown blown";
        Pattern ptrn = Pattern.compile(rx);
        Matcher matcher = ptrn.matcher(input);
        int cnt = 0;
        while (matcher.find())
        {
            cnt = cnt + 1;
        }
        System.out.println("Number of matches for the word \"own\" is : " + cnt);

        // regular expression using | (no spaces around |, otherwise they become part of the alternatives)
        rx = "world|hello";
        input = "world is hello is world is hello is";
        ptrn = Pattern.compile(rx);
        matcher = ptrn.matcher(input);
        cnt = 0; // reset the counter to 0
        // checks for either world or hello
        while (matcher.find())
        {
            cnt = cnt + 1;
        }
        System.out.println("Number of matches for the word \"world\" and \"hello\" is : " + cnt);

        // regular expression using ^
        rx = "^hello";
        input = "hello hello is world is hello is";
        ptrn = Pattern.compile(rx);
        matcher = ptrn.matcher(input);
        cnt = 0; // reset the counter to 0
        // checks whether the input string starts with the word hello or not
        while (matcher.find())
        {
            cnt = cnt + 1;
        }
        System.out.println("Number of matches for the word \"hello\" is : " + cnt);

        // regular expression using $
        rx = "hello$";
        input = "hello is world is hello";
        ptrn = Pattern.compile(rx);
        matcher = ptrn.matcher(input);
        cnt = 0; // reset the counter to 0
        // checks whether the input string ends with the word hello or not
        while (matcher.find())
        {
            cnt = cnt + 1;
        }
        System.out.println("Number of matches for the word \"hello\" is : " + cnt);

        // regular expression using .
        rx = ".";
        input = "hello world";
        ptrn = Pattern.compile(rx);
        matcher = ptrn.matcher(input);
        cnt = 0; // reset the counter to 0
        // matches every single character, including the space
        while (matcher.find())
        {
            cnt = cnt + 1;
        }
        System.out.println("Total number of characters are : " + cnt);

        // regular expression using \d
        rx = "hello\\d";
        input = "hello hello hello9";
        ptrn = Pattern.compile(rx);
        matcher = ptrn.matcher(input);
        cnt = 0; // reset the counter to 0
        // checks whether the string contains hello followed by a digit
        while (matcher.find())
        {
            cnt = cnt + 1;
        }
        System.out.println("Number of matches for hello[0-9] : " + cnt);

        // regular expression using \s
        rx = "hello\\s";
        input = "hello hello hello9";
        ptrn = Pattern.compile(rx);
        matcher = ptrn.matcher(input);
        cnt = 0; // reset the counter to 0
        // checks whether the string contains hello followed by a whitespace
        while (matcher.find())
        {
            cnt = cnt + 1;
        }
        System.out.println("Number of matches for hello with whitespace: " + cnt);
    }
}

Output:

Number of matches for the word "own" is : 7
Number of matches for the word "world" and "hello" is : 4
Number of matches for the word "hello" is : 1
Number of matches for the word "hello" is : 1
Total number of characters are : 11
Number of matches for hello[0-9] : 1
Number of matches for hello with whitespace: 2
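The program above does not exercise \w or \uxxxx, and its \b example only checks the end of a word. The sketch below fills those gaps with a hypothetical countMatches() helper (the class name MoreMetacharactersExample is also made up). With \b on both sides, only the whole word "own" matches. The last two calls show that spaces written around | become part of the alternatives.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MoreMetacharactersExample
{
    // hypothetical helper: counts how many times regex is found in input
    public static int countMatches(String regex, String input)
    {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int cnt = 0;
        while (matcher.find())
        {
            cnt++;
        }
        return cnt;
    }

    // main method
    public static void main(String[] argvs)
    {
        String words = "crown own town brown owner grown flown blown";
        // word boundary on both sides: only the standalone word "own"
        System.out.println(countMatches("\\bown\\b", words)); // 1
        // one or more word characters: each word is one match
        System.out.println(countMatches("\\w+", words)); // 8
        // the regex escape for U+0041, the letter 'A'
        System.out.println(countMatches("\\u0041", "ABBA")); // 2
        // without spaces, the alternatives are "world" and "hello"
        System.out.println(countMatches("world|hello", "helloworld")); // 2
        // with spaces, the alternatives are "world " and " hello"
        System.out.println(countMatches("world | hello", "helloworld")); // 0
    }
}
```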

Quantifiers

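Quantifiers specify how many times the preceding character, character class, or group must occur in the input to produce a match. Write the numbers inside braces without spaces, for example k{3,6}; a space, as in k{3, 6}, makes Pattern.compile() throw a PatternSyntaxException.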

Quantifier	Description
Y*	Matches zero or more occurrences of Y
Y+	Matches one or more occurrences of Y
Y{n,}	Matches at least n occurrences of Y
Y{n}	Matches exactly n occurrences of Y
Y{n1,n2}	Matches at least n1 but not more than n2 occurrences of Y
Y?	Matches zero or one occurrence of Y
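One subtlety in the table: whether "exactly n occurrences" holds for the whole input depends on the method used. Pattern.matches() requires the quantified pattern to cover the entire input, whereas find() only looks for a matching part. The sketch below compares the two; the class name QuantifierMatchesExample and the helpers fullMatch() and containsMatch() are hypothetical.

```java
import java.util.regex.Pattern;

public class QuantifierMatchesExample
{
    // hypothetical helper: the entire input must match the regex
    public static boolean fullMatch(String regex, String input)
    {
        return Pattern.matches(regex, input);
    }

    // hypothetical helper: some part of the input must match the regex
    public static boolean containsMatch(String regex, String input)
    {
        return Pattern.compile(regex).matcher(input).find();
    }

    // main method
    public static void main(String[] argvs)
    {
        // Y? : the 'u' is optional
        System.out.println(fullMatch("colou?r", "color"));  // true
        System.out.println(fullMatch("colou?r", "colour")); // true
        // Y{n} with matches(): the whole input must be exactly three k's
        System.out.println(fullMatch("k{3}", "kkk"));  // true
        System.out.println(fullMatch("k{3}", "kkkk")); // false
        // with find(), k{3} is still found inside a longer run
        System.out.println(containsMatch("k{3}", "kkkk")); // true
        // Y{n1,n2}: between two and three digits
        System.out.println(fullMatch("\\d{2,3}", "12"));   // true
        System.out.println(fullMatch("\\d{2,3}", "1234")); // false
    }
}
```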

Java Program

The following program uses the quantifiers defined above.

FileName: QuantifiersExample.java

// import statements
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifiersExample
{
    // main method
    public static void main(String[] argvs)
    {
        System.out.println("For + quantifiers \n");
        // regular expression for at least one 't'
        String rx = "t+";
        Pattern ptrn = Pattern.compile(rx);
        // creating an object of the Matcher class
        Matcher mtchr = ptrn.matcher("ttttst");
        while (mtchr.find())
        {
            // end() is exclusive, so end() - 1 is the index of the last matched character
            System.out.println("Pattern found from " + mtchr.start() + " to " + (mtchr.end() - 1));
        }
        System.out.println("\n");

        // 's', 't' or 'k' can appear zero or one time
        rx = "[stk]?";
        System.out.println("For ? quantifier \n");
        ptrn = Pattern.compile(rx);
        mtchr = ptrn.matcher("ssttkk");
        while (mtchr.find())
        {
            System.out.println("Pattern found at index " + mtchr.start());
        }

        // 's', 't' or 'k' can appear zero or more times
        rx = "[stk]*";
        System.out.println();
        System.out.println("For * quantifier \n");
        ptrn = Pattern.compile(rx);
        mtchr = ptrn.matcher("ssttkk");
        while (mtchr.find())
        {
            System.out.println("Pattern found at index " + mtchr.start());
        }

        // 'k' has to appear at least 3 times
        rx = "k{3,}";
        System.out.println();
        System.out.println("For {n, } quantifier \n");
        ptrn = Pattern.compile(rx);
        mtchr = ptrn.matcher("ssttkkkk");
        while (mtchr.find())
        {
            System.out.println("Pattern found at index " + mtchr.start());
        }

        // 'k' has to appear at least 3 times but not more than 6 times
        rx = "k{3,6}";
        System.out.println();
        System.out.println("For {n, m} quantifier \n");
        ptrn = Pattern.compile(rx);
        mtchr = ptrn.matcher("ssttkkkkkkkkkkkkkk");
        while (mtchr.find())
        {
            System.out.println("Pattern found from " + mtchr.start() + " to " + (mtchr.end() - 1));
        }

        // 'k' has to appear exactly 3 times
        rx = "k{3}";
        System.out.println();
        System.out.println("For {n} quantifier \n");
        ptrn = Pattern.compile(rx);
        mtchr = ptrn.matcher("ssttkkkkkkkkkkkkkk");
        while (mtchr.find())
        {
            System.out.println("Pattern found from " + mtchr.start() + " to " + (mtchr.end() - 1));
        }
    }
}

Output:

For + quantifiers
Pattern found from 0 to 3
Pattern found from 5 to 5
For ? quantifier
Pattern found at index 0
Pattern found at index 1
Pattern found at index 2
Pattern found at index 3
Pattern found at index 4
Pattern found at index 5
Pattern found at index 6
For * quantifier
Pattern found at index 0
Pattern found at index 6
For {n, } quantifier
Pattern found at index 4
For {n, m} quantifier
Pattern found from 4 to 9
Pattern found from 10 to 15
For {n} quantifier
Pattern found from 4 to 6
Pattern found from 7 to 9
Pattern found from 10 to 12
Pattern found from 13 to 15

Explanation: The + quantifier is greedy: it consumes as many consecutive matching characters as possible. Hence, the first match covers all the ‘t’ characters from index 0 to 3. Then ‘s’ comes, which does not match the regular expression. After that, a single ‘t’ occurs at index 5, which appears as a separate match in the output.

The ? quantifier matches at most one character at a time. Therefore, the output shows every index from 0 to 5. Index 6 is also shown because the ? quantifier accepts zero characters as well. After index 5, the input string ends, so at index 6 the pattern matches an empty string, and that empty match is reported too.

For the * quantifier, index 6 appears for the same reason: * also accepts zero characters. However, the * quantifier is greedy and consumes all the matching characters in a single match. Therefore, only index 0 (the match covering the whole input) and index 6 (the empty match at the end) appear in the output.

The {n,} quantifier matches n or more consecutive occurrences, again greedily. Here, k{3,} consumes all four ‘k’ characters (indices 4 to 7) in one match, so only index 4 appears in the output.

For the {n,m} quantifier (m must be greater than or equal to n), each match covers between n and m consecutive occurrences, taking as many as possible. If a run is longer than m, the first match stops after m occurrences, and the next call to find() continues from where that match ended. With k{3,6}, this gives the matches from 4 to 9 and from 10 to 15.

The {n} quantifier matches exactly n occurrences, so a longer run is split into consecutive matches of n characters. As n = 3 in our case, each match starts three positions after the previous one. Once fewer than three ‘k’ characters remain, no further match is possible.
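The zero-length matches described above for ? and * can be made visible by printing both ends of each match. The hypothetical helper matchSpans() below (in a made-up class ZeroLengthMatchExample) records each match as start-end plus the matched text, with an exclusive end; an empty match shows up with start equal to end and empty text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroLengthMatchExample
{
    // hypothetical helper: returns each match as start-end:"text" (end is exclusive)
    public static List<String> matchSpans(String regex, String input)
    {
        List<String> spans = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find())
        {
            spans.add(m.start() + "-" + m.end() + ":\"" + m.group() + "\"");
        }
        return spans;
    }

    // main method
    public static void main(String[] argvs)
    {
        // one greedy match of all six characters, then an empty match at index 6
        System.out.println(matchSpans("[stk]*", "ssttkk")); // [0-6:"ssttkk", 6-6:""]
        // ? consumes at most one character per match, then an empty match at the end
        System.out.println(matchSpans("[stk]?", "sst"));    // [0-1:"s", 1-2:"s", 2-3:"t", 3-3:""]
    }
}
```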
