Java provides the java.util.regex package for pattern matching with regular expressions. Java regular expressions are very similar to those of the Perl programming language and are easy to learn.
A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions can be used to search, edit, or manipulate text and data.
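The java.util.regex package primarily consists of the following three classes: Pattern, a compiled representation of a regular expression (it has no public constructors, so you obtain one from the static Pattern.compile() method); Matcher, the engine that interprets the pattern and performs match operations against an input string (obtained by calling matcher() on a Pattern object); and PatternSyntaxException, an unchecked exception that indicates a syntax error in a regular expression pattern.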
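As a minimal sketch, assuming nothing beyond the standard java.util.regex API, the program below compiles a pattern once, obtains a Matcher for an input string, and reads the first match. The class name PatternMatcherBasics and the helper firstNumber are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternMatcherBasics {

    // Pattern has no public constructor; Pattern.compile() returns a compiled, reusable Pattern.
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: returns the first run of digits in input, or null if there is none.
    static String firstNumber(String input) {
        // A Matcher is obtained from the Pattern and is bound to one input sequence.
        Matcher m = DIGITS.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("Order QT3000 shipped in 2 boxes")); // 3000
        System.out.println(firstNumber("no digits here"));                  // null
        // For a one-off check of a whole string, Pattern.matches() compiles and matches in one call.
        System.out.println(Pattern.matches("[A-Z]{2}\\d+", "QT3000"));     // true
    }
}
```

Compiling once and reusing the Pattern is a common choice when the same expression is applied to many inputs: Pattern objects are immutable and safe to share between threads, whereas a Matcher holds per-input state and is not safe for concurrent use.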
Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".
Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups: group 1 is ((A)(B(C))), group 2 is (A), group 3 is (B(C)), and group 4 is (C).
To find out how many groups are present in the expression, call the groupCount method on a Matcher object. The groupCount method returns an int giving the number of capturing groups present in the matcher's pattern.
There is also a special group, group 0, which always represents the entire expression. This group is not included in the total reported by groupCount.
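The numbering rules above can be checked with a short sketch that matches ((A)(B(C))) against the string "ABC" and prints every group from 0 up to groupCount(). The class name GroupNumbering and the helper groups are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbering {

    // Hypothetical helper: returns groups 0..groupCount() of the first match, or an empty array.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        // groupCount() excludes group 0, so there are groupCount() + 1 groups in total.
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("((A)(B(C)))", "ABC");
        System.out.println("groupCount() = " + (g.length - 1));
        for (int i = 0; i < g.length; i++) {
            System.out.println("group(" + i + ") = " + g[i]);
        }
    }
}
```

Running it prints groupCount() = 4, followed by ABC, ABC, A, BC, and C for groups 0 through 4: group 0 and group 1 capture the same text here because the outermost parentheses enclose the whole expression.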
The following example illustrates how to use capturing groups to find a digit string in a given alphanumeric string:
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {

   public static void main( String args[] ) {
      // String to be scanned to find the pattern.
      String line = "This order was placed for QT3000! OK?";
      String pattern = "(.*)(\\d+)(.*)";

      // Create a Pattern object
      Pattern r = Pattern.compile(pattern);

      // Now create matcher object.
      Matcher m = r.matcher(line);
      if (m.find( )) {
         System.out.println("Found value: " + m.group(0) );
         System.out.println("Found value: " + m.group(1) );
         System.out.println("Found value: " + m.group(2) );
      } else {
         System.out.println("NO MATCH");
      }
   }
}
```
This will produce the following result:
```
Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT300
Found value: 0
```
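Notice that group 2 holds only "0", not "3000": the greedy (.*) first consumes the whole line and then backtracks only as far as needed for (\d+) to match a single digit. A minimal sketch of the usual fix swaps in the reluctant quantifier (.*?), which stops as early as possible and leaves the whole digit run to group 2. The class name ReluctantDigits and the helper digitGroup are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantDigits {

    // Hypothetical helper: returns group 2 of the first match, or null if there is no match.
    static String digitGroup(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        // Greedy: (.*) gives back just one character, so group 2 is "0".
        System.out.println(digitGroup("(.*)(\\d+)(.*)", line));
        // Reluctant: (.*?) stops before the first digit, so group 2 is "3000".
        System.out.println(digitGroup("(.*?)(\\d+)(.*)", line));
    }
}
```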
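Here is a table listing the regular expression metacharacter syntax available in Java (inside a Java string literal every backslash must be doubled, so \d is written with two backslashes, as in the examples in this tutorial) −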
Subexpression | Matches |
---|---|
^ | Matches the beginning of the input, or the beginning of each line when the m flag ((?m) or Pattern.MULTILINE) is set. |
$ | Matches the end of the input, or the end of each line when the m flag is set. |
. | Matches any single character except a line terminator. The s flag ((?s) or Pattern.DOTALL) allows it to match line terminators as well. |
[...] | Matches any single character in brackets. |
[^...] | Matches any single character not in brackets. |
\A | Beginning of the entire string. |
\z | End of the entire string. |
\Z | End of the entire string except for allowable final line terminator. |
re* | Matches 0 or more occurrences of the preceding expression. |
re+ | Matches 1 or more occurrences of the preceding expression. |
re? | Matches 0 or 1 occurrence of the preceding expression. |
re{n} | Matches exactly n occurrences of the preceding expression. |
re{n,} | Matches n or more occurrences of the preceding expression. |
re{n,m} | Matches at least n and at most m occurrences of the preceding expression. |
a\|b | Matches either a or b. |
(re) | Groups regular expressions and remembers the matched text. |
(?:re) | Groups regular expressions without remembering the matched text. |
(?>re) | Matches re as an independent (atomic) group, without backtracking into it. |
\w | Matches a word character. Equivalent to [a-zA-Z_0-9]. |
\W | Matches a nonword character. |
\s | Matches a whitespace character. Equivalent to [ \t\n\x0B\f\r]. |
\S | Matches a nonwhitespace character. |
\d | Matches a digit. Equivalent to [0-9]. |
\D | Matches a nondigit. |
\G | Matches the point where the last match finished. |
\n | Back-reference to capturing group number n (for example, \1). |
\b | Matches a word boundary. |
\B | Matches a nonword boundary. |
\n, \r, \t, etc. | Matches a newline, carriage return, tab, etc. |
\Q | Escape (quote) all characters up to \E. |
\E | Ends quoting begun with \Q. |
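A few rows of the table can be tried out directly. The sketch below, which uses a hypothetical helper named found built on Matcher.find(), exercises \b, a back-reference, \Q...\E quoting, and the s flag; note that every backslash from the table is doubled inside the Java string literals.

```java
import java.util.regex.Pattern;

class MetacharacterDemo {

    // Hypothetical helper: true if regex matches somewhere in input, not necessarily all of it.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // \b: "cat" only as a whole word, so not inside "concatenate".
        System.out.println(found("\\bcat\\b", "concatenate")); // false
        System.out.println(found("\\bcat\\b", "a cat sat"));   // true

        // \1: back-reference to group 1, used here to detect a doubled word.
        System.out.println(found("\\b(\\w+) \\1\\b", "this is is it")); // true

        // \Q...\E: everything in between is literal, so + is not a quantifier here.
        System.out.println(found("\\Q1+1\\E=2", "1+1=2")); // true

        // . does not match a line terminator unless the s flag (DOTALL) is on.
        System.out.println(found("a.b", "a\nb"));     // false
        System.out.println(found("(?s)a.b", "a\nb")); // true
    }
}
```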
Here is a list of useful instance methods of the Matcher class −
Index methods provide useful index values that show precisely where the match was found in the input string −
Sr.No. | Method & Description |
---|---|
1 | public int start() Returns the start index of the previous match. |
2 | public int start(int group) Returns the start index of the subsequence captured by the given group during the previous match operation. |
3 | public int end() Returns the offset after the last character matched. |
4 | public int end(int group) Returns the offset after the last character of the subsequence captured by the given group during the previous match operation. |
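The group-aware index methods can be sketched as follows; the helper span and the class name GroupOffsets are hypothetical. Because end(group) is exclusive, the two offsets can be passed straight to String.substring.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupOffsets {

    // Hypothetical helper: {start(group), end(group)} for the first match, or null if none.
    static int[] span(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? new int[] {m.start(group), m.end(group)} : null;
    }

    public static void main(String[] args) {
        String input = "key=value";
        String regex = "(\\w+)=(\\w+)";
        int[] whole = span(regex, input, 0); // group 0: same as start() and end()
        int[] value = span(regex, input, 2); // group 2: the text after '='
        System.out.println(whole[0] + ".." + whole[1]); // 0..9
        System.out.println(value[0] + ".." + value[1]); // 4..9
        // end(group) is exclusive, so the offsets plug straight into substring().
        System.out.println(input.substring(value[0], value[1])); // value
    }
}
```

If a group exists in the pattern but did not take part in the match, start(int) and end(int) return -1 rather than throwing.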
Study methods review the input string and return a Boolean indicating whether or not the pattern is found −
Sr.No. | Method & Description |
---|---|
1 | public boolean lookingAt() Attempts to match the input sequence, starting at the beginning of the region, against the pattern. |
2 | public boolean find() Attempts to find the next subsequence of the input sequence that matches the pattern. |
3 | public boolean find(int start) Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index. |
4 | public boolean matches() Attempts to match the entire region against the pattern. |
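The difference between find() and find(int start) can be sketched with a hypothetical helper, matchesFrom, that resets the matcher to a chosen index with find(int) and then keeps calling find(), which continues from the end of the previous match. The index must lie between 0 and the input length, otherwise find(int) throws an IndexOutOfBoundsException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindFromIndex {

    // Hypothetical helper: collects every match of regex in input, starting at index from.
    static List<String> matchesFrom(String regex, String input, int from) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        // find(int) resets the matcher and searches from 'from'; find() then continues onward.
        if (m.find(from)) {
            do {
                found.add(m.group());
            } while (m.find());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "a1 b22 c333";
        System.out.println(matchesFrom("\\d+", input, 0)); // [1, 22, 333]
        System.out.println(matchesFrom("\\d+", input, 5)); // [2, 333]
    }
}
```

Starting at index 5 lands on the second "2" of "22", so only that single digit is reported for the middle number.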
Replacement methods are useful methods for replacing text in an input string −
Sr.No. | Method & Description |
---|---|
1 | public Matcher appendReplacement(StringBuffer sb, String replacement) Implements a non-terminal append-and-replace step. |
2 | public StringBuffer appendTail(StringBuffer sb) Implements a terminal append-and-replace step. |
3 | public String replaceAll(String replacement) Replaces every subsequence of the input sequence that matches the pattern with the given replacement string. |
4 | public String replaceFirst(String replacement) Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string. |
5 | public static String quoteReplacement(String s) Returns a literal replacement String for the specified String. This method produces a String that will work as a literal replacement s in the appendReplacement method of the Matcher class. |
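In a replacement string, $ and \ are special: $n refers to a captured group. The sketch below, built around a hypothetical helper named replaceLiterally, shows why quoteReplacement matters when the replacement text comes from data, such as a price containing a dollar sign.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralReplacement {

    // Hypothetical helper: replaces every match of regex with replacement taken literally.
    static String replaceLiterally(String input, String regex, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        // quoteReplacement escapes $ and \ so they are not treated as group references.
        return m.replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Without quoting, "$5.00" would be read as a reference to group 5 and fail
        // with an exception, because the pattern "PRICE" has no capturing groups.
        System.out.println(replaceLiterally("Price: PRICE", "PRICE", "$5.00")); // Price: $5.00
    }
}
```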
The following example counts the number of times the word "cat" appears in the input string −
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {

   private static final String REGEX = "\\bcat\\b";
   private static final String INPUT = "cat cat cat cattie cat";

   public static void main( String args[] ) {
      Pattern p = Pattern.compile(REGEX);
      Matcher m = p.matcher(INPUT);   // get a matcher object
      int count = 0;

      while(m.find()) {
         count++;
         System.out.println("Match number "+count);
         System.out.println("start(): "+m.start());
         System.out.println("end(): "+m.end());
      }
   }
}
```
This will produce the following result:

```
Match number 1
start(): 0
end(): 3
Match number 2
start(): 4
end(): 7
Match number 3
start(): 8
end(): 11
Match number 4
start(): 19
end(): 22
```
You can see that this example uses word boundaries to ensure that the letters "c" "a" "t" are not merely a substring in a longer word. It also gives some useful information about where in the input string the match occurred.
The start method returns the start index of the subsequence captured by the given group during the previous match operation, and the end method returns the index of the last character matched, plus one.
The matches and lookingAt Methods
The matches and lookingAt methods both attempt to match an input sequence against a pattern. The difference, however, is that matches requires the entire input sequence to be matched, while lookingAt does not.
Both methods always start at the beginning of the input string. Here is an example explaining the functionality −
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {

   private static final String REGEX = "foo";
   private static final String INPUT = "fooooooooooooooooo";
   private static Pattern pattern;
   private static Matcher matcher;

   public static void main( String args[] ) {
      pattern = Pattern.compile(REGEX);
      matcher = pattern.matcher(INPUT);

      System.out.println("Current REGEX is: "+REGEX);
      System.out.println("Current INPUT is: "+INPUT);

      System.out.println("lookingAt(): "+matcher.lookingAt());
      System.out.println("matches(): "+matcher.matches());
   }
}
```
This will produce the following result:
```
Current REGEX is: foo
Current INPUT is: fooooooooooooooooo
lookingAt(): true
matches(): false
```
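Because both methods are anchored at the start of the input, neither reports a match when the pattern occurs only later in the string, whereas find() does. A minimal sketch, using a hypothetical helper named modes that evaluates each method on a fresh Matcher so no state carries over between calls:

```java
import java.util.regex.Pattern;

class MatchModes {

    // Hypothetical helper: {matches(), lookingAt(), find()}, each on a fresh Matcher.
    static boolean[] modes(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        return new boolean[] {
            p.matcher(input).matches(),   // the whole input must match
            p.matcher(input).lookingAt(), // a prefix of the input must match
            p.matcher(input).find()       // a match anywhere in the input is enough
        };
    }

    public static void main(String[] args) {
        boolean[] r = modes("foo", "xfoo");
        System.out.println("matches():   " + r[0]); // false
        System.out.println("lookingAt(): " + r[1]); // false
        System.out.println("find():      " + r[2]); // true
    }
}
```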
The replaceFirst and replaceAll methods replace the text that matches a given regular expression. As their names indicate, replaceFirst replaces the first occurrence, and replaceAll replaces all occurrences.
Here is the example explaining the functionality −
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {

   private static String REGEX = "dog";
   private static String INPUT = "The dog says meow. " + "All dogs say meow.";
   private static String REPLACE = "cat";

   public static void main(String[] args) {
      Pattern p = Pattern.compile(REGEX);

      // get a matcher object
      Matcher m = p.matcher(INPUT);
      INPUT = m.replaceAll(REPLACE);
      System.out.println(INPUT);
   }
}
```
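This will produce the following result:

```
The cat says meow. All cats say meow.
```

The Matcher class also provides the appendReplacement and appendTail methods for text replacement. Here is an example explaining the functionality −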
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {

   private static String REGEX = "a*b";
   private static String INPUT = "aabfooaabfooabfoob";
   private static String REPLACE = "-";

   public static void main(String[] args) {
      Pattern p = Pattern.compile(REGEX);

      // get a matcher object
      Matcher m = p.matcher(INPUT);
      StringBuffer sb = new StringBuffer();
      while(m.find()) {
         m.appendReplacement(sb, REPLACE);
      }
      m.appendTail(sb);
      System.out.println(sb.toString());
   }
}
```
This will produce the following result:
```
-foo-foo-foo-
```
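The replaceFirst and replaceAll methods described earlier differ only in how many matches they rewrite. A minimal sketch on the same "dog" input, with hypothetical helper names first and all:

```java
import java.util.regex.Pattern;

class ReplaceFirstVsAll {

    static final Pattern DOG = Pattern.compile("dog");

    // Hypothetical helper: rewrites only the first match.
    static String first(String input) {
        return DOG.matcher(input).replaceFirst("cat");
    }

    // Hypothetical helper: rewrites every match.
    static String all(String input) {
        return DOG.matcher(input).replaceAll("cat");
    }

    public static void main(String[] args) {
        String input = "The dog says meow. All dogs say meow.";
        System.out.println(first(input)); // The cat says meow. All dogs say meow.
        System.out.println(all(input));   // The cat says meow. All cats say meow.
    }
}
```

Both methods reset the matcher before they start and, like appendReplacement, treat $ and \ in the replacement string as special characters.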
A PatternSyntaxException is an unchecked exception that indicates a syntax error in a regular expression pattern. The PatternSyntaxException class provides the following methods to help you determine what went wrong −
Sr.No. | Method & Description |
---|---|
1 | public String getDescription() Retrieves the description of the error. |
2 | public int getIndex() Retrieves the error-index. |
3 | public String getPattern() Retrieves the erroneous regular expression pattern. |
4 | public String getMessage() Returns a multi-line string containing the description of the syntax error and its index, the erroneous regular expression pattern, and a visual indication of the error-index within the pattern. |
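A minimal sketch of catching the exception and reading its details, assuming a hypothetical helper named check that returns null for a valid pattern and a one-line report otherwise. The exact description and index text come from the JDK and may vary between versions, so the sketch only prints them.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorReport {

    // Hypothetical helper: null if regex compiles, otherwise a short report of the error.
    static String check(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return "pattern=" + e.getPattern()
                    + " index=" + e.getIndex()
                    + " description=" + e.getDescription();
        }
    }

    public static void main(String[] args) {
        System.out.println(check("(abc"));   // prints a report for the unclosed group
        System.out.println(check("a{2,3}")); // null: this pattern is valid
        // getMessage() combines the details into a multi-line message that marks the error index.
        try {
            Pattern.compile("(abc");
        } catch (PatternSyntaxException e) {
            System.out.println(e.getMessage());
        }
    }
}
```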
Here at Intellinuts, we have created a complete Java tutorial for Beginners to get started in Java.