Channel: SCN : All Content - Web Dynpro Java

Java Regular Expression Tutorial.

What Are Regular Expressions?

A regular expression defines a pattern for a String. Regular expressions can be used to search, edit or manipulate text. They are not language specific, but the syntax differs slightly from language to language. Java regular expressions are most similar to Perl's.

Java's regular expression classes live in the java.util.regex package, which contains three classes: Pattern, Matcher and PatternSyntaxException.

  1. A Pattern object is the compiled version of the regular expression. It has no public constructor; we create one by passing the regular expression to its public static method compile.
  2. A Matcher is the regex engine object that matches an input String against the Pattern. It has no public constructor either; we obtain one from the Pattern object's matcher method, which takes the input String as its argument. Its matches method then returns a boolean indicating whether the whole input String matches the regex pattern.
  3. A PatternSyntaxException is thrown if the regular expression syntax is not correct.
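
The three classes described above can be seen working together in a small sketch. The class name RegexBasics and the helper methods matchesWhole and isValidRegex are names made up for this illustration; only Pattern, Matcher and PatternSyntaxException come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexBasics {

    // Hypothetical helper: compiles the regex and checks the whole input against it.
    static boolean matchesWhole(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);  // static factory, no public constructor
        Matcher matcher = pattern.matcher(input);  // the Matcher is obtained from the Pattern
        return matcher.matches();                  // true only if the entire input matches
    }

    // Hypothetical helper: reports whether the regex syntax is valid.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("a.c", "abc"));   // true
        System.out.println(matchesWhole("a.c", "abcd"));  // false, the trailing "d" is not matched
        System.out.println(isValidRegex("[abc"));         // false, the character class is not closed
    }
}
```

Compiling the Pattern once and reusing it for several inputs is usually preferable to recompiling it on every call; the helper above recompiles only to keep the sketch short.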

Regular Expressions common matching symbols

Each example is a (regex, input) pair followed by the result of Pattern.matches(regex, input), which requires the whole input to match.

Regular Expression  Description                                     Example
------------------  ----------------------------------------------  ----------------------------------
.                   Matches any single character                    ("..", "a%") – true
                                                                    ("..", ".a") – true
                                                                    ("..", "a") – false
^xxx                Matches regex xxx at the beginning of the line  ("^a.c.", "abcd") – true
                                                                    ("^a", "ac") – false
xxx$                Matches regex xxx at the end of the line        ("..cd$", "abcd") – true
                                                                    ("a$", "a") – true
                                                                    ("a$", "aca") – false
[abc]               Matches any one of the letters a, b or c;       ("^[abc]d.", "ad9") – true
                    [] denotes a character class                    ("[ab].d$", "bad") – true
                                                                    ("[ab]x", "cx") – false
[abc][12]           Matches a, b or c followed by 1 or 2            ("[ab][12].", "a2#") – true
                                                                    ("[ab]..[12]", "acd2") – true
                                                                    ("[ab][12]", "c2") – false
[^abc]              With ^ first inside [], negates the class:      ("[^ab][^12].", "c3#") – true
                    matches any character except a, b or c          ("[^ab]..[^12]", "xcd3") – true
                                                                    ("[^ab][^12]", "c2") – false
[a-e1-8]            Matches any character in the range a-e or 1-8   ("[a-e1-3].", "d#") – true
                                                                    ("[a-e1-3]", "2") – true
                                                                    ("[a-e1-3]", "f2") – false
xx|yy               Matches regex xx or yy                          ("x.|y", "xa") – true
                                                                    ("x.|y", "y") – true
                                                                    ("x.|y", "yz") – false
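
A few of the rows above can be checked with a short sketch. It also contrasts a whole-input match with a substring search, since "^a" against "ac" gives different answers under the two. The class name SymbolExamples and the helpers whole and anywhere are hypothetical names used only here.

```java
import java.util.regex.Pattern;

class SymbolExamples {

    // Whole-input match: the regex must consume the entire input.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Substring search: true if the regex matches anywhere in the input.
    static boolean anywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(whole("..", "a%"));            // true
        System.out.println(whole("^a", "ac"));            // false, "c" is left over
        System.out.println(anywhere("^a", "ac"));         // true, "a" is found at the start
        System.out.println(whole("[^ab][^12].", "c3#"));  // true
        System.out.println(whole("x.|y", "yz"));          // false
    }
}
```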

Java Regular Expressions Metacharacters

Regular expressions also provide metacharacters, which act as shorthand for common matching patterns.

Regular Expression  Description
------------------  ---------------------------------------------------
\d                  Any digit, short for [0-9]
\D                  Any non-digit, short for [^0-9]
\s                  Any whitespace character, short for [ \t\n\x0B\f\r]
\S                  Any non-whitespace character, short for [^\s]
\w                  Any word character, short for [a-zA-Z_0-9]
\W                  Any non-word character, short for [^\w]
\b                  A word boundary
\B                  A non-word boundary
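
As a rough illustration of the metacharacters listed above, the sketch below tries a few of them. Note that each backslash has to be doubled inside a Java string literal. The class name MetacharacterExamples and its helpers are hypothetical.

```java
import java.util.regex.Pattern;

class MetacharacterExamples {

    // Whole-input match.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Substring search.
    static boolean anywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(whole("\\d\\d", "42"));                   // true, two digits
        System.out.println(whole("\\D", "7"));                       // false, "7" is a digit
        System.out.println(whole("\\w\\s\\w", "a b"));               // true, word char, whitespace, word char
        System.out.println(whole("\\W", "#"));                       // true, "#" is not a word character
        System.out.println(anywhere("\\bcat\\b", "a cat sat"));      // true, "cat" stands alone as a word
        System.out.println(anywhere("\\bcat\\b", "concatenate"));    // false, "cat" is inside a longer word
    }
}
```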

There are two ways to use metacharacters as ordinary characters in regular expressions.

  1. Precede the metacharacter with a backslash (\).
  2. Enclose the metacharacter between \Q (which starts the quote) and \E (which ends it).
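
Both escaping styles can be sketched as follows. The example also uses Pattern.quote, a standard java.util.regex method not mentioned in the text above, which wraps a literal in \Q and \E for you. The class name EscapeExamples and the helper matchesLiterally are hypothetical.

```java
import java.util.regex.Pattern;

class EscapeExamples {

    // Hypothetical helper: treats every character of 'literal' as an ordinary character.
    static boolean matchesLiterally(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        // Unescaped, the dot matches any character.
        System.out.println(Pattern.matches("a.c", "abc"));            // true
        // Escaped with a backslash, the dot matches only a literal dot.
        System.out.println(Pattern.matches("a\\.c", "abc"));          // false
        System.out.println(Pattern.matches("a\\.c", "a.c"));          // true
        // Unquoted, "+" is a quantifier, so the literal text does not match itself.
        System.out.println(Pattern.matches("1+1=2", "1+1=2"));        // false
        // Quoted with \Q...\E, "+" is an ordinary character.
        System.out.println(Pattern.matches("\\Q1+1=2\\E", "1+1=2"));  // true
        System.out.println(matchesLiterally("1+1=2", "1+1=2"));       // true
    }
}
```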

Regular Expression Quantifiers

Quantifiers specify how many times a character, character class or group must occur for a match.

Regular Expression  Description
------------------  ---------------------------------------------------
X?                  X occurs once or not at all
X*                  X occurs zero or more times
X+                  X occurs one or more times
X{n}                X occurs exactly n times
X{n,}               X occurs n or more times
X{n,m}              X occurs at least n times but not more than m times

Quantifiers can also be used with character classes and capturing groups.

For example, [abc]+ means a, b or c one or more times.

(abc)+ means the group "abc" one or more times. We will discuss capturing groups now.
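
The quantifiers above can be tried out with a minimal sketch like this one. The class name QuantifierExamples and the helper whole are hypothetical; every call is a whole-input match via Pattern.matches.

```java
import java.util.regex.Pattern;

class QuantifierExamples {

    // Whole-input match.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(whole("colou?r", "color"));     // true, "u" occurs zero times
        System.out.println(whole("colou?r", "colour"));    // true, "u" occurs once
        System.out.println(whole("ab*", "a"));             // true, "b" occurs zero times
        System.out.println(whole("ab+", "a"));             // false, at least one "b" is required
        System.out.println(whole("\\d{3}", "123"));        // true, exactly three digits
        System.out.println(whole("\\d{2,}", "12345"));     // true, two or more digits
        System.out.println(whole("\\d{2,4}", "12345"));    // false, five digits exceed the maximum of four
        System.out.println(whole("[abc]+", "cab"));        // true, class repeated
        System.out.println(whole("(abc)+", "abcabc"));     // true, group repeated twice
        System.out.println(whole("(abc)+", "abcab"));      // false, the last repetition is incomplete
    }
}
```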

Regular Expression Capturing Groups

Capturing groups are used to treat multiple characters as a single unit. You can create a group using (). The portion of the input String that matches the capturing group is saved into memory and can be recalled using a backreference.

You can use the matcher.groupCount method to find out the number of capturing groups in a regex pattern. For example, ((a)(bc)) contains 3 capturing groups: ((a)(bc)), (a) and (bc).

You can use a backreference in a regular expression by writing a backslash (\) followed by the number of the group to be recalled.
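
As a small sketch of how the groups are numbered, the code below lists what each group captured for a given regex and input. Group 0 always stands for the whole match and is not included in groupCount. The class name CapturingGroupExamples and the helper groups are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CapturingGroupExamples {

    // Hypothetical helper: returns group 0 (the whole match) followed by each
    // capturing group, or an empty list if the whole input does not match.
    static List<String> groups(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        if (matcher.matches()) {
            for (int i = 0; i <= matcher.groupCount(); i++) {
                result.add(matcher.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Matcher matcher = Pattern.compile("((a)(bc))").matcher("abc");
        System.out.println(matcher.groupCount());        // 3
        // Groups are numbered by the position of their opening parenthesis.
        System.out.println(groups("((a)(bc))", "abc"));  // [abc, abc, a, bc]
    }
}
```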

Capturing groups and Backreferences can be confusing, so let’s understand this with an example.

    System.out.println(Pattern.matches("(\\w\\d)\\1", "a2a2")); //true
    System.out.println(Pattern.matches("(\\w\\d)\\1", "a2b2")); //false
    System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B2AB")); //true
    System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B3AB")); //false

In the first example, the first capturing group (\w\d) matches "a2" in the input String "a2a2" at runtime, and that text is saved in memory. \1 therefore refers to "a2", which matches the rest of the input, so the statement prints true. For the same reason the second statement prints false: \1 again refers to "a2", but the rest of the input is "b2".
Try to work through statements 3 and 4 in the same way yourself.
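
One common use of backreferences is spotting a word that is accidentally typed twice in a row. The sketch below is only an illustration of that idea; the class name BackreferenceExamples, the helper firstRepeatedWord and the pattern itself are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackreferenceExamples {

    // A word (group 1), some whitespace, then the same word again via \1.
    // The \b anchors keep the match aligned to whole words.
    private static final Pattern REPEATED_WORD = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // Hypothetical helper: returns the first word that appears twice in a row, or null.
    static String firstRepeatedWord(String text) {
        Matcher matcher = REPEATED_WORD.matcher(text);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstRepeatedWord("this is is a test"));  // is
        System.out.println(firstRepeatedWord("no repeats here"));    // null
    }
}
```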

Now we will look at some important methods of Pattern and Matcher classes.

We can create a Pattern object with flags by passing them as the second argument to compile. For example, Pattern.CASE_INSENSITIVE enables case-insensitive matching.

The Pattern class also provides split(String), which is similar to the String class split() method.
The Pattern class toString() method returns the regular expression String from which the pattern was compiled.

The Matcher class has start() and end() index methods that show precisely where the match was found in the input string; end() returns the index just past the last matched character.

The Matcher class also provides the String manipulation methods replaceAll(String replacement) and replaceFirst(String replacement).

Now we will see these common functions in action through a simple Java class:

package com.journaldev.util;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExamples {

    public static void main(String[] args) {
        // using pattern with flags
        Pattern pattern = Pattern.compile("ab", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher("ABcabdAb");

        // using Matcher find(), group(), start() and end() methods
        while (matcher.find()) {
            System.out.println("Found the text \"" + matcher.group()
                    + "\" starting at " + matcher.start()
                    + " index and ending at index " + matcher.end());
        }

        // using Pattern split() method
        pattern = Pattern.compile("\\W");
        String[] words = pattern.split("one@two#three:four$five");
        for (String s : words) {
            System.out.println("Split using Pattern.split(): " + s);
        }

        // using Matcher.replaceFirst() and replaceAll() methods
        pattern = Pattern.compile("1*2");
        matcher = pattern.matcher("11234512678");
        System.out.println("Using replaceAll: " + matcher.replaceAll("_"));
        System.out.println("Using replaceFirst: " + matcher.replaceFirst("_"));
    }
}

Output of the above program is:

Found the text "AB" starting at 0 index and ending at index 2
Found the text "ab" starting at 3 index and ending at index 5
Found the text "Ab" starting at 6 index and ending at index 8
Split using Pattern.split(): one
Split using Pattern.split(): two
Split using Pattern.split(): three
Split using Pattern.split(): four
Split using Pattern.split(): five
Using replaceAll: _345_678
Using replaceFirst: _34512678
