Capturing Text in a Group in a Regular Expression
A group is a pair of parentheses used to group subpatterns. For
example, h(a|i)t matches hat or hit. A group
also captures the text matched within the parentheses. For example,
matching the input abbc against the pattern a(b*)c
causes the substring bb to be captured by the group (b*).
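As a minimal sketch of the single-group case just described, the following reads group 1 after a successful match. The class name SingleGroupDemo and the helper capture are illustrative names, not part of the original example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleGroupDemo {
    // Returns the text captured by group 1 of a(b*)c,
    // or null if the whole input does not match the pattern.
    static String capture(String input) {
        Matcher matcher = Pattern.compile("a(b*)c").matcher(input);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(capture("abbc")); // prints "bb"
        System.out.println(capture("ac"));   // prints an empty line: b* matched zero characters
    }
}
```

Note that a group which matches the empty string yields "", which is different from a failed match (null in this sketch).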
A pattern can have more than one group, and groups can be nested.
For example, the pattern (a(b*))+(c*)
contains three groups: (a(b*)), (b*), and (c*).
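The groups are numbered by the position of their opening parenthesis,
from left to right, so an outer group is numbered before the groups
nested inside it. There is an implicit group 0, which contains the
entire match. For example, when the pattern (a(b*))+(c*) is matched
against the input abbabcd, group 0 holds abbabc, group 1 holds ab,
group 2 holds b, and group 3 holds c.
Notice that group 1 was applied twice, once to the input abb and
then to the input ab. Only the most recent match is captured.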
Note that when * is applied to a group and the group matches zero
times, the group is not cleared. Instead, it holds the
most recently captured text. For example, matching the input aba
against the pattern (a(b)*)+ leaves aba in group 0, a in group 1,
and b in group 2.
Group 1 first matched ab, capturing b in group 2. Group 1 then
matched the final a with group 2 matching zero b's, leaving
the previously captured b intact.
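The retained capture can be checked with a short sketch like the one below. The class name RetainedGroupDemo and the helper groups are illustrative names; the second call is an added contrast case showing that a group which never took part in the match returns null rather than a stale value.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RetainedGroupDemo {
    // Matches the whole input against (a(b)*)+ and returns groups 0..2,
    // or an empty array if the input does not match.
    static String[] groups(String input) {
        Matcher matcher = Pattern.compile("(a(b)*)+").matcher(input);
        if (!matcher.matches()) {
            return new String[0];
        }
        String[] result = new String[matcher.groupCount() + 1];
        for (int i = 0; i <= matcher.groupCount(); i++) {
            result[i] = matcher.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Group 2 keeps the b captured during the first pass of group 1.
        System.out.println(Arrays.toString(groups("aba"))); // prints [aba, a, b]
        // Group 2 never matched anything here, so it is null.
        System.out.println(Arrays.toString(groups("a")));   // prints [a, a, null]
    }
}
```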
Note: If it is not necessary for a group to capture text, you
should use a non-capturing group since it is more efficient. For more
information, see Using a Non-Capturing Group in a Regular Expression.
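For comparison, here is a small sketch of a non-capturing group, written in Java's (?:X) syntax. The class name NonCapturingDemo and the helper groupCount are illustrative names; the sketch only shows that the group is not counted or captured, and it does not measure the efficiency difference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonCapturingDemo {
    // Returns the number of capturing groups in the given pattern.
    static int groupCount(String patternStr) {
        return Pattern.compile(patternStr).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(groupCount("h(a|i)t"));   // prints 1
        System.out.println(groupCount("h(?:a|i)t")); // prints 0

        // The non-capturing form still matches the same inputs.
        Matcher matcher = Pattern.compile("h(?:a|i)t").matcher("hit");
        System.out.println(matcher.matches());       // prints true
    }
}
```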
The inputs, patterns, and groups used in the examples above are
listed here.

input: abbc
pattern: a(b*)c

pattern: (a(b*))+(c*)
group 1: (a(b*))
group 2: (b*)
group 3: (c*)

input: aba
pattern: (a(b)*)+
group 0: aba
group 1: a
group 2: b

This example demonstrates how to retrieve the text in a group.
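import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCapture {
    public static void main(String[] args) {
        CharSequence inputStr = "abbabcd";
        String patternStr = "(a(b*))+(c*)";

        // Compile and use regular expression
        Pattern pattern = Pattern.compile(patternStr);
        Matcher matcher = pattern.matcher(inputStr);
        boolean matchFound = matcher.find();

        if (matchFound) {
            // Get all groups for this match:
            // group 0 = abbabc, group 1 = ab, group 2 = b, group 3 = c
            for (int i = 0; i <= matcher.groupCount(); i++) {
                String groupStr = matcher.group(i);
                System.out.println("group " + i + ": " + groupStr);
            }
        }
    }
}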