Using Regular Expressions in EL

You can use regular expressions (regexp) within MEP using the standard Java regexp syntax. Regular expressions are powerful and complex, and describing their full functionality and syntax is beyond the scope of this website. Instead, this section gives a brief overview of common syntax and explains how to use regexp within an expression.

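There are two functions that use regexp: regexpMatches(string, regexp), which checks whether a string matches a regexp, and regexpGetCaptureGroup(string, regexp, integer), which returns the text matched by a numbered capture group.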

Regular expressions must use the same formatting as text strings in EL. As with other text strings in an expression, you must escape the backslash character using a second backslash, and enclose the regexp in double or single quotes.

We recommend that you use one of the online tools available for testing expressions against expected strings to help you build regular expressions. Keep in mind that, once you have created a working expression using an online tool, you will need to escape each \ in the regexp before using it in these EL functions.
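As a rough illustration of the escaping step, the following Java sketch doubles every backslash in a regexp copied from a testing tool. The class and method names (ElRegexpEscaper, escapeForEl) are hypothetical and are not part of MEP; Java string literals follow the same backslash rule, which is why the raw regexp is itself written with doubled backslashes in the source code.

```java
// Sketch only: ElRegexpEscaper and escapeForEl are hypothetical names, not part of MEP.
final class ElRegexpEscaper {

    // Doubles every backslash so that a regexp copied from a testing tool
    // can be pasted between quotes in an expression.
    static String escapeForEl(String rawRegexp) {
        // String.replace works on literal text, not on regexp patterns.
        return rawRegexp.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        // This Java literal holds the raw regexp with single backslashes.
        String raw = "(?i).*\\bcolou?r\\b.*";
        // Prints the regexp with each backslash doubled, ready for quoting.
        System.out.println(escapeForEl(raw));
    }
}
```

The doubled form is what goes between the quotes in the EL function call.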

Using regexpMatches to check answers sent by end users

The following expression uses the regexpMatches(string, regexp) function to check whether an end user has sent in the correct answer to a question ("colour"), taking into account that the user might have sent a common alternative spelling, or used upper case rather than lower case letters:

${af:regexpMatches(request.defaultMessage,"(?i).*\\bcolou?r\\b.*")}

The elements of the regexp are:

Syntax

Description

(?i)

This modifier makes the regexp following it case insensitive.

.*

This syntax matches any character any number of times (including zero times).

\b

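Matches a word boundary: the position where a word character (a letter, digit, or underscore) meets a non-word character such as a space or punctuation, or meets the start or end of the text. This is put on either side of "colou?r" to make sure that the regexp doesn't match a longer word, such as "colourful".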

colou?r

This syntax looks for the characters "colo", followed by an optional "u", followed by "r". The ? indicates that the previous character (u) is optional. The regexp will match "color" or "colour", but not misspellings like "clour".
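To try this regexp outside of an expression, the same pattern can be run with the standard java.util.regex classes. This is a minimal sketch: the class name ColourAnswerCheck is hypothetical, and it assumes that regexpMatches requires the whole message to match (as Matcher.matches() does), which the leading and trailing .* in the regexp suggest but this section does not state.

```java
import java.util.regex.Pattern;

// Sketch only: ColourAnswerCheck is a hypothetical class, not part of MEP.
final class ColourAnswerCheck {

    // Same regexp as in the expression above. Java string literals also
    // need each backslash doubled.
    private static final Pattern COLOUR =
            Pattern.compile("(?i).*\\bcolou?r\\b.*");

    // Assumption: the whole message must match the regexp.
    static boolean isCorrect(String message) {
        return COLOUR.matcher(message).matches();
    }

    public static void main(String[] args) {
        System.out.println(isCorrect("I think it is COLOR")); // true
        System.out.println(isCorrect("colourful"));           // false
        System.out.println(isCorrect("clour"));               // false
    }
}
```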

In this next example, the regexp looks for variants of "yes" (y, yes, yeah, yeh, yep). The different variants are grouped together using parentheses and separated by the | character. Note that the syntax \b is used to make sure that the regexp will only match "Y" and "y" when the character stands alone, and not when it is part of another word:

${af:regexpMatches(request.defaultMessage,
"(?i).*\\b(Y|yes|yea?h|yep)\\b.*")}

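The behaviour of the "yes" regexp can be checked in the same way with plain Java. This is a sketch under the same assumption as before (the whole message must match); YesAnswerCheck is a hypothetical name and is not part of MEP.

```java
import java.util.regex.Pattern;

// Sketch only: YesAnswerCheck is a hypothetical class, not part of MEP.
final class YesAnswerCheck {

    // Same regexp as in the expression above.
    private static final Pattern YES =
            Pattern.compile("(?i).*\\b(Y|yes|yea?h|yep)\\b.*");

    // Assumption: the whole message must match the regexp.
    static boolean saysYes(String message) {
        return YES.matcher(message).matches();
    }

    public static void main(String[] args) {
        System.out.println(saysYes("Yeah, why not")); // true
        System.out.println(saysYes("y"));             // true
        System.out.println(saysYes("yellow"));        // false: "y" is part of a longer word
    }
}
```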
Using capture groups to find a code or coupon

The following expression uses the regexpGetCaptureGroup(string, regexp, integer) function to find a code starting with two letters followed by 4 digits (for example, AD4501):

${af:regexpGetCaptureGroup(request.defaultMessage,"(?s)(?i).*([A-Z]{2}\\d{4}).*", 1)}

The elements of the regexp are:

Syntax

Description

(?s)

This modifier makes the . character match line breaks as well, so that .* can skip over text that contains line breaks.

(?i)

This modifier makes sure that the regexp following it is case insensitive.

.*

This syntax matches any character any number of times (including zero times). This enables the expression to ignore any text that occurs before a matching code.

([A......4})

The last set of parentheses is a capture group. Parentheses contain either capture groups or modifiers.

[A-Z]

This matches any uppercase letter from A to Z. Because the modifier (?i) was previously specified, the regexp will also match any lowercase letter. You can also use this format to specify subsets of characters; for example, [aeiou] will match any one of the letters a, e, i, o or u, and [a-ce-g] will match any letter from a to c or any letter from e to g.

{n}

This looks for exactly n occurrences of the previous element: two {2} or four {4} in this example. The occurrences do not have to be the same character; [A-Z]{2} matches any two letters, and \d{4} matches any four digits.

\d

This matches any digit. This is used with {4} to look for a sequence of any four digits.

1

This identifies the capture group that you want the output from. In this example, there is only one capture group: ([A-Z]{2}\\d{4}).
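The following Java sketch applies the same regexp with java.util.regex and reads capture group 1. The names CouponCodeFinder and findCode are hypothetical, and returning null when there is no match is an assumption made for this sketch; this section does not say what regexpGetCaptureGroup returns in that case.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: CouponCodeFinder is a hypothetical class, not part of MEP.
final class CouponCodeFinder {

    // Same regexp as in the expression above.
    private static final Pattern CODE =
            Pattern.compile("(?s)(?i).*([A-Z]{2}\\d{4}).*");

    // Returns the text of capture group 1, or null when no code is found.
    // Assumption: null on no match is a choice made for this sketch only.
    static String findCode(String message) {
        Matcher m = CODE.matcher(message);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(findCode("My code is AD4501 thanks")); // AD4501
        System.out.println(findCode("hi\nad4501\nbye"));          // ad4501
        System.out.println(findCode("AB1234 then CD5678"));       // CD5678
    }
}
```

Because the leading .* is greedy, a message that contains more than one code yields the last one, as the third call shows.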

Using capture groups to separate out three data parts

In this example, the regexpGetCaptureGroup(string, regexp, integer) function is used to capture up to three different chunks of data. The end user has been asked to provide their location, date of birth, and time of birth (in that order) for a zodiac service. The time is optional, as not everyone knows this detail. The expression used is:

${af:regexpGetCaptureGroup(request.defaultMessage,"(?s)(\\S+)\\s+(\\S+)\\s*(\\S+)?", 1)}

To grab all the data elements, you would need to run the expression three times, changing the number to the far right that specifies the capture group you want the output from.

The elements of the regexp are:

Syntax

Description

(?s)

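This modifier makes the . character match line breaks as well. This regexp does not contain a . character, so any line breaks between the data parts are matched by \s instead.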

(\S+)

This capture group matches one or more non-space characters. The \S matches any character that is not a space character (tabs and line breaks also count as space characters), while the "+" character means that the string must be one character or more in length.

\s+

This matches one or more space characters between the previous capture group and the next capture group. The \s matches space characters, while the "+" character means that the string must be one character or more in length. Space characters include tabs and line breaks.

\s*

This matches any number of space characters between the previous capture group and the next capture group, including zero space characters. In this example, the syntax \s* makes the space optional in case there is no third capture group.

?

This matches the previous element once or not at all. This makes the last capture group (\S+) optional in the regexp.

1

This identifies the capture group that you want the output from. The capture groups read left to right, with 1 identifying the first (leftmost) capture group in the regexp. To reference the last, optional capture group, use the number 3.
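As a sketch of how the three capture groups behave, the Java code below runs the same regexp and reads each group in turn. BirthDetailsParser and part are hypothetical names, the sample messages are invented for illustration, and the sketch assumes that the whole message must match and that a missing optional group is reported as null (which is what java.util.regex does; the EL function may differ).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: BirthDetailsParser is a hypothetical class, not part of MEP.
final class BirthDetailsParser {

    // Same regexp as in the expression above.
    private static final Pattern DETAILS =
            Pattern.compile("(?s)(\\S+)\\s+(\\S+)\\s*(\\S+)?");

    // Returns the text of the requested capture group (1 to 3), or null when
    // the message does not match or the optional third group is absent.
    static String part(String message, int group) {
        Matcher m = DETAILS.matcher(message);
        return m.matches() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String full = "London 1990-05-17 14:30";
        System.out.println(part(full, 1)); // London
        System.out.println(part(full, 2)); // 1990-05-17
        System.out.println(part(full, 3)); // 14:30
        // Without a time, the third group did not take part in the match.
        System.out.println(part("London 1990-05-17", 3)); // null
    }
}
```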

Other commonly used regexp syntax

In addition to the syntax above, the following commonly used regexp syntax may be useful in your expressions.

Syntax

Description

\D

Matches any character that is not a digit.

\B

Matches any position that is not a word boundary. A word boundary is the position where a word character meets a non-word character (such as a space or punctuation), or meets the start or end of the text.

\w

Matches a word character such as a letter, digit, or underscore.

\W

Matches any character that is not a word character.

.

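Matches any single character of any type. By default this does not include line breaks; use the (?s) modifier to make it match line breaks as well.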
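The short Java sketch below exercises each of these elements with the standard String and java.util.regex methods. The class and method names (OtherSyntaxDemo and its helpers) and the sample strings are hypothetical and only illustrate the syntax; they are not MEP functions.

```java
import java.util.regex.Pattern;

// Sketch only: OtherSyntaxDemo is a hypothetical class, not part of MEP.
final class OtherSyntaxDemo {

    // \D matches every non-digit, so removing those leaves only the digits.
    static String digitsOnly(String text) {
        return text.replaceAll("\\D", "");
    }

    // \W matches every non-word character, so removing those leaves
    // letters, digits, and underscores.
    static String wordCharsOnly(String text) {
        return text.replaceAll("\\W", "");
    }

    // \B requires that "cat" does not start at a word boundary,
    // that is, it must appear inside a longer word.
    static boolean hasInnerCat(String text) {
        return Pattern.compile("\\Bcat").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("Tel: 555-0199"));         // 5550199
        System.out.println(wordCharsOnly("a_b c!"));             // a_bc
        System.out.println(hasInnerCat("concat"));               // true
        System.out.println(hasInnerCat("a cat"));                // false
        // The . does not match a line break unless (?s) is set.
        System.out.println(Pattern.matches("a.b", "a\nb"));      // false
        System.out.println(Pattern.matches("(?s)a.b", "a\nb"));  // true
    }
}
```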