Altova MapForce 2024 Enterprise Edition


tokenize-regexp

Splits the input string into a sequence of strings. Any substring that matches the regular expression pattern supplied as an argument is treated as a separator. The matched (separator) substrings are not included in the result returned by the function.
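
As a rough illustration of this behavior outside MapForce, the following Python sketch uses the standard re.split function as a stand-in for tokenize-regexp. This is an analogy only: Python's regular expression dialect and its handling of edge cases are not guaranteed to match MapForce, and the sample date string is made up for the example.

```python
import re

# Hypothetical sample input, not taken from the MapForce documentation.
text = "2024-01-15"

# Every substring that matches the pattern acts as a separator.
# The separators themselves do not appear in the result.
parts = re.split(r"-", text)

print(parts)  # ['2024', '01', '15']
```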

Note:

When generating C++, C#, or Java code, the advanced features of the regular expression syntax might differ slightly. See the regex documentation of each language for more information.

Languages

Built-in, C++, C#, Java, XQuery, XSLT 2.0, XSLT 3.0.

Parameters

Name	Description
input	The input string.
pattern	Provides a regular expression pattern. Any substring that matches the pattern is treated as a delimiter. For more information, see Regular expressions.
flags	Optional parameter. Provides the regular expression flags to be used. For example, the flag "i" instructs the mapping process to operate in case-insensitive mode.
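
To make the role of the three parameters more concrete, here is a minimal Python sketch. The tokenize helper is hypothetical and only mirrors the parameter names in the table; it is not MapForce's implementation, and it handles only the "i" flag mentioned above. The input strings are invented for the example.

```python
import re

def tokenize(input_string, pattern, flags=""):
    """Hypothetical helper mirroring the input, pattern, and flags parameters."""
    # Only the "i" flag is handled in this sketch; any other flag is ignored.
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.split(pattern, input_string, flags=re_flags)

# Without flags, only the lowercase "x" matches the pattern.
sensitive = tokenize("1x2X3", "x")

# With the "i" flag, both "x" and "X" are treated as delimiters.
insensitive = tokenize("1x2X3", "x", "i")

print(sensitive)    # ['1', '2X3']
print(insensitive)  # ['1', '2', '3']
```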

Example

The goal of the mapping illustrated below is to split the string "a , b c,d" into a sequence of strings, where each alphabetic character is an item in the sequence. Any redundant whitespace or commas must be removed.

To achieve this goal, the regular expression pattern [ ,]+ is supplied as the pattern parameter of the tokenize-regexp function. This pattern has the following meaning:

• It matches any of the characters inside the character class [ ,]. Therefore, a split will occur whenever a comma or a space is encountered in the input string.

• The quantifier + specifies that one or more occurrences of the preceding character class are to be matched. Without this quantifier, each occurrence of space or comma would create a separate item in the resulting sequence of strings, which is not the intended result.
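
The effect of the quantifier can be reproduced with a short Python sketch that applies both variants of the pattern to the example string. Python's re.split is used here as an approximation of tokenize-regexp, not as a description of how MapForce works internally. In Python the extra items appear as empty strings; how MapForce represents them is not shown here.

```python
import re

text = "a , b c,d"

# With the quantifier, a run of spaces and commas counts as one separator.
with_quantifier = re.split(r"[ ,]+", text)

# Without it, every single space or comma is a separator of its own,
# so adjacent separators produce empty items in between.
without_quantifier = re.split(r"[ ,]", text)

print(with_quantifier)     # ['a', 'b', 'c', 'd']
print(without_quantifier)  # ['a', '', '', 'b', 'c', 'd']
```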

The mapping output is as follows: