tokenize-regexp
Splits the input string into a sequence of strings. Any substring that matches the regular expression pattern supplied as an argument is treated as a separator. The matched separator strings are not included in the result returned by the function.
Note: When generating C++, C#, or Java code, the advanced features of the regular expression syntax might differ slightly. See the regex documentation of each language for more information.
Languages
Built-in, C++, C#, Java, XQuery, XSLT 2.0, XSLT 3.0.
Parameters
Name | Description |
---|---|
input | The input string. |
pattern | Provides a regular expression pattern. Any substring that matches the pattern is treated as a delimiter. For more information, see Regular expressions. |
flags | Optional parameter. Provides the regular expression flags to be used. For example, the flag "i" instructs the mapping process to operate in case-insensitive mode. |
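
The effect of the flags parameter can be approximated outside the mapping environment. The sketch below is in Python and is only an analogy: the helper name `tokenize_regexp` is hypothetical, Python's `re.IGNORECASE` stands in for the "i" flag, and other flags are not modelled. It is not the code generated for any of the listed languages.

```python
import re

def tokenize_regexp(input_str, pattern, flags=""):
    """Hypothetical stand-in: split input_str wherever pattern matches."""
    # Only the "i" flag is modelled; any other flag character is ignored here.
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.split(pattern, input_str, flags=re_flags)

case_sensitive = tokenize_regexp("1x2X3", "x")         # only lowercase x separates
case_insensitive = tokenize_regexp("1x2X3", "x", "i")  # x and X both separate

print(case_sensitive)    # ['1', '2X3']
print(case_insensitive)  # ['1', '2', '3']
```

With the "i" flag, the uppercase X is also treated as a separator, so the input is split into three items instead of two.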
Example
The goal of the mapping illustrated below is to split the string "a , b c,d" into a sequence of strings, where each letter is an item in the sequence. Any redundant whitespace or commas must be removed.
To achieve this goal, the regular expression pattern [ ,]+ is supplied as the pattern parameter of the tokenize-regexp function. This pattern has the following meaning:
• It matches any of the characters inside the character class [ ,]. Therefore, a split will occur whenever a comma or a space is encountered in the input string.
• The quantifier + specifies that one or more occurrences of the preceding character class are to be matched. Without this quantifier, each occurrence of space or comma would create a separate item in the resulting sequence of strings, which is not the intended result.
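
The difference the + quantifier makes can be checked with a short Python sketch. This assumes Python's `re.split` as a stand-in for the function; the regex engines of the supported languages may treat empty items differently, so it shows the principle only.

```python
import re

text = "a , b c,d"

# With the quantifier, a run of spaces and commas counts as one separator.
with_quantifier = re.split(r"[ ,]+", text)

# Without it, every single space or comma is its own separator,
# so adjacent separators leave empty strings between them.
without_quantifier = re.split(r"[ ,]", text)

print(with_quantifier)     # ['a', 'b', 'c', 'd']
print(without_quantifier)  # ['a', '', '', 'b', 'c', 'd']
```

The two empty strings in the second result come from the " , " run between a and b, which contains three consecutive separator characters.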
The mapping output is as follows:
<items> |