XSLT Reference
tokenize()
Splits a string into a sequence of substrings using a regular expression as the delimiter pattern.
tokenize(string, pattern, flags?)Description
tokenize() splits a string into a sequence of substrings wherever the pattern (a regular expression) matches. The matched delimiters are not included in the result. The function returns a sequence of xs:string items.
If the input string begins or ends with the delimiter pattern, the result includes an empty string at the start or end of the sequence respectively. If the input is an empty string and the pattern matches the empty string, the result is an empty sequence.
tokenize() is the inverse of string-join(): where string-join() assembles strings from a sequence, tokenize() disassembles a string into a sequence. This makes them natural complements for round-tripping delimited data.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
string | xs:string | Yes | The input string to split. |
pattern | xs:string | Yes | A regular expression matching the delimiter. |
flags | xs:string | No | Regex flags: i (case-insensitive), m (multiline), s (dot-all), x (extended). |
Return value
xs:string* — a sequence of substrings split at each match of pattern.
Examples
Split a comma-separated list
Input XML:
<?xml version="1.0" encoding="UTF-8"?>
<data>
<csv>apple,banana,cherry,date</csv>
</data>
Stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/data">
<items>
<xsl:for-each select="tokenize(csv, ',')">
<item><xsl:value-of select="."/></item>
</xsl:for-each>
</items>
</xsl:template>
</xsl:stylesheet>
Output:
<items>
<item>apple</item>
<item>banana</item>
<item>cherry</item>
<item>date</item>
</items>
Split on whitespace
Stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/data">
<words>
<!-- \s+ matches one or more whitespace characters -->
<xsl:for-each select="tokenize(normalize-space(csv), '\s+')">
<word><xsl:value-of select="."/></word>
</xsl:for-each>
</words>
</xsl:template>
</xsl:stylesheet>
Count tokens
<!-- Count the number of comma-separated values -->
<xsl:value-of select="count(tokenize(csv, ','))"/>
Access a specific token by position
<!-- Get the second CSV field -->
<xsl:value-of select="tokenize(csv, ',')[2]"/>
Notes
- The
patternargument is a regular expression, not a plain string. Characters like.,*,+,?,(,)must be escaped with\if used literally (e.g., to split on a literal., use'\.'). tokenize()returns zero or more strings; iterate the result withxsl:for-eachor index it with[n].- If
patterncan match an empty string (e.g.,'.*'), Saxon raises an error — the delimiter must have non-zero length. - For simple fixed-character splitting (comma, pipe),
tokenize()is the idiomatic choice in XSLT 2.0+. In XSLT 1.0, use recursive named templates.