XSLT Reference

string-to-codepoints()

XSLT 2.0 string function

Returns a sequence of integers representing the Unicode codepoints of each character in a string.

Syntax

string-to-codepoints(string)

Description

string-to-codepoints() splits a string into its individual Unicode characters and returns their codepoints as a sequence of xs:integer values. The sequence length equals the number of Unicode characters in the string, which may differ from the number of bytes in UTF-8 or the number of 16-bit code units in UTF-16.
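
As a minimal sketch of the length distinction, assuming an XSLT 2.0 processor and any well-formed input document: the musical symbol G clef (U+1D11E) lies outside the Basic Multilingual Plane, so it occupies four bytes in UTF-8 and two code units in UTF-16, yet it contributes a single item to the returned sequence. The variable name `cps` is illustrative only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- 'a' followed by U+1D11E: two characters, hence two codepoints -->
    <xsl:variable name="cps" select="string-to-codepoints('a&#x1D11E;')"/>
    <!-- Number of items in the sequence -->
    <xsl:value-of select="count($cps)"/>
    <xsl:text>: </xsl:text>
    <!-- The codepoints themselves, separated by spaces -->
    <xsl:value-of select="$cps" separator=" "/>
  </xsl:template>
</xsl:stylesheet>
```

This should print `2: 97 119070`, since 119070 is the decimal form of U+1D11E.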

It is the inverse of codepoints-to-string() and enables character-level manipulation: inspecting, filtering, or transforming individual characters by their numeric values.
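
The inverse relationship and the kinds of manipulation it allows can be sketched as follows. This is an illustration, not a canonical pattern; the variable name `s` and the choice of operations (reversing, dropping the letter "l") are assumptions made for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:variable name="s" select="'Hello!'"/>
    <!-- Round trip: decomposing and recomposing yields the original string -->
    <xsl:value-of select="codepoints-to-string(string-to-codepoints($s)) eq $s"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Transform: reverse the codepoint sequence, then rebuild a string -->
    <xsl:value-of select="codepoints-to-string(reverse(string-to-codepoints($s)))"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Filter: drop every 'l' (codepoint 108) before rebuilding -->
    <xsl:value-of select="codepoints-to-string(string-to-codepoints($s)[. ne 108])"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against any input document, this should print three lines: `true`, `!olleH`, and `Heo!`.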

If the argument is an empty sequence or an empty string, the function returns an empty sequence.
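
Both empty cases can be checked with a short sketch like the one below, again assuming an XSLT 2.0 processor and any input document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Empty string: no characters, so no codepoints -->
    <xsl:value-of select="empty(string-to-codepoints(''))"/>
    <xsl:text> </xsl:text>
    <!-- Empty sequence: accepted because the parameter type is xs:string? -->
    <xsl:value-of select="count(string-to-codepoints(()))"/>
  </xsl:template>
</xsl:stylesheet>
```

This should print `true 0`. Because the result is an empty sequence rather than an error, a guard such as `empty(...)` or `exists(...)` is usually enough before processing optional text.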

Parameters

Parameter	Type	Required	Description
`string`	xs:string?	Yes	The string to decompose into codepoints.

Return value

xs:integer* — a sequence of Unicode codepoint integers, one per character.

Examples

Inspecting character codepoints

Stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <codepoints>
      <xsl:for-each select="string-to-codepoints('Hello!')">
        <cp value="{.}"/>
      </xsl:for-each>
    </codepoints>
  </xsl:template>
</xsl:stylesheet>

Output:

<codepoints>
  <cp value="72"/>
  <cp value="101"/>
  <cp value="108"/>
  <cp value="108"/>
  <cp value="111"/>
  <cp value="33"/>
</codepoints>

Filtering non-ASCII characters

Input XML:

<?xml version="1.0" encoding="UTF-8"?>
<texts>
  <text>Héllo Wörld</text>
  <text>Plain ASCII only</text>
</texts>