XSLT Reference
normalize-unicode()
Applies Unicode normalization (NFC, NFD, NFKC, NFKD, or FULLY-NORMALIZED) to a string, ensuring a canonical character representation.
normalize-unicode(string, normalization-form?)
Description
normalize-unicode() converts a string to a specified Unicode normalization form. Different normalization forms control whether composed or decomposed character representations are used, and whether compatibility equivalents are collapsed.
The most common use case is ensuring consistent string comparison when data may come from different systems that represent the same character differently — for example, the letter é can be stored as a single precomposed codepoint (U+00E9) or as e followed by a combining accent (U+0065 U+0301).
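As a minimal sketch, assuming an XSLT 2.0 processor and the default Unicode codepoint collation, the stylesheet below builds both spellings of é from codepoints and compares them with and without normalization. It can be applied to any input document; the output element names are illustrative only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <!-- U+00E9: precomposed é -->
    <xsl:variable name="composed" select="codepoints-to-string(233)"/>
    <!-- U+0065 U+0301: e followed by a combining acute accent -->
    <xsl:variable name="decomposed" select="codepoints-to-string((101, 769))"/>
    <comparison>
      <!-- Codepoint comparison sees two different strings -->
      <raw-equal><xsl:value-of select="$composed = $decomposed"/></raw-equal>
      <!-- After NFC both become the single codepoint U+00E9 -->
      <nfc-equal><xsl:value-of select="normalize-unicode($composed) = normalize-unicode($decomposed)"/></nfc-equal>
      <!-- Length of the decomposed string before and after NFC -->
      <lengths><xsl:value-of select="string-length($decomposed), string-length(normalize-unicode($decomposed))"/></lengths>
    </comparison>
  </xsl:template>
</xsl:stylesheet>
```

Under those assumptions, raw-equal should be false, nfc-equal true, and lengths "2 1": the decomposed spelling has two codepoints until NFC composes it into one.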
Normalization forms:
| Form | Name | Description |
|---|---|---|
| NFC | Canonical Decomposition + Canonical Composition | Precomposed form (default, most common) |
| NFD | Canonical Decomposition | Fully decomposed; base characters followed by combining marks |
| NFKC | Compatibility Decomposition + Canonical Composition | Collapses compatibility variants (e.g., ligatures, width variants) |
| NFKD | Compatibility Decomposition | Decomposed compatibility form |
| FULLY-NORMALIZED | Full normalization (W3C Character Model) | NFC, with the additional requirement that the string does not begin with a combining character |
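To compare the four Unicode forms side by side, the sketch below normalizes one sample string (U+FB01 ﬁ ligature followed by precomposed U+00E9 é) and prints the resulting codepoints in decimal. It assumes an XSLT 2.0 processor that supports NFD, NFKC, and NFKD in addition to the mandatory NFC; FULLY-NORMALIZED is left out to keep the comparison to the four Unicode forms.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <!-- U+FB01 (fi ligature) followed by U+00E9 (precomposed é) -->
    <xsl:variable name="sample" select="codepoints-to-string((64257, 233))"/>
    <forms>
      <!-- Iterate over the form names; the context item is each name string -->
      <xsl:for-each select="('NFC', 'NFD', 'NFKC', 'NFKD')">
        <form name="{.}">
          <!-- Space-separated decimal codepoints of the normalized string -->
          <xsl:value-of select="string-to-codepoints(normalize-unicode($sample, .))"/>
        </form>
      </xsl:for-each>
    </forms>
  </xsl:template>
</xsl:stylesheet>
```

On such a processor the forms should yield "64257 233" (NFC), "64257 101 769" (NFD), "102 105 233" (NFKC), and "102 105 101 769" (NFKD): only the compatibility forms split the ligature into f + i, and only the decomposed forms split é into e + U+0301.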
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
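| string | xs:string? | Yes | The string to normalize. |
| normalization-form | xs:string | No | One of NFC, NFD, NFKC, NFKD, or FULLY-NORMALIZED. Defaults to NFC. Every processor must support NFC; support for the other forms is optional, and an unsupported form raises error FOCH0003. |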
Return value
xs:string — the input string in the requested normalization form. Returns "" if string is an empty sequence.
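Because an empty-sequence argument yields a zero-length string rather than an empty sequence, the result can be passed straight to other string functions or comparisons without a guard. The minimal sketch below, assuming an XSLT 2.0 processor, checks both sides of that behavior; the element names are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <checks>
      <!-- An empty-sequence input does not produce an empty-sequence result -->
      <is-empty-sequence><xsl:value-of select="empty(normalize-unicode(()))"/></is-empty-sequence>
      <!-- The result is the zero-length string instead -->
      <is-zero-length><xsl:value-of select="normalize-unicode(()) eq ''"/></is-zero-length>
    </checks>
  </xsl:template>
</xsl:stylesheet>
```

Given the return-value rule above, is-empty-sequence should be false and is-zero-length true.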
Examples
Normalizing to NFC for consistent comparison
Stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/strings">
    <normalized>
      <xsl:for-each select="s">
        <!-- Ensure NFC before comparison or storage -->
        <s><xsl:value-of select="normalize-unicode(., 'NFC')"/></s>
      </xsl:for-each>
    </normalized>
  </xsl:template>
</xsl:stylesheet>
Collapsing compatibility variants with NFKC
Input XML:
<?xml version="1.0" encoding="UTF-8"?>
<data>
  <!-- Contains ﬁ ligature (U+FB01) and ² superscript (U+00B2) -->
  <value>ﬁle²</value>
</data>
Stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/data">
    <result>
      <!-- NFKC: ﬁ (U+FB01) → fi, ² (U+00B2) → 2 -->
      <xsl:value-of select="normalize-unicode(value, 'NFKC')"/>
    </result>
  </xsl:template>
</xsl:stylesheet>
Output:
<result>file2</result>
Notes
- NFC is the recommended normalization form for most XML and web applications; the W3C Character Model for the World Wide Web recommends NFC for web content.
- NFKC is useful for search and indexing where compatibility equivalents should be treated identically (e.g., full-width vs. half-width letters, ligatures).
- NFD is mainly useful for low-level text processing that needs to inspect or remove combining marks, such as stripping diacritics.
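- The normalization-form argument is case-insensitive, and leading and trailing whitespace is ignored: "nfc", " NFC ", and "NFC" are equivalent.
- If the argument is "" (the zero-length string), no normalization is performed and the input string is returned unchanged. To get the NFC default, omit the argument.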
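The notes above can be exercised with a minimal sketch, assuming an XSLT 2.0 processor that supports NFKC and NFD (only NFC is mandatory). Sample strings are built from codepoints so the example does not depend on how the stylesheet file itself is encoded; the output element names are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <!-- Full-width letters U+FF21 U+FF22 U+FF23 -->
    <xsl:variable name="wide" select="codepoints-to-string((65313, 65314, 65315))"/>
    <!-- "café" with a precomposed é (U+00E9) -->
    <xsl:variable name="cafe" select="codepoints-to-string((99, 97, 102, 233))"/>
    <notes>
      <!-- NFKC folds full-width letters to ASCII, so both spellings share one search key -->
      <search-key><xsl:value-of select="normalize-unicode($wide, 'NFKC') = 'ABC'"/></search-key>
      <!-- NFD splits é into e + U+0301; deleting nonspacing marks (\p{Mn}) drops the accent -->
      <unaccented><xsl:value-of select="replace(normalize-unicode($cafe, 'NFD'), '\p{Mn}', '')"/></unaccented>
      <!-- The form name is not case-sensitive -->
      <case-insensitive><xsl:value-of select="normalize-unicode($cafe, 'nfd') = normalize-unicode($cafe, 'NFD')"/></case-insensitive>
    </notes>
  </xsl:template>
</xsl:stylesheet>
```

On such a processor, search-key and case-insensitive should both be true and unaccented should be "cafe". Deleting every \p{Mn} character is a blunt tool: in some scripts nonspacing marks carry meaning, so it is better suited to building search keys than to general text conversion.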