XSLT Reference

normalize-unicode()

XSLT 2.0 string function

Applies Unicode normalization (NFC, NFD, NFKC, NFKD, or FULLY-NORMALIZED) to a string, ensuring a canonical character representation.

Syntax
normalize-unicode(string, normalization-form?)

Description

normalize-unicode() converts a string to a specified Unicode normalization form. Different normalization forms control whether composed or decomposed character representations are used, and whether compatibility equivalents are collapsed.

The most common use case is ensuring consistent string comparison when data may come from different systems that represent the same character differently — for example, the letter é can be stored as a single precomposed codepoint (U+00E9) or as e followed by a combining accent (U+0065 U+0301).
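
The comparison problem can be sketched as follows. This is a minimal illustration, assuming an XSLT 2.0 processor run against any XML input; the variable names `precomposed` and `decomposed` are made up for the example. `codepoint-equal()` is used rather than `=` so the result does not depend on the processor's default collation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Illustrative variables: the same visible letter, stored two ways -->
  <!-- U+00E9 (precomposed) -->
  <xsl:variable name="precomposed" select="'&#xE9;'"/>
  <!-- U+0065 U+0301 (base letter plus combining acute accent) -->
  <xsl:variable name="decomposed" select="'e&#x301;'"/>

  <xsl:template match="/">
    <!-- Raw codepoint comparison: the two strings differ -->
    <xsl:text>raw: </xsl:text>
    <xsl:value-of select="codepoint-equal($precomposed, $decomposed)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- After NFC normalization both strings are the single codepoint U+00E9 -->
    <xsl:text>normalized: </xsl:text>
    <xsl:value-of select="codepoint-equal(normalize-unicode($precomposed),
                                          normalize-unicode($decomposed))"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The first line of output is `raw: false` and the second is `normalized: true`.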

Normalization forms:

| Form | Name | Description |
|---|---|---|
| NFC | Canonical Decomposition, followed by Canonical Composition | Precomposed form (default, most common) |
| NFD | Canonical Decomposition | Fully decomposed; base characters followed by combining marks |
| NFKC | Compatibility Decomposition, followed by Canonical Composition | Collapses compatibility variants (e.g., ligatures, width variants) |
| NFKD | Compatibility Decomposition | Decomposed compatibility form |
| FULLY-NORMALIZED | Fully normalized form (W3C Character Model) | NFC text that, in addition, does not begin with a composing character |
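
One way to see the difference between the forms is to print the codepoints each one produces. The sketch below assumes a processor that supports all four Unicode forms (only NFC is mandatory, so others may raise an error); the sample string, é (U+00E9) followed by the fi ligature (U+FB01), is chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Illustrative sample: U+00E9 followed by U+FB01 -->
    <xsl:variable name="sample" select="'&#xE9;&#xFB01;'"/>

    <!-- Iterate over the form names; "." is the current form name -->
    <xsl:for-each select="('NFC', 'NFD', 'NFKC', 'NFKD')">
      <xsl:value-of select="."/>
      <xsl:text>: </xsl:text>
      <!-- Decimal codepoints of the normalized string, space-separated -->
      <xsl:value-of select="string-to-codepoints(normalize-unicode($sample, .))"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

On a processor that supports every form, this prints:

NFC: 233 64257
NFD: 101 769 64257
NFKC: 233 102 105
NFKD: 101 769 102 105

The canonical forms (NFC, NFD) leave the ligature alone, while the compatibility forms (NFKC, NFKD) replace it with the letters f and i.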

Parameters

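| Parameter | Type | Required | Description |
|---|---|---|---|
| string | xs:string? | Yes | The string to normalize. |
| normalization-form | xs:string | No | One of NFC, NFD, NFKC, NFKD, FULLY-NORMALIZED. Defaults to NFC. Processors are required to support only NFC; an unsupported form raises error FOCH0003. |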

Return value

xs:string — the input string in the requested normalization form. Returns "" if string is an empty sequence.
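
The argument and return-value rules can be checked with a few one-line calls. This is a sketch based on the rules in the XPath functions specification (the form name is whitespace-trimmed and upper-cased before use, and a zero-length form name means no normalization); the labels in the output are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Empty sequence in, zero-length string out -->
    <xsl:text>empty sequence length: </xsl:text>
    <xsl:value-of select="string-length(normalize-unicode(()))"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Form name is trimmed and upper-cased, so ' nfc ' means NFC -->
    <xsl:text>' nfc ' accepted: </xsl:text>
    <xsl:value-of select="codepoint-equal(normalize-unicode('e&#x301;', ' nfc '), '&#xE9;')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Zero-length form name: input returned unchanged (still 2 characters) -->
    <xsl:text>'' form length: </xsl:text>
    <xsl:value-of select="string-length(normalize-unicode('e&#x301;', ''))"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Omitted form name: NFC applied (1 character) -->
    <xsl:text>default form length: </xsl:text>
    <xsl:value-of select="string-length(normalize-unicode('e&#x301;'))"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The four lines of output end in 0, true, 2, and 1 respectively.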

Examples

Normalizing to NFC for consistent comparison

Stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/strings">
    <normalized>
      <xsl:for-each select="s">
        <!-- Ensure NFC before comparison or storage -->
        <s><xsl:value-of select="normalize-unicode(., 'NFC')"/></s>
      </xsl:for-each>
    </normalized>
  </xsl:template>
</xsl:stylesheet>

Collapsing compatibility variants with NFKC

Input XML:

<?xml version="1.0" encoding="UTF-8"?>
<data>
  <!-- Contains the fi ligature (U+FB01) and superscript two (U+00B2), written as character references -->
  <value>&#xFB01;le&#xB2;</value>
</data>

Stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/data">
    <result>
      <!-- NFKC: fi ligature (U+FB01) → "fi", superscript two (U+00B2) → "2" -->
      <xsl:value-of select="normalize-unicode(value, 'NFKC')"/>
    </result>
  </xsl:template>
</xsl:stylesheet>

Output:

<result>file2</result>

Notes

  • NFC is the recommended normalization form for most XML and web applications; the W3C Character Model recommends it for content on the web.
  • NFKC is useful for search and indexing where compatibility equivalents should be treated identically (e.g., full-width vs. half-width letters, ligatures).
  • NFD is mainly useful for low-level text processing that needs base letters separated from their combining marks, for example when stripping diacritics.
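  • The normalization-form argument is case-insensitive, and leading and trailing whitespace is ignored; "nfc", "NFC", and " NFC " are equivalent.
  • If the normalization-form argument is "" (empty string), no normalization is performed and the input string is returned unchanged. NFC is applied only when the argument is omitted.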
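
As an illustration of the kind of low-level processing NFD enables, the sketch below strips diacritics by decomposing the string and then deleting every combining mark. It assumes a processor that supports NFD (optional per the specification); the sample text and variable name are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Illustrative sample: "café crème" with precomposed accented letters -->
    <xsl:variable name="text" select="'caf&#xE9; cr&#xE8;me'"/>

    <!-- NFD splits each accented letter into base letter + combining mark;
         \p{M} then matches the marks so replace() can delete them -->
    <xsl:value-of select="replace(normalize-unicode($text, 'NFD'), '\p{M}', '')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

This prints `cafe creme`. Note that the approach only removes marks that decompose canonically; letters such as ø or ł have no decomposition and pass through unchanged.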

See also