Normalizer::normalize

normalizer_normalize

(PHP 5 >= 5.3.0, PHP 7, PECL intl >= 1.0.0)

Normalizer::normalize -- normalizer_normalize — Normalizes the input provided and returns the normalized string

Description

Object-oriented style

public static string Normalizer::normalize ( string $input [, int $form = Normalizer::FORM_C ] )

Procedural style

string normalizer_normalize ( string $input [, int $form = Normalizer::FORM_C ] )

Normalizes the input string according to the given Unicode normalization form and returns the normalized string.

Parameters

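input: The input string to normalize. It is expected to be UTF-8 encoded.
form: One of the normalization forms: Normalizer::FORM_C (the default), Normalizer::FORM_D, Normalizer::FORM_KC or Normalizer::FORM_KD.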

Return Values

The normalized string or FALSE if an error occurred.
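
Because failure is signalled by FALSE rather than by an exception, the result is best checked with a strict comparison before it is used. The following is only a sketch: the helper name normalize_or_original() is hypothetical (it is not part of the intl extension), and the scalar type declarations assume PHP 7.

```php
<?php
// Hypothetical helper, not part of intl: returns the normalized string,
// or the unmodified input when normalization fails.
function normalize_or_original(string $input, int $form = Normalizer::FORM_C): string
{
    $normalized = normalizer_normalize($input, $form);

    // Strict comparison: an empty string is a valid result and must not
    // be mistaken for a failure.
    return $normalized === false ? $input : $normalized;
}

// 'A' followed by COMBINING RING ABOVE (U+030A) is composed into U+00C5.
echo urlencode(normalize_or_original("A\xCC\x8A")), "\n";
?>
```

Falling back to the original input is one possible policy; code that must reject malformed input should treat FALSE as an error instead.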

Examples

Example #1 normalizer_normalize() example


<?php
$char_A_ring = "\xC3\x85"; // 'LATIN CAPITAL LETTER A WITH RING ABOVE' (U+00C5)
$char_combining_ring_above = "\xCC\x8A";  // 'COMBINING RING ABOVE' (U+030A)
 
$char_1 = normalizer_normalize( $char_A_ring, Normalizer::FORM_C );
$char_2 = normalizer_normalize( 'A' . $char_combining_ring_above, Normalizer::FORM_C );
 
echo urlencode($char_1);
echo ' ';
echo urlencode($char_2);
?>

Example #2 OO example


<?php
$char_A_ring = "\xC3\x85"; // 'LATIN CAPITAL LETTER A WITH RING ABOVE' (U+00C5)
$char_combining_ring_above = "\xCC\x8A";  // 'COMBINING RING ABOVE' (U+030A)
 
$char_1 = Normalizer::normalize( $char_A_ring, Normalizer::FORM_C );
$char_2 = Normalizer::normalize( 'A' . $char_combining_ring_above, Normalizer::FORM_C );
 
echo urlencode($char_1);
echo ' ';
echo urlencode($char_2);
?>

The above example will output:

%C3%85 %C3%85
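
The examples above only use FORM_C. As a rough illustration of how the other forms differ, the sketch below decomposes the same character with FORM_D and contrasts canonical (FORM_C) with compatibility (FORM_KC) normalization on a ligature. The variable names are chosen for this illustration only.

```php
<?php
$a_ring = "\xC3\x85";      // 'LATIN CAPITAL LETTER A WITH RING ABOVE' (U+00C5)
$fi     = "\xEF\xAC\x81";  // 'LATIN SMALL LIGATURE FI' (U+FB01)

$nfd  = normalizer_normalize($a_ring, Normalizer::FORM_D);
$nfc  = normalizer_normalize($fi, Normalizer::FORM_C);
$nfkc = normalizer_normalize($fi, Normalizer::FORM_KC);

echo urlencode($nfd), "\n";   // A%CC%8A    (base letter + combining mark)
echo urlencode($nfc), "\n";   // %EF%AC%81  (ligature kept: no canonical mapping)
echo urlencode($nfkc), "\n";  // fi         (compatibility mapping applied)
?>
```

The compatibility forms lose information (the ligature cannot be recovered from "fi"), so they tend to suit searching and comparison better than storage.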

See Also

normalizer_is_normalized() - Checks if the provided string is already in the specified normalization form.
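
A short sketch of how the two functions might be combined, normalizing only when the check reports that it is needed; the variable names are illustrative.

```php
<?php
$precomposed = "\xC3\x85";   // U+00C5, already in form C
$decomposed  = "A\xCC\x8A";  // 'A' + COMBINING RING ABOVE (U+030A)

var_dump(normalizer_is_normalized($precomposed, Normalizer::FORM_C)); // bool(true)
var_dump(normalizer_is_normalized($decomposed, Normalizer::FORM_C));  // bool(false)

// Normalize only when the string is not already in the requested form.
$result = $decomposed;
if (!normalizer_is_normalized($result, Normalizer::FORM_C)) {
    $result = normalizer_normalize($result, Normalizer::FORM_C);
}

var_dump($result === $precomposed); // bool(true)
?>
```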

User Contributed Notes

spam at oscar dot xyz 25-Feb-2015 11:39


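You can also use the original Unicode abbreviations as constants if you feel more comfortable with them:

<?php
Normalizer::NFD;   // same as Normalizer::FORM_D
Normalizer::NFKD;  // same as Normalizer::FORM_KD
Normalizer::NFC;   // same as Normalizer::FORM_C
Normalizer::NFKC;  // same as Normalizer::FORM_KC
?>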
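
As a quick check of that claim, the short names can be compared with the FORM_* constants and passed directly as the form argument. This is a small illustrative sketch, not taken from the note itself.

```php
<?php
// Each abbreviation is an alias of the corresponding FORM_* constant.
var_dump(Normalizer::NFD  === Normalizer::FORM_D);   // bool(true)
var_dump(Normalizer::NFKD === Normalizer::FORM_KD);  // bool(true)
var_dump(Normalizer::NFC  === Normalizer::FORM_C);   // bool(true)
var_dump(Normalizer::NFKC === Normalizer::FORM_KC);  // bool(true)

// The alias is used exactly like the long name.
$composed = Normalizer::normalize("A\xCC\x8A", Normalizer::NFC);
echo urlencode($composed), "\n"; // %C3%85
?>
```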

tom dot vom dot berg at online dot de 12-Apr-2014 05:54


If you get error messages when starting Apache from the XAMPP package with extension=intl.dll enabled, copy the files

    * icudt##.dll
    * icuin##.dll
    * icuio##.dll
    * icule##.dll
    * iculx##.dll
    * icutu##.dll
    * icuuc##.dll

(## = version number)

from "/program files/xampp/php" into "/program files/xampp/apache/bin", or wherever your XAMPP installation resides :-)

akniep at rayo dot info 30-Jul-2009 06:03


Especially when matching texts against each other or against keywords, it is helpful to normalize the texts first.

The following function removes all diacritics (marks such as accents) from a given UTF-8 encoded text and returns ASCII text.

Be sure to have the intl extension (which provides the Normalizer class and depends on ICU) installed.

Tip: You may also want to map the text to lower case before executing matching procedures ...



<?php

function normalizeUtf8String( $s )
{
    // keep the input so that it can be returned unchanged on failure
    $original_string = $s;

    // Normalizer-class missing!
    if (! class_exists("Normalizer", false))
        return $original_string;

    // maps German (umlauts) and other European characters onto two characters before just removing diacritics
    $s    = preg_replace( '@\x{00c4}@u'    , "AE",    $s );    // umlaut Ä => AE
    $s    = preg_replace( '@\x{00d6}@u'    , "OE",    $s );    // umlaut Ö => OE
    $s    = preg_replace( '@\x{00dc}@u'    , "UE",    $s );    // umlaut Ü => UE
    $s    = preg_replace( '@\x{00e4}@u'    , "ae",    $s );    // umlaut ä => ae
    $s    = preg_replace( '@\x{00f6}@u'    , "oe",    $s );    // umlaut ö => oe
    $s    = preg_replace( '@\x{00fc}@u'    , "ue",    $s );    // umlaut ü => ue
    $s    = preg_replace( '@\x{00f1}@u'    , "ny",    $s );    // ñ => ny
    $s    = preg_replace( '@\x{00ff}@u'    , "yu",    $s );    // ÿ => yu

    // maps special characters (characters with diacritics) onto their base character followed by the diacritical mark
    // example:  ú => u + combining acute accent,  á => a + combining acute accent
    $s    = Normalizer::normalize( $s, Normalizer::FORM_D );

    $s    = preg_replace( '@\pM@u'        , "",    $s );    // removes diacritics

    $s    = preg_replace( '@\x{00df}@u'    , "ss",    $s );    // maps German ß onto ss
    $s    = preg_replace( '@\x{00c6}@u'    , "AE",    $s );    // Æ => AE
    $s    = preg_replace( '@\x{00e6}@u'    , "ae",    $s );    // æ => ae
    $s    = preg_replace( '@\x{0132}@u'    , "IJ",    $s );    // Ĳ => IJ
    $s    = preg_replace( '@\x{0133}@u'    , "ij",    $s );    // ĳ => ij
    $s    = preg_replace( '@\x{0152}@u'    , "OE",    $s );    // Œ => OE
    $s    = preg_replace( '@\x{0153}@u'    , "oe",    $s );    // œ => oe

    $s    = preg_replace( '@\x{00d0}@u'    , "D",    $s );    // Ð => D
    $s    = preg_replace( '@\x{0110}@u'    , "D",    $s );    // Đ => D
    $s    = preg_replace( '@\x{00f0}@u'    , "d",    $s );    // ð => d
    $s    = preg_replace( '@\x{0111}@u'    , "d",    $s );    // đ => d
    $s    = preg_replace( '@\x{0126}@u'    , "H",    $s );    // Ħ => H
    $s    = preg_replace( '@\x{0127}@u'    , "h",    $s );    // ħ => h
    $s    = preg_replace( '@\x{0131}@u'    , "i",    $s );    // ı => i
    $s    = preg_replace( '@\x{0138}@u'    , "k",    $s );    // ĸ => k
    $s    = preg_replace( '@\x{013f}@u'    , "L",    $s );    // Ŀ => L
    $s    = preg_replace( '@\x{0141}@u'    , "L",    $s );    // Ł => L
    $s    = preg_replace( '@\x{0140}@u'    , "l",    $s );    // ŀ => l
    $s    = preg_replace( '@\x{0142}@u'    , "l",    $s );    // ł => l
    $s    = preg_replace( '@\x{014a}@u'    , "N",    $s );    // Ŋ => N
    $s    = preg_replace( '@\x{0149}@u'    , "n",    $s );    // ŉ => n
    $s    = preg_replace( '@\x{014b}@u'    , "n",    $s );    // ŋ => n
    $s    = preg_replace( '@\x{00d8}@u'    , "O",    $s );    // Ø => O
    $s    = preg_replace( '@\x{00f8}@u'    , "o",    $s );    // ø => o
    $s    = preg_replace( '@\x{017f}@u'    , "s",    $s );    // ſ => s
    $s    = preg_replace( '@\x{00de}@u'    , "T",    $s );    // Þ => T
    $s    = preg_replace( '@\x{0166}@u'    , "T",    $s );    // Ŧ => T
    $s    = preg_replace( '@\x{00fe}@u'    , "t",    $s );    // þ => t
    $s    = preg_replace( '@\x{0167}@u'    , "t",    $s );    // ŧ => t

    // remove all remaining non-ASCII characters (anything above U+007F)
    $s    = preg_replace( '@[^\x00-\x7F]@u'    , "",    $s );

    // preg_replace() returns NULL on error (e.g. invalid UTF-8 input),
    // so fall back to the original string in that case
    if (empty($s))
        return $original_string;
    else
        return $s;
}

?>



The above function is mainly based on the following article:

http://ahinea.com/en/tech/accented-translate.html
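
For the matching use case mentioned in the tip above, a much smaller sketch that combines diacritic removal with lower-casing may be enough. It is not a replacement for the function above: it has none of the language-specific mappings (ä => ae, ß => ss, and so on), the helper name fold_for_matching() is hypothetical, and it assumes PHP 7 with both the intl and mbstring extensions loaded.

```php
<?php
// Hypothetical helper: folds a UTF-8 string into a form suited to
// accent-insensitive, case-insensitive comparison.
function fold_for_matching(string $text): string
{
    // Decompose so that every accent becomes a separate combining mark.
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $text; // normalization failed (e.g. invalid UTF-8)
    }

    // Remove the combining marks; preg_replace() returns NULL on error.
    $stripped = preg_replace('/\pM/u', '', $decomposed);
    if ($stripped === null) {
        return $text;
    }

    return mb_strtolower($stripped, 'UTF-8');
}

// "Crème Brûlée", written with byte escapes so that the example does not
// depend on the encoding of the source file.
$menu_item = "Cr\xC3\xA8me Br\xC3\xBBl\xC3\xA9e";

var_dump(fold_for_matching($menu_item) === fold_for_matching('CREME BRULEE')); // bool(true)
?>
```

Both sides of a comparison have to go through the same folding, otherwise the match will still fail on accents or letter case.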