mb_ereg_replace

(PHP 4 >= 4.2.0, PHP 5, PHP 7)

mb_ereg_replace — Replace regular expression with multibyte support

说明

string mb_ereg_replace ( string $pattern , string $replacement , string $string [, string $option = "msr" ] )

Scans string for matches to pattern, then replaces the matched text with replacement

参数

pattern

The regular expression pattern.

Multibyte characters may be used in pattern.

replacement

The replacement text.

string

The string being checked.

option

Matching condition can be set by option parameter. If i is specified for this parameter, the case will be ignored. If x is specified, white space will be ignored. If m is specified, match will be executed in multiline mode and line break will be included in '.'. If p is specified, match will be executed in POSIX mode, line break will be considered as normal character. If e is specified, replacement string will be evaluated as PHP expression.

返回值

The resultant string on success, or FALSE on error.

更新日志

版本	说明
7.1.0	The e modifier has been deprecated.

注释

Note:
mb_regex_encoding() 指定的内部编码或字符编码将会当作此函数用的字符编码。

Warning

处理非信任的输入时从不使用 e 修饰符，就不会转码（即调用 preg_replace()）。不注意这些会很可能会导致应用程序引发远程代码执行的漏洞。

参见

mb_regex_encoding() - Set/Get character encoding for multibyte regex
mb_eregi_replace() - Replace regular expression with multibyte support ignoring case

User Contributed Notes

Alexey Khrulev 01-Sep-2017 07:10


If encoding of PHP script differs from encoding of string to be processed by mb_ereg_replace(), then you can't just write pattern in script. Both $pattern and $replacement must be converted to same encoding as string to be processed. In this example script is in UTF-8, file to be processed is in UTF-16LE encoding:



<?php

$file_encoding = 'UTF-16LE';

mb_regex_encoding( $file_encoding );



$pattern     = "aaa";

$replacement = "AAA";

$pattern_encoded     = mb_convert_encoding( $pattern,     $file_encoding, 'UTF-8' );

$replacement_encoded = mb_convert_encoding( $replacement, $file_encoding, 'UTF-8' );



$result = mb_ereg_replace( $pattern_encoded, $replacement_encoded, file_get_contents('UTF-16LE.txt') );

file_put_contents('UTF-16LE-updated.txt', $result);

?>

ms2705335 at gmail dot com 17-Jul-2017 07:10


As trng mentioned before you can use \\n for replacement but NOT \\\\n as mentioned in preg_replace docs. So string definition will be like:

$str = '\\1';

Anonymous 19-Jan-2016 08:23


Pluche's comment should REALLY be added to the documentation, preferably under the "$pattern" param description. It is crucial to using this function.

marco at thenetworksolution dot it 03-Feb-2014 11:47


To selectively uppercase parts of a string via mb_eregi_replace



    $str = mb_eregi_replace('\b([0-9]{1,4}[a-z]{1,2})\b', "strtoupper

('\\1')", $str, 'e');



Full example, how to fix an address manually typed, uppercasing the first letter of a words and keeping uppercase roman numerals and the letters A,B,C after the house number):



function ucAddress($str) {

// first lowercase all and use the default ucwords

    $str = ucwords(strtolower($str));

// let's fix the default ucwords...

// uppercase letters after house number (was lowercased by the strtolower above)

    $str = mb_eregi_replace('\b([0-9]{1,4}[a-z]{1,2})\b', "strtoupper

('\\1')", $str, 'e');

// the same for roman numerals

    $str = mb_eregi_replace('\bM{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b', "strtoupper('\\0')", $str, 'e');

    return $str;

}



Dr. Marco Marsala

Network Solution srl

http://www.realizzazionesitigenova.it

marco at thenetworksolution dot it 03-Feb-2014 11:47


To selectively uppercase parts of a string via mb_eregi_replace



    $str = mb_eregi_replace('\b([0-9]{1,4}[a-z]{1,2})\b', "strtoupper

('\\1')", $str, 'e');



Full example, how to fix an address manually typed, uppercasing the first letter of a words and keeping uppercase roman numerals and the letters A,B,C after the house number):



function ucAddress($str) {

// first lowercase all and use the default ucwords

    $str = ucwords(strtolower($str));

// let's fix the default ucwords...

// uppercase letters after house number (was lowercased by the strtolower above)

    $str = mb_eregi_replace('\b([0-9]{1,4}[a-z]{1,2})\b', "strtoupper

('\\1')", $str, 'e');

// the same for roman numerals

    $str = mb_eregi_replace('\bM{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b', "strtoupper('\\0')", $str, 'e');

    return $str;

}

trng 15-Jun-2011 06:22


You can use \\n for capture group in replacement.

And you can NOT use $n notation (unlike preg_replace function).

Pluche 30-Dec-2010 01:17


Unlike preg_replace, mb_ereg_replace doesn't use separators





Exemple with preg_replace :


<?php $data = preg_replace("/[^A-Za-z0-9\.\-]/","",$data); ?>





Exemple with mb_ereg_replace :


<?php $data = mb_ereg_replace("[^A-Za-z0-9\.\-]","",$data); ?>

daemoneye at gmail dot com 03-Feb-2009 11:53


I got a pretty nasty error while trying to parse table rows(all contents were set to UTF-8) from the database for a dictionary project. The idea was to get all the rows from the first table (that is a table with bulgarian phrase in the first field, and its translation in english, french and german in the next fields). I needed to index all the bulgarian words that are found in the table to make an intelligent search. And that is where my headache started.



First of all, even with mb_strtolower() a lot of cyrillic characters went corrupted (ex: 'т,ъ,у,ф,б,г,з,ж,' etc...). After an hour of different attempts I got such a solution:



<?php



mb_internal_encoding("UTF-8");

mb_regex_encoding("UTF-8");



$rows = $db->getRows();



$contents = array();

foreach ($rows as $eachRow)

{

    $cleared = str_replace($commonWords, ' ', mb_strtolower(stripslashes($eachRow['bulgarian']), 'UTF-8' ));

    if (trim($cleared) != '') $contents[] = trim($cleared);

}    



$list = array();

foreach ($contents as $eachRow)

{

    $exploded = explode(' ', $eachRow);

    foreach ($exploded as $eachExpl)

    {

        $eachExpl = mb_ereg_replace('[^а-я ]',' ', $eachExpl);

        if (trim($eachExpl) != '') 

            if (!in_array($eachExpl, $list, true))    $list[] = trim($eachExpl);

    }

}



?>



To work properly I got to set all the internal encoding settings to UTF-8. Else the default Latin-1 got half my database with missing characters.



I am posting this solution just in case someone has encountered a similar problem. Hope it helps you in case you need something like that.

keizo at gomo dot jp 24-Jul-2008 06:32


<?php


$pattern = "([あ-ん]+)[0-9]+";


$string = mb_ereg_replace($pattern, '「\\1」:\\0', $string);


?>





you can use \\n for capture group in replacement

gmx dot net at ulrich dot mierendorff 01-Jul-2008 04:39


If you want to replace characters like "?" or "?" you can use mb_ereg_replace, but it is very slow. str_replace is much faster and also works with characters like "?" or "?"!



I think this has something to with the fact that str_replace works on byte level and does not care about characters.

I hope that can help.

04-Dec-2006 05:36


'i' option does not work correctly with multibyte characters. The function does not locate/replace the multibyte string if it's different case then specified on multibyte needle which is in different case.

squeegee 01-Nov-2006 04:41


well, if you just calculated the length of the find and replace strings once instead of on every loop, it would likely speed it up a lot.

mpnicholas [@t] gmail (dot) com 10-Jul-2006 12:09


Regarding the mb_str_ireplace() function: I benchmarked it against mb_eregi_replace() for single-character substitution, and it was significantly slower. Despite avoiding the ereg call, I think the while loop ends slowing you down too much for this to be practical.

vondrej(at)gmail(dot)com 27-Feb-2006 12:47


Are you looking for htmlentities() for multibyte strings? This might help you - it just replace <, >, ", '





<?php


/**


 *  Multibyte equivalent for htmlentities() [lite version :)]


 *


 * @param string $str


 * @param string $encoding


 * @return string


 **/


function mb_htmlentities($str, $encoding = 'utf-8') {


    mb_regex_encoding($encoding);


    $pattern = array('<', '>', '"', '\'');


    $replacement = array('&lt;', '&gt;', '&quot;', '&#39;');


    for ($i=0; $i<sizeof($pattern); $i++) {


        $str = mb_ereg_replace($pattern[$i], $replacement[$i], $str);


    }


    return $str;


}


?>

faxe at neostrada dot pl 10-Aug-2005 12:52


A simple mb_str_ireplace() implementation - a faster (?) replacement for non-regexp multi-byte string replacement:





<?php


function mb_str_ireplace($co, $naCo, $wCzym)


{


    $wCzymM = mb_strtolower($wCzym);


    $coM    = mb_strtolower($co);


    $offset = 0;


    


        while(!is_bool($poz = mb_strpos($wCzymM, $coM, $offset)))


    {


        $offset = $poz + mb_strlen($naCo);


        $wCzym = mb_substr($wCzym, 0, $poz). $naCo .mb_substr($wCzym, $poz+mb_strlen($co));


        $wCzymM = mb_strtolower($wCzym);


    }


    


    return $wCzym;


}


?>





[thiago - EDITOR NOTE: This function has improvements from d-okumura [aat] fi{dot}kyd[dot]co.jp]