Pattern Syntax

User Contributed Notes

mbrodin 24-Nov-2008 10:18


Hi!



For even better performance than the code below, use:



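<?php
    // Fragment meant to replace the foreach loop inside strip_selected_tags() below;
    // $allTags, $stripContent and $str come from that function.
    $f = array();

    foreach ($allTags[1] as $tag) {
        // preg_quote() keeps special characters in a tag name from breaking the pattern.
        $name = preg_quote($tag, '%');
        $f[] = "%(<$name.*?>)(.*?)(<\/$name.*?>)%is";
    }

    // Call preg_replace() once with all patterns, and only if there is something to replace.
    if (sizeof($f)) {
        $str = preg_replace($f, ($stripContent ? '' : '${2}'), $str);
    }
?>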



This does not call preg_replace() once per tag. Instead it collects the patterns in an array and passes them to a single preg_replace() call, which should be faster.

It also checks that there is at least one pattern to replace. If there is none, preg_replace() is not called at all! :)

I added the "<?php" tag so that the code is highlighted.

datacompboy at call2ru dot com 29-Oct-2007 07:24


For example, suppose you want to cut out a <div> element precisely, from the opening <div> to its corresponding </div>.

Here is proof-of-concept code to do this:



<?php

$str = "<dqiv1>1+<div2>2+<div3><b><c>3</c></b></div3>2-</div2>1-</div1>";



preg_match("#<div.> ( ".

              " ( (?>[^<]*) ( < ( ([^/d]|d([^i]|i[^v])) | /([^d]|d([^i]|i[^v])) ) )? )* ".

           " | (?R) )* </div.>#xi", $str, $m);

var_dump($m[0]);



?>



It matches precisely from <div2> to </div2>. If you change <dqiv1> to <div1>, it matches from <div1> to </div1>.
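A shorter variant of the same idea, as a rough sketch only: it assumes plain <div> tags (attributes allowed, no numeric suffix) and uses a negative lookahead instead of spelling out every non-"div" prefix. The helper name first_balanced_div is made up for this example, and the "~" delimiter is chosen so that "#" comments can be used inside the /x pattern.

```php
<?php
// Hypothetical helper, not from the note above: returns the first balanced
// <div>...</div> block in $html, or null if there is none.
function first_balanced_div($html)
{
    $pattern = '~<div\b[^>]*>      # opening tag, attributes allowed
                (?:
                    (?>[^<]+)      # text without "<", atomic to limit backtracking
                  | (?R)           # a nested, balanced <div>...</div>
                  | <(?!/?div\b)   # a "<" that does not start a div tag
                )*
                </div>~xi';

    return preg_match($pattern, $html, $m) ? $m[0] : null;
}

// Prints: string(33) "<div>1<div>2<b>3</b></div>4</div>"
var_dump(first_balanced_div('<p>a<div>1<div>2<b>3</b></div>4</div>z</p>'));
```

Like the original, this is a proof of concept; it does not handle comments, scripts or malformed markup.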

chris at madblanks dot org 04-Jul-2007 09:22


When you enclose a regular expression in double quotes, back references require two backslashes.

For example, "\1" is the ASCII character with code 1, because PHP reads \1 in a double-quoted string as an octal escape. You need to write "\\1" to get the back reference.
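A small sketch of the difference, assuming nothing beyond PHP's standard string escaping rules; the variable names are only for illustration.

```php
<?php
// "\1" in double quotes is an octal escape: one character with code 1.
$octal = "\1";
// '\1' in single quotes and "\\1" in double quotes are both a backslash plus "1".
$single = '\1';
$double = "\\1";

var_dump(strlen($octal), ord($octal));  // int(1) int(1)
var_dump($single === $double);          // bool(true)

// Only the escaped form reaches PCRE as a back reference.
$with_backref    = preg_match("/(a)\\1/", "aa");  // 1: "a" followed by the same text
$without_backref = preg_match("/(a)\1/", "aa");   // 0: looks for "a" followed by chr(1)
var_dump($with_backref, $without_backref);
```

Writing patterns in single quotes avoids the issue, since '\1' is passed through unchanged.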

sam marshall 24-May-2007 07:23


For anyone who sees this error: 



Warning: preg_match() [function.preg-match]: Compilation failed: PCRE does not support \L, \l, \N, \P, \p, \U, \u, or \X at ...



As this manual page says, you need PHP 5.1.0 and the /u modifier in order to enable these features, but that isn't the only requirement! It is possible to install later versions of PHP (we have 5.1.4) while linking to an older PCRE install. A quick look at the PCRE changelog suggests that you probably need at least PCRE 5; we're running 4.5, while the latest is 7.1. You can find out your PCRE version by checking phpinfo().



I suspect this ancient PCRE version is included in some officially supported Red Hat Enterprise package, which is probably why we are running it, so this might also affect other people.
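Besides phpinfo(), the version can be checked from code. The sketch below assumes a PHP build that defines the PCRE_VERSION constant; the 5.0 threshold is only the estimate given in this note, not a documented requirement.

```php
<?php
// PCRE_VERSION looks like "10.42 2022-12-11"; keep only the part before the space.
$pcre_version = explode(' ', PCRE_VERSION)[0];
echo "PCRE version: $pcre_version\n";

// Threshold taken from the note above (an estimate, not an official figure).
if (version_compare($pcre_version, '5.0', '<')) {
    echo "This PCRE is probably too old for \\p and \\X.\n";
}

// Probe the feature directly: @ hides the compile warning, and
// preg_match() returns false when the pattern cannot be compiled.
$supports_properties = @preg_match('/^\p{Lu}/u', 'Abc') === 1;
var_dump($supports_properties);
```

Probing the feature is usually more reliable than comparing version numbers, because it tests the library that is actually linked.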

pstradomski at gmail dot com 29-Mar-2007 04:55


About the strip_selected_tags function posted below:

It does not work if somebody writes tags without the closing ">" character, like this:



<p <b> bold text </b</p



This is even valid HTML (but not valid XHTML).

theppg_001 at hotmail dot com 20-Nov-2006 10:22


Hi there

This was originally made by someone else, but it didn't work correctly, so I rewrote it, and as far as I know it works right.



<?php
/**
 * strip_selected_tags ( string str [, string strip_tags [, strip_content flag]] )
 * ---------------------------------------------------------------------
 * Like strip_tags() but inverse; the strip_tags tags will be stripped, not kept.
 * strip_tags: string with tags to strip, ex: "<a><p><quote>" etc.
 * strip_content flag: TRUE will also strip everything between open and closed tag
 */
function strip_selected_tags($str, $tags = "", $stripContent = false)
{
    // Collect the tag names from $tags, e.g. "<a><p>" gives array("a", "p").
    preg_match_all("/<([^>]+)>/i", $tags, $allTags, PREG_PATTERN_ORDER);
    foreach ($allTags[1] as $tag) {
        // Build the pattern inside the loop, where $tag is defined.
        $name = preg_quote($tag, "%");
        $replace = "%(<$name.*?>)(.*?)(<\/$name.*?>)%is";
        if ($stripContent) {
            // Remove the tags together with everything between them.
            $str = preg_replace($replace, '', $str);
        } else {
            // Remove only the tags and keep the enclosed content.
            $str = preg_replace($replace, '${2}', $str);
        }
    }
    return $str;
}
?>



Before I 'fixed' it, when running

strip_selected_tags("this is <p align=\"center\">a test</p> and <b>this is bold</b>","<p><b>")

You would get back

"this is <p align=\"center\">a test</p> and this is bold"

Why? Because it did not take into account that there could be attributes in the HTML tag.

My version handles both cases: stripping just the tags, or the tags together with their contents!



So now when you run 

strip_selected_tags("this is <p align=\"center\">a test</p> and <b>this is bold</b>","<p><b>")

You get back

"this is a test and this is bold"

Or when running

strip_selected_tags("this is <p align=\"center\">a test</p> and <b>this is bold</b>","<p><b>",true)

You get back

"this is  and "



Hope it helps someone :)

Daniel Vandersluis 23-Nov-2005 07:50


Concerning note #6 in "Differences From Perl", the \G token *is* supported as the last match position anchor. This has been confirmed to work at least in preg_replace(), though I'd assume it'd work in preg_match_all(), and other functions that can make more than one match, as well.
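A minimal sketch of \G in preg_match_all(), to illustrate the behaviour described above; the subject string is made up for the example.

```php
<?php
$subject = '123a45';

// \G anchors each attempt to where the previous match ended,
// so matching stops at the first non-digit.
preg_match_all('/\G\d/', $subject, $anchored);

// Without \G the engine skips over "a" and keeps finding digits.
preg_match_all('/\d/', $subject, $free);

print_r($anchored[0]);  // 1, 2, 3
print_r($free[0]);      // 1, 2, 3, 4, 5
```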

roland dot illig at gmx dot de 08-Nov-2005 10:02


<quote>

9. Another as yet unresolved discrepancy is that in Perl 5.005_02 the pattern /^(a)?(?(1)a|b)+$/ matches the string "a", whereas in PCRE it does not. However, in both Perl and PCRE /^(a)?a/ matched against "a" leaves $1 unset.

</quote>



The last sentence does not indicate a bug. For the string "a" to match the regular expression /^(a)?a/, the final literal "a" in the regex has to consume the only "a" in the string. That leaves the empty string "" for the optional group (a)?, which cannot match it, so the group is skipped and $1 stays unset.
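Both patterns from the quoted item can be tried directly. This is only a sketch of how PHP reports the result; it relies on preg_match() leaving trailing unmatched groups out of $m when no extra flags are passed.

```php
<?php
// The group (a)? has to give up the only "a" so that the literal a can match.
$matched = preg_match('/^(a)?a/', 'a', $m);
var_dump($matched);      // int(1)
var_dump(isset($m[1]));  // bool(false): group 1 did not take part in the match

// The conditional pattern from the quoted item, run against the same string.
$conditional = preg_match('/^(a)?(?(1)a|b)+$/', 'a');
var_dump($conditional);  // int(0) in PCRE, as the quoted item says
```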

Ned Baldessin 16-Jul-2005 01:14


Although \w and \W do include as "word characters" locale-specific characters (like "é" if you are using the "fr" locale), \b and \B do not work the same way.

For example:

"foo était bar"   =>   /\W(était)\W/   =>   This captures correctly "était".

"foo était bar"   =>   /\b(était)\b/   =>   This fails to capture it.



This is confusing, because the manual talks in both cases about "word characters", but fails to mention the difference in behaviour.

onerob at gmail dot com 02-Apr-2005 01:51


If, like me, you tend to use the /U pattern modifier, then you will need to remember that using ? or * to test for optional characters will match zero characters whenever that lets the rest of the pattern continue matching, even if the optional characters are present.



For instance, if we have this string:



a___bcde



and apply this pattern:



'/a(_*).*e/U'



The whole pattern is matched, but none of the _ characters are placed in the sub-pattern. The way around this (if you still wish to use /U) is to use the ? greediness inverter, e.g.



'/a(_*?).*e/U'
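The two patterns can be compared side by side. A minimal sketch, using the string and patterns from this note; only the variable names are new.

```php
<?php
$subject = 'a___bcde';

// With /U, _* is lazy and settles for zero underscores.
preg_match('/a(_*).*e/U', $subject, $lazy);

// With /U, _*? is flipped back to greedy and takes all three.
preg_match('/a(_*?).*e/U', $subject, $greedy);

var_dump($lazy[0]);    // string(8) "a___bcde"
var_dump($lazy[1]);    // string(0) ""
var_dump($greedy[1]);  // string(3) "___"
```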

W W W 07-Mar-2005 04:22


Back references are a great way to achieve exact matching when it would have been impossible any other way. Take these three strings.



1) "www.www.com"

2) 'www.www.com'

3) "www.www.com'



The regex /^("|').+?("|')$/ would match all three strings, but what if you needed the third string above to be rejected because its quotes are not the same? You could write separate regexes for each quote style OR you could use back references.



/^("|').+?\1$/ will match strings 1 and 2 but not string 3. Try this code for further proof:



$str_test="'www.www.com\"";

$int_count=preg_match("/^(\"|').+?\\1$/", $str_test, $matches, PREG_OFFSET_CAPTURE);



The preg_match function will not match against $str_test because the quotes are mismatched. If you change $str_test to



$str_test = "'www.www.com'";



the preg_match call will find a match.
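To check all three strings from the list above in one go, something like the following sketch can be used. The pattern is written in single quotes here, so a single backslash is enough for \1.

```php
<?php
$candidates = array(
    '"www.www.com"',   // matching double quotes
    "'www.www.com'",   // matching single quotes
    '"www.www.com\'',  // mismatched quotes
);

$results = array();
foreach ($candidates as $candidate) {
    // \1 must repeat whatever quote character group 1 captured.
    $results[] = preg_match('/^("|\').+?\1$/', $candidate);
}

print_r($results);  // 1, 1, 0
```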

info at atjeff dot co dot nz 08-Feb-2005 01:46


I had never used regular expressions until now, and I had a lot of difficulty trying to convert a [url]link here[/url] into an href for posting messages on a forum. Here's what I managed to come up with:



$patterns = array(
    "/\[link\](.*?)\[\/link\]/",
    "/\[url\](.*?)\[\/url\]/",
    "/\[img\](.*?)\[\/img\]/",
    "/\[b\](.*?)\[\/b\]/",
    "/\[u\](.*?)\[\/u\]/",
    "/\[i\](.*?)\[\/i\]/"
);

$replacements = array(
    "<a href=\"\\1\">\\1</a>",
    "<a href=\"\\1\">\\1</a>",
    "<img src=\"\\1\">",
    "<b>\\1</b>",
    "<u>\\1</u>",
    "<i>\\1</i>"
);

$newText = preg_replace($patterns, $replacements, $text);



At first it would collect ALL the tags into one link/bold/whatever, until I added the "?". I still don't fully understand it... but it works :)
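The "?" after ".*" makes the quantifier lazy: it matches as little as possible, so it stops at the first closing tag instead of the last one. A small sketch of the difference, using a made-up input string:

```php
<?php
$text = '[b]one[/b] and [b]two[/b]';

// Greedy: .* runs to the last [/b], so both tags collapse into one.
$greedy = preg_replace('/\[b\](.*)\[\/b\]/', '<b>\\1</b>', $text);

// Lazy: .*? stops at the first [/b] it can, so each tag is handled separately.
$lazy = preg_replace('/\[b\](.*?)\[\/b\]/', '<b>\\1</b>', $text);

echo $greedy, "\n";  // <b>one[/b] and [b]two</b>
echo $lazy, "\n";    // <b>one</b> and <b>two</b>
```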

J Daugherty 09-Dec-2004 06:06


In the character class meta-character documentation above, the circumflex (^) is described:



"^   negate the class, but only if the first character"



It should be a little more verbose to fully express the meaning of ^:



^    Negate the character class.  This only applies when ^ is the first character of the class (e.g. "[^012]"); anywhere else in the class it is a literal circumflex.
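A quick sketch of both uses of the circumflex inside a character class; the variable names are only for illustration.

```php
<?php
// Leading ^ negates: [^012] matches any character except 0, 1 or 2.
$negated_hit  = preg_match('/[^012]/', '7');  // 1
$negated_miss = preg_match('/[^012]/', '1');  // 0

// Elsewhere in the class, ^ is an ordinary character.
$literal = preg_match('/[0^]/', '^');         // 1

var_dump($negated_hit, $negated_miss, $literal);
```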

napalm at spiderfish dot net 17-Mar-2004 05:14


Note that some PCRE features, such as once-only subpatterns or recursive patterns, are not implemented in PHP versions prior to 5.0.0.



Napalm
