Meta-characters

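The power of regular expressions comes from the ability to express alternatives and repetition in a pattern. Certain characters are given a special meaning so that they no longer simply stand for themselves; these specially interpreted characters in a pattern are called meta-characters.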

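There are two different sets of meta-characters: those that are recognized anywhere in the pattern outside square brackets, and those that are recognized inside square brackets. Outside square brackets, the meta-characters are as follows:

\: general escape character
^: asserts the start of the subject (or the start of a line, in multiline mode)
$: asserts the end of the subject, by default also immediately before a final newline (or the end of a line, in multiline mode)
.: matches any character except newline (by default)
[: starts a character class definition
]: ends a character class definition
|: starts an alternative branch
(: starts a subpattern
): ends a subpattern
?: as a quantifier, matches 0 or 1 times; placed after another quantifier, it changes that quantifier's greediness (see the section on quantifiers)
*: quantifier, 0 or more matches
+: quantifier, 1 or more matches
{: starts a custom (min/max) quantifier
}: ends a custom (min/max) quantifier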
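
As a brief illustration of the list above, the following sketch combines several of these meta-characters. The patterns and sample strings are made up for this example and are not taken from the manual.

```php
<?php
// Anchors (^ and $), a subpattern with alternatives, '?' as a 0-or-1
// quantifier, and a custom {min,max} quantifier.
$pattern = '/^(cat|dog)s?\d{2,3}$/';

$r1 = preg_match($pattern, 'cats42');   // 'cat', optional 's', two digits
$r2 = preg_match($pattern, 'dog1234');  // four digits exceed {2,3}
$r3 = preg_match($pattern, 'bird42');   // 'bird' is not one of the alternatives

// A '?' placed after a quantifier makes it lazy instead of greedy.
preg_match('/<.+>/', '<a><b>', $greedy);
preg_match('/<.+?>/', '<a><b>', $lazy);

// A backslash removes the special meaning of a meta-character.
$r4 = preg_match('/^1\+1$/', '1+1');

var_dump($r1, $r2, $r3, $greedy[0], $lazy[0], $r4);
```

Here the greedy pattern consumes the whole string `<a><b>`, while the lazy one stops at the first `>` and yields `<a>`.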

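The part of a pattern enclosed in square brackets is called a "character class". Inside a character class, the only meta-characters are:

\: escape character
^: negates the class, but only when it is the first character inside the brackets
-: indicates a character range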
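
The character class rules above can be sketched as follows. The patterns and subjects are illustrative assumptions chosen for this example, not part of the manual text.

```php
<?php
// '-' between two characters marks a range: lowercase hexadecimal digits.
$isHex = preg_match('/^[0-9a-f]+$/', 'c0ffee');

// '^' as the first character negates the class: find the first non-digit.
preg_match('/[^0-9]/', '2024-05', $firstNonDigit);

// '^' anywhere else in the class is a literal caret.
$caret = preg_match('/[a^]/', '^');

// A backslash escapes ']' and '-' so they are treated as ordinary characters.
$bracketOrDash = preg_match_all('/[\]\-]/', 'a]b-c');

// Meta-characters from outside brackets, such as '.' and '*', are literal
// inside a class, so this class does not match letters.
$literalDot = preg_match('/^[.*]+$/', 'abc');

var_dump($isHex, $firstNonDigit[0], $caret, $bracketOrDash, $literalDot);
```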

The following sections describe the use of each of these meta-characters.

User Contributed Notes

Kurt Wei 17-Feb-2016 07:43

A surprising behaviour of "any character" with multi-line subjects.

By default, '.' does NOT match the newline character (\n), even though other constructs such as \s or a negated class like [^<] do match it. Oddly enough, the carriage return (\r) is matched by '.'.

To match across lines you can write "(.|\\n)" instead of a single dot, at the cost of an extra capturing group that complicates the match results, or simply use the "s" modifier so that the dot also accepts the newline.

$subject = "<tag>Hello\nWorld</tag>";

preg_match('/<tag>[A-Za-z\\s]*<\\/tag>/', $subject); // 1 (match)
preg_match('/<tag>[^<]*<\\/tag>/', $subject);        // 1 (match)
preg_match('/<tag>(.|\\n)*<\\/tag>/', $subject);     // 1 (match)
preg_match('/<tag>.*<\\/tag>/s', $subject);          // 1 (match)
preg_match('/<tag>.*<\\/tag>/', $subject);           // ATTENTION! 0 (no match): '.' stops at \n

Thomas 27-Mar-2015 09:38

By default, the meta-character $ also matches immediately before a single newline character (\n) at the very end of the subject.

(Take a moment to let this information sink in.)

After a successful match you might want to (r)trim() your input, because otherwise it might still fail a length requirement, or other strange things might happen when you store the input as-is.
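
A minimal sketch of the trailing-newline behaviour described in this note. The five-digit pattern and the variable names are assumptions made up for the example. Besides trimming, the D modifier (PCRE_DOLLAR_ENDONLY) or the \z assertion can be used to require the very end of the subject.

```php
<?php
$input = "12345\n";   // five digits followed by a trailing newline

// Default: $ also matches just before the final newline, so this succeeds.
$default = preg_match('/^\d{5}$/', $input);

// With the D modifier, $ matches only at the very end of the subject.
$dollarEndOnly = preg_match('/^\d{5}$/D', $input);

// \z always asserts the very end of the subject, regardless of modifiers.
$absoluteEnd = preg_match('/^\d{5}\z/', $input);

// The matched input is still six bytes long until it is trimmed.
$rawLength = strlen($input);
$clean     = rtrim($input, "\n");

var_dump($default, $dollarEndOnly, $absoluteEnd, $rawLength, $clean);
```

Rejecting the input with D or \z avoids silently storing the extra byte, whereas rtrim() accepts the input and normalizes it afterwards; which is preferable depends on the application.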