Regular Expressions: DFA and NFA

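Regular expression engines are classified by the kind of automaton they are based on: the deterministic finite automaton (DFA) and the non-deterministic finite automaton (NFA, also written NDFA).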
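The difference between the two can be sketched in Python, the language used for all examples added here. Everything in the sketch is hypothetical and only for illustration: the example language (strings over {a, b} that end in "ab"), the state names and the function names. The NFA is simulated by tracking the set of states it might be in. The DFA for the same language always has exactly one current state.

```python
# Hypothetical example language: strings over {a, b} that end in "ab".

# NFA: from state 0 on 'a' the machine may either stay in 0 or "guess" that
# this 'a' starts the final "ab" and move to 1 (this is the non-determinism).
NFA_DELTA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
NFA_START, NFA_ACCEPT = 0, {2}


def nfa_accepts(text: str) -> bool:
    """Simulate the NFA by tracking the set of all states it could be in."""
    current = {NFA_START}
    for ch in text:
        nxt = set()
        for state in current:
            nxt |= NFA_DELTA.get((state, ch), set())
        current = nxt
    return bool(current & NFA_ACCEPT)


# DFA for the same language: exactly one next state per (state, symbol).
# q0 = no progress, q1 = last char was 'a', q2 = input ends in "ab".
DFA_DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q1", ("q2", "b"): "q0",
}


def dfa_accepts(text: str) -> bool:
    """Run the DFA: a single current state, one table lookup per symbol."""
    state = "q0"
    for ch in text:
        state = DFA_DELTA[(state, ch)]
    return state == "q2"


for s in ["ab", "aab", "abb", "bab", ""]:
    print(repr(s), nfa_accepts(s), dfa_accepts(s))
```

Tracking sets of NFA states in this way is the idea behind the subset construction, which turns any NFA into an equivalent DFA.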

Modifiers in Perl's regular expression syntax:

/i: Do case-insensitive pattern matching.

If ``use locale'' is in effect, the case map is taken from the current locale. See the perllocale manpage.

/m: Treat string as multiple lines. That is, change ``^'' and ``$'' from matching at only the very start or end of the string to the start or end of any line anywhere within the string.

/s: Treat string as single line. That is, change ``.'' to match any character whatsoever, even a newline, which it normally would not match.

The /s and /m modifiers both override the $* setting. That is, no matter what $* contains, /s without /m will force ``^'' to match only at the beginning of the string and ``$'' to match only at the end (or just before a newline at the end) of the string. Together, as /ms, they let the ``.'' match any character whatsoever, while yet allowing ``^'' and ``$'' to match, respectively, just after and just before newlines within the string.

/x: Extend your pattern's legibility by permitting whitespace and comments.
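Python's re module has flags that correspond to these four modifiers. The mapping below is an illustrative assumption of how they line up, and /i (re.IGNORECASE) is used as the demonstration. For str patterns, Python case-folds using Unicode rules rather than a locale.

```python
import re

# Assumed correspondence between Perl modifiers and Python's re flags.
PERL_TO_PYTHON = {
    "i": re.IGNORECASE,  # case-insensitive matching
    "m": re.MULTILINE,   # ^ and $ match at every line boundary
    "s": re.DOTALL,      # . also matches a newline
    "x": re.VERBOSE,     # layout whitespace and # comments are ignored
}

text = "Perl PERL perl"
print(re.findall(r"perl", text))                        # ['perl']
print(re.findall(r"perl", text, PERL_TO_PYTHON["i"]))   # ['Perl', 'PERL', 'perl']
```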

These are usually written as ``the /x modifier'', even though the delimiter in question might not actually be a slash. Any of these modifiers may also be embedded within the regular expression itself using the (?...) construct, for example (?i) or (?ms).
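Python's re module, used here as a stand-in for Perl, also accepts these letters inside the pattern. The following is a minimal sketch with invented sample text. Python requires a global flag group such as (?im) at the very start of the pattern, but the scoped form (?i:...) may appear anywhere.

```python
import re

text = "first line\nSECOND line"

# Flags passed as an argument ...
as_argument = re.findall(r"^second", text, re.IGNORECASE | re.MULTILINE)
# ... or embedded in the pattern with the (?...) construct.
embedded = re.findall(r"(?im)^second", text)
print(as_argument, embedded)                   # ['SECOND'] ['SECOND']

# Scoped form: case-insensitivity applies only inside (?i:...).
print(re.findall(r"(?i:second) line", text))   # ['SECOND line']
print(re.findall(r"(?i:second) LINE", text))   # []
```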

The /x modifier itself needs a little more explanation. It tells the regular expression parser to ignore whitespace that is neither backslashed nor within a character class. You can use this to break up your regular expression into (slightly) more readable parts. The # character is also treated as a metacharacter introducing a comment, just as in ordinary Perl code. This also means that if you want real whitespace or # characters in the pattern (outside of a character class, where they are unaffected by /x), you'll either have to escape them or encode them using octal or hex escapes. Taken together, these features go a long way towards making Perl's regular expressions more readable. Note that you have to be careful not to include the pattern delimiter in the comment: perl has no way of knowing you did not intend to close the pattern early. See the C-comment deletion code in the perlop manpage.
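Python's re.VERBOSE flag follows the same rules as /x, so it can illustrate this paragraph. This is a minimal sketch, and the date and issue-number patterns are made-up examples.

```python
import re

# re.VERBOSE plays the role of /x: layout whitespace and # comments are ignored.
date = re.compile(r"""
    (\d{4})   # year
    -         # separator
    (\d{2})   # month
    -
    (\d{2})   # day
""", re.VERBOSE)
print(date.match("2008-12-21").groups())    # ('2008', '12', '21')

# A real space and a real '#' must be escaped ...
issue = re.compile(r"issue\ \#\d+   # matches e.g. 'issue #42'", re.VERBOSE)
print(bool(issue.fullmatch("issue #42")))   # True
print(bool(issue.fullmatch("issue#42")))    # False: the escaped space is required

# ... or placed in a character class, where /x leaves them alone.
issue2 = re.compile(r"issue[ ][#]\d+", re.VERBOSE)
print(bool(issue2.fullmatch("issue #7")))   # True
```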

A reasonable explanation of /m and /s (working out the mechanism from observed behavior):

By default, the ``^'' character is guaranteed to match at only the beginning of the string, the ``$'' character at only the end (or before the newline at the end) and Perl does certain optimizations with the assumption that the string contains only one line. Embedded newlines will not be matched by ``^'' or ``$''. You may, however, wish to treat a string as a multi-line buffer, such that the ``^'' will match after any newline within the string, and ``$'' will match before any newline. At the cost of a little more overhead, you can do this by using the /m modifier on the pattern match operator. (Older programs did this by setting $*, but this practice is now deprecated.)
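Here is a small Python sketch of this behavior. It uses re.MULTILINE as the counterpart of /m, and the sample text is invented.

```python
import re

text = "one\ntwo\nthree\n"

# Default: ^ matches only at the start; $ only at the end or before the final newline.
print(re.findall(r"^\w+$", text))                 # []
print(re.findall(r"\w+$", text))                  # ['three']

# re.MULTILINE (like /m): ^ and $ also match next to every embedded newline.
print(re.findall(r"^\w+$", text, re.MULTILINE))   # ['one', 'two', 'three']
```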

To facilitate multi-line substitutions, the ``.'' character never matches a newline unless you use the /s modifier, which in effect tells Perl to pretend the string is a single line--even if it isn't. The /s modifier also overrides the setting of $*, in case you have some (badly behaved) older code that sets it in another module.
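The same point can be shown in Python, with re.DOTALL standing in for /s. The BEGIN/END markers are just sample text.

```python
import re

text = "BEGIN\nbody\nEND"

# Without DOTALL, '.' stops at newlines, so nothing spans BEGIN...END.
print(re.search(r"BEGIN(.*)END", text))        # None

# re.DOTALL (like /s): '.' matches any character, newline included.
m = re.search(r"BEGIN(.*)END", text, re.DOTALL)
print(repr(m.group(1)))                        # '\nbody\n'
```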

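When a ``.'' has to match a newline, /s is the modifier that matters: it makes ``.'' treat the string as a single line, so ``.'' can match the newline.
When ``^'' or ``$'' has to match at the start or end of a line, /m is the modifier that matters: it makes them treat the string as multiple lines. This holds even if the same pattern also uses ``.'' to match a newline, because /m handles ``^'' and ``$'' while /s handles ``.''.

So the combination /ms is the union of the two modifiers. Each one covers what the other does not touch, rather than one overriding the other.
You can verify this with a regex debugging tool.

Under /m alone, ``.'' still never matches a newline, because /m does not change the meaning of ``.'' at all. If you want ``.'' to match a newline, use /s.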
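This explanation can also be checked with a short Python sketch, using re.M for /m and re.S for /s on invented sample text. It runs one pattern that needs both behaviors under every combination of the two flags. Only the combined /ms case matches: /m lets ``^'' and ``$'' match at line boundaries, and /s lets ``.'' match the newline between them.

```python
import re

text = "x\na\nb\ny"
pattern = r"^a.b$"   # needs ^/$ at line boundaries AND '.' to match '\n'

results = {}
for name, flags in [("none", 0), ("/m", re.M), ("/s", re.S), ("/ms", re.M | re.S)]:
    m = re.search(pattern, text, flags)
    results[name] = m.group() if m else None

print(results)   # {'none': None, '/m': None, '/s': None, '/ms': 'a\nb'}
```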

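A reasonable explanation of /x (worked out by debugging examples):
/x is also called extended mode, according to Regex Match Tracer. It allows whitespace and # comments inside the regular expression, but these characters (the whitespace, and everything from a # to the end of the line) do not match anything in the target string.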

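Escape sequence \Q...\E

Text between \Q and \E loses the special meaning of its punctuation: every character in between is treated as an ordinary character.

Text between \U and \E is quoted in the same way as with \Q...\E, and its lowercase letters are also converted to uppercase. In case-sensitive mode it can therefore only match uppercase text.

Text between \L and \E is quoted in the same way as with \Q...\E, and its uppercase letters are also converted to lowercase. In case-sensitive mode it can therefore only match lowercase text.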
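Python's re module has no \Q, \U or \L escapes. The behavior described above can be emulated by transforming the text before escaping it. The helper names quote_upper and quote_lower are hypothetical, and this is only a sketch of the described semantics.

```python
import re


def quote_upper(text: str) -> str:
    # Emulates \U...\E as described above: upper-case, then quote literally.
    return re.escape(text.upper())


def quote_lower(text: str) -> str:
    # Emulates \L...\E as described above: lower-case, then quote literally.
    return re.escape(text.lower())


upper = quote_upper("a+b")                               # literal text "A+B"
print(bool(re.fullmatch(upper, "A+B")))                  # True
print(bool(re.fullmatch(upper, "a+b")))                  # False: case-sensitive
print(bool(re.fullmatch(upper, "a+b", re.IGNORECASE)))   # True
print(bool(re.fullmatch(quote_lower("X*Y"), "x*y")))     # True
```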

Notes

\Q...\E is useful when an expression needs a fairly long piece of ordinary text that contains special characters.

Examples

Expression	Description
\Q(a+b)*3\E	Matches the text "(a+b)*3".
\(a\+b\)\*3	Without \Q...\E, each special character has to be escaped individually.
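Python has no \Q...\E, but re.escape() does the same job. Java's java.util.regex, by contrast, accepts \Q...\E directly. Here is a sketch using the example from the table.

```python
import re

target = "(a+b)*3"

quoted = re.escape(target)        # role of \Q(a+b)*3\E
manual = r"\(a\+b\)\*3"           # each special character escaped by hand
print(bool(re.fullmatch(quoted, target)))    # True
print(bool(re.fullmatch(manual, target)))    # True

# Unescaped, the same characters form a different regex: (a+b)* followed by 3.
print(bool(re.fullmatch(target, target)))    # False
print(bool(re.fullmatch(target, "aabab3")))  # True
```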

Posted on 2008-12-21 23:02 by CopyHoo. Category: Java Web
