Perl Study Notes 2

1. You can use the string abc as a regular expression by enclosing it in slashes. This loop prints every input line that contains abc:
    while (<>) {
        if (/abc/) {
            print $_;
        }
    }
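The loop above reads from the files named on the command line, or from standard input. As a minimal sketch that runs without any input, the same /abc/ test can be applied to a fixed list of lines with grep. The helper lines_with_abc and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the lines that contain "abc".
# Inside grep, each element is aliased to $_, so /abc/ tests it
# the same way the while (<>) loop tests the current input line.
sub lines_with_abc {
    my @lines = @_;
    return grep { /abc/ } @lines;
}

my @input = ("xabcx\n", "ab c\n", "ABC\n", "abcabc\n");
my @hits  = lines_with_abc(@input);
print @hits;    # prints the "xabcx" and "abcabc" lines
```

ABC is not printed because the match is case-sensitive.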
2. The dot (.) pattern
   The dot matches any single character except newline (\n). For example, the pattern /a./ matches any two-character sequence
   that starts with a and is not "a\n".
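A small sketch to check the dot rule against a few strings. The helper dot_matches and the sample strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if /a./ matches $s, 0 otherwise.
sub dot_matches {
    my ($s) = @_;
    return $s =~ /a./ ? 1 : 0;
}

for my $s ("ab", "a.", "xa z", "a\n", "a") {
    (my $shown = $s) =~ s/\n/\\n/;    # show the newline as \n when printing
    printf "%-7s %s\n", "\"$shown\"", dot_matches($s) ? "match" : "no match";
}
```

The last two fail for different reasons: in "a\n" the only character after the a is a newline, and in "a" nothing follows the a at all. With the /s modifier (/a./s), the dot matches a newline as well.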
3. Character classes
     A character class is a pair of open and close square brackets with a list of characters between them. Exactly one of these characters
     must be present at the corresponding position in the string for the pattern to match. For example,
     /[abcde]/ # match any one of a, b, c, d, or e
    [0123456789] # match any single digit
    [0-9] # same thing
    [0-9\-] # match 0-9, or minus
    [a-z0-9] # match any single lowercase letter or digit
    [a-zA-Z0-9_] # match any single letter, digit, or underscore
    There's also a negated character class, which is the same as a character class but has a leading up-arrow
    (or caret: ^) immediately after the left bracket. (A caret anywhere else inside the brackets is an ordinary character;
    outside a character class, ^ instead anchors the match at the start of the string.) This character class matches any
    single character that is not in the list. For example:
    [^0-9] # match any single non-digit
    [^aeiouAEIOU] # match any single non-vowel
    [^\^] # match any single character except an up-arrow
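To see these classes side by side, the sketch below counts how many characters of one sample string each class matches. The helper count_class and the sample string are assumptions for illustration; the last line checks that a caret placed after another character inside the brackets is just an ordinary character.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many characters of $str match $class.
# Assigning the list-context /g match to () and then to a scalar
# yields the number of matches.
sub count_class {
    my ($str, $class) = @_;
    my $n = () = $str =~ /$class/g;
    return $n;
}

my $s = "Perl 5.8-x_^";
for my $class ('[0-9]', '[0-9\-]', '[a-zA-Z0-9_]', '[^aeiouAEIOU]', '[^\^]') {
    printf "%-15s %2d\n", $class, count_class($s, $class);
}

# A caret that is not first inside the brackets is a literal ^.
printf "%-15s %2d\n", '[x^]', count_class($s, '[x^]');
```

On this string the counts come out as 2, 3, 8, 11, 11, and 2.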

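    Predefined Character Class Abbreviations
    Construct           Equivalent Class    Negated Construct     Equivalent Negated Class
    \d (a digit)        [0-9]               \D (digits, not!)     [^0-9]
    \w (word char)      [a-zA-Z0-9_]        \W (words, not!)      [^a-zA-Z0-9_]
    \s (space char)     [ \r\t\n\f]         \S (space, not!)      [^ \r\t\n\f]
    These equivalences hold for ASCII text. On Unicode strings, \d, \w, and \s also match non-ASCII digits,
    word characters, and whitespace unless the /a modifier is used, and since Perl 5.18 \s also matches the vertical tab (\x0B).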
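A quick way to check the table is to count matches for each abbreviation and for its bracketed equivalent on an ASCII-only string; on such input the two counts should agree. The helper count_matches and the sample string are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: number of times $pattern matches in $str.
sub count_matches {
    my ($str, $pattern) = @_;
    my $n = () = $str =~ /$pattern/g;
    return $n;
}

# An ASCII-only sample string (there is a tab between 7 and X).
my $s = "id_7\tX-9";
my @rows = (
    [ '\d', '[0-9]' ],        [ '\D', '[^0-9]' ],
    [ '\w', '[a-zA-Z0-9_]' ], [ '\W', '[^a-zA-Z0-9_]' ],
    [ '\s', '[ \r\t\n\f]' ],  [ '\S', '[^ \r\t\n\f]' ],
);
for my $row (@rows) {
    my ($abbr, $class) = @$row;
    printf "%s %d   %-15s %d\n",
        $abbr, count_matches($s, $abbr), $class, count_matches($s, $class);
}
```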
4. Multipliers
    The asterisk (*) is a multiplier meaning zero or more of the
        immediately previous character (or character class).
    Two other multipliers that work like this are the plus sign (+), meaning one or more of the
    immediately previous character, and the question mark (?), meaning zero or one of the immediately
    previous character. For example, the regular expression /fo+ba?r/ matches an f followed by one or
    more o's, followed by a b, followed by an optional a, followed by an r.
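The sketch below tries /ab*c/ and the /fo+ba?r/ example on a few strings. The helper matches and the test strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the pattern string $pat matches $s, else 0.
sub matches {
    my ($pat, $s) = @_;
    return $s =~ /$pat/ ? 1 : 0;
}

my @cases = (
    [ 'ab*c',    'ac' ],          # * allows zero b's
    [ 'ab*c',    'abbbc' ],
    [ 'fo+ba?r', 'fobr' ],        # one o, no a
    [ 'fo+ba?r', 'foooobar' ],    # several o's, one a
    [ 'fo+ba?r', 'fbar' ],        # + needs at least one o
    [ 'fo+ba?r', 'foobaar' ],     # ? allows at most one a
);
for my $case (@cases) {
    my ($pat, $s) = @$case;
    printf "/%s/ %-9s %s\n", $pat, $s, matches($pat, $s) ? "match" : "no match";
}
```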

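    The general multiplier consists of a pair of matching curly braces with one or two numbers
        inside, as in /x{5,10}/, which matches five to ten x's in a row. Leaving out the second number, as in /x{5,}/,
        means "five or more", and a single number, as in /x{5}/, means exactly five.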

    We could dispense with *, +, and ? entirely, since they are completely equivalent to {0,}, {1,}, and
    {0,1}. But it's easier to type the equivalent single punctuation character, and more familiar as well.
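To see how much each multiplier takes, this sketch captures the matched run of x's and prints its length. The helper match_len and the sample strings are assumptions; x{5,10} takes ten x's from a run of twelve because a multiplier grabs as many characters as it is allowed.

```perl
use strict;
use warnings;

# Hypothetical helper: length of the text captured by ($pat) in $s,
# or -1 if the pattern does not match at all.
sub match_len {
    my ($s, $pat) = @_;
    return $s =~ /($pat)/ ? length($1) : -1;
}

my $twelve = "x" x 12;
for my $pat ('x{5,10}', 'x{5,}', 'x{5}', 'x{0,}', 'x*', 'x{1,}', 'x+', 'x{0,1}', 'x?') {
    printf "%-8s %3d\n", $pat, match_len($twelve, $pat);
}
printf "%-8s %3d   (only three x's available)\n", 'x{5,10}', match_len("xxx", 'x{5,10}');
```

The pairs x{0,}/x*, x{1,}/x+, and x{0,1}/x? print the same lengths, which is the equivalence described above.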

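    Multipliers are greedy: each one matches as many characters as it can while still letting the rest of the pattern match.
    If two multipliers occur in a single expression, this greedy rule is augmented with "leftmost is greediest."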
    For example:
    $_ = "a xxx c xxxxxxxx c xxx d";
    /a.*c.*d/;
    In this case, the first ".*" in the regular expression matches all characters up to the second c, even
    though matching only the characters up to the first c would still allow the entire regular expression to
    match. Right now, this doesn't make any difference (the pattern would match either way), but later when
    we can look at parts of the regular expression that matched, it'll matter quite a bit.
    We can force any multiplier to be nongreedy (or lazy) by following it with a question mark:
    $_ = "a xxx c xxxxxxxx c xxx d";
    /a.*?c.*d/;
    Here, the a.*?c now matches the fewest characters between the a and c, not the most characters. This
    means the leftmost c is matched, not the rightmost. You can put such a question-mark modifier after any
    of the multipliers (?, +, *, and {m,n}).
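Capturing the part matched by the first multiplier makes the greedy/lazy difference visible (parentheses as memory are covered in section 5). A minimal sketch; the helpers greedy_part and lazy_part are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helpers: return what the first multiplier consumed
# between the a and the c where it stopped.
sub greedy_part { my ($s) = @_; return $s =~ /a(.*)c.*d/  ? $1 : undef }
sub lazy_part   { my ($s) = @_; return $s =~ /a(.*?)c.*d/ ? $1 : undef }

my $str = "a xxx c xxxxxxxx c xxx d";
print "greedy: [", greedy_part($str), "]\n";   # runs up to the second c
print "lazy:   [", lazy_part($str),   "]\n";   # stops at the first c
```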
    What if the string and regular expression were slightly altered, say, to:
    $_ = "a xxx ce xxxxxxxx ci xxx d";
    /a.*ce.*d/;
    In this case, if the .* matches the most characters possible before the next c, the next regular expression
    character (e) doesn't match the next character of the string (i). In this case, we get automatic
    backtracking: the multiplier is unwound and retried, stopping at someplace earlier (in this case, at the
    earlier c, next to the e).[2] A complex regular expression may involve many such levels of backtracking,
    leading to long execution times. In this case, making that match lazy (with a trailing "?") will actually
    simplify the work that Perl has to perform, so you may want to consider that.
    [2] Well, technically there was a lot of backtracking of the * operator to find the c's in the
    first place. But that's a little trickier to describe, and it works on the same principle.
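The ce/ci example can be checked the same way. In this sketch the greedy and the lazy version capture the same text; only the amount of backtracking needed to get there differs. The array names are arbitrary.

```perl
use strict;
use warnings;

my $str = "a xxx ce xxxxxxxx ci xxx d";

# Greedy: .* first runs to the end of the string, then backtracks until
# the next two characters are "ce"; only the first c is followed by e.
my @greedy = $str =~ /a(.*)ce(.*)d/;
# Lazy: .*? grows one character at a time and stops at the first "ce".
my @lazy   = $str =~ /a(.*?)ce(.*)d/;

print "greedy: [$greedy[0]] [$greedy[1]]\n";
print "lazy:   [$lazy[0]] [$lazy[1]]\n";
```

To watch the backtracking step by step, adding use re 'debug'; at the top makes Perl print the regex engine's trace to standard error.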
5. Parentheses as memory

    A pair of parentheses in a pattern groups the part inside it and also memorizes the part of the string that part matched.
    To recall a memorized part later in the same pattern, use a backslash followed by an integer. This pattern
    construct represents the same sequence of characters matched earlier in the same-numbered pair of
    parentheses (counting from one). For example,
        /fred(.)barney\1/;
    matches a string consisting of fred, followed by any single non-newline character, followed by
    barney, followed by that same single character. So, it matches fredxbarneyx, but not
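    fredxbarneyy. Compare that with
        /fred.barney./;
    which matches both strings, because the two dots are independent and need not match the same character.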

    /a(.)b(.)c\2d\1/;   # it matches axbycydx, for example
    /a(.*)b\1c/;
    This matches an a, followed by any number of characters (even zero), followed by b, followed by that same
    sequence of characters, followed by c. So it would match aFREDbFREDc, or even abc, but not aXXbXXXc.
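The examples in this section can be checked with a short script; the helper matches and the case list are for illustration only. A backreference such as \1 keeps its meaning when the pattern is written as a single-quoted string and interpolated into /$pat/.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the pattern string $pat matches $s, else 0.
sub matches {
    my ($pat, $s) = @_;
    return $s =~ /$pat/ ? 1 : 0;
}

my @cases = (
    [ 'fred(.)barney\1', 'fredxbarneyx' ],
    [ 'fred(.)barney\1', 'fredxbarneyy' ],   # \1 must repeat the x
    [ 'fred.barney.',    'fredxbarneyy' ],   # independent dots
    [ 'a(.)b(.)c\2d\1',  'axbycydx' ],
    [ 'a(.*)b\1c',       'aFREDbFREDc' ],
    [ 'a(.*)b\1c',       'abc' ],            # (.*) and \1 both empty
    [ 'a(.*)b\1c',       'aXXbXXXc' ],
);
for my $case (@cases) {
    my ($pat, $s) = @$case;
    printf "/%s/ %-13s %s\n", $pat, $s, matches($pat, $s) ? "match" : "no match";
}
```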

Posted on 2009-03-14 23:33 by persister. Category: Perl
