1. Literal string pattern
To use the string abc as a regular expression, enclose it in slashes:
while (<>) {
    if (/abc/) {
        print $_;
    }
}
2. The dot (.) pattern
The dot matches any single character except newline (\n). For example, the pattern /a./ matches any two-character
sequence that starts with a and is not "a\n".
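As a quick check of the dot, the sketch below tries /a./ against a few sample strings. The strings and variable names are illustrative assumptions, not part of the original notes.

```perl
use strict;
use warnings;

# Hypothetical sample strings; only the pattern /a./ comes from the text.
my @samples = ("ab", "a\n", "xa!", "a");

my @hits;
for my $s (@samples) {
    # Matches when an "a" is followed by any one character other than newline.
    push @hits, $s if $s =~ /a./;
}

# "ab" and "xa!" match; "a\n" fails because the dot does not match newline,
# and "a" fails because nothing follows the a.
print scalar(@hits), " of ", scalar(@samples), " strings matched\n";
```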
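3. Character classes
A pattern-matching character class is written as a pair of open and close square brackets with a list of characters
between them. One and only one of these characters must be present at the corresponding part of the string for the
pattern to match. For example,
/[abcde]/
matches a string containing any one of the first five lowercase letters. A range of characters can be abbreviated by
writing its end points separated by a dash. More examples: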
[0123456789] # match any single digit
[0-9] # same thing
[0-9\-] # match 0-9, or minus
[a-z0-9] # match any single lowercase letter or digit
[a-zA-Z0-9_] # match any single letter, digit, or underscore
There's also a negated character class, which is the same as a character class but has a leading up-arrow
(or caret: ^) immediately after the left bracket. This character class matches any single character that is
not in the list. (A caret anywhere else inside the brackets is just a literal caret; outside a character class, a
caret anchors the pattern to the start of the string.) For example:
[^0-9] # match any single non-digit
[^aeiouAEIOU] # match any single non-vowel
[^\^] # match any single character except an up-arrow
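The following sketch exercises a few of the classes above. The input strings and variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Each result is 1 if the pattern matched and 0 otherwise.
my $has_digit       = "room 42" =~ /[0-9]/         ? 1 : 0;  # contains a digit
my $digits_only     = "12345"   =~ /[^0-9]/        ? 1 : 0;  # no non-digit to find
my $vowels_only     = "aeiou"   =~ /[^aeiouAEIOU]/ ? 1 : 0;  # no non-vowel to find
my $minus_in_class  = "-"       =~ /[0-9\-]/       ? 1 : 0;  # escaped dash is literal
# A caret that is not the first character inside the brackets is a literal caret.
my $literal_caret   = "^"       =~ /[0-9^]/        ? 1 : 0;
# [^\^] needs some character that is not a caret; "^^" has none.
my $carets_only     = "^^"      =~ /[^\^]/         ? 1 : 0;

print "has_digit=$has_digit digits_only=$digits_only vowels_only=$vowels_only\n";
print "minus_in_class=$minus_in_class literal_caret=$literal_caret carets_only=$carets_only\n";
```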
Predefined Character Class Abbreviations
Construct          Equivalent Class   Negated Construct   Equivalent Negated Class
\d (a digit)       [0-9]              \D (digits, not!)   [^0-9]
\w (word char)     [a-zA-Z0-9_]       \W (words, not!)    [^a-zA-Z0-9_]
\s (space char)    [ \r\t\n\f]        \S (space, not!)    [^ \r\t\n\f]
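A small sketch of the abbreviations in use, with made-up ASCII input strings (the strings and variable names are assumptions for illustration):

```perl
use strict;
use warnings;

# "item 7" contains a word character, a space, and a digit in a row ("m 7").
my $word_space_digit = "item 7"   =~ /\w\s\d/ ? 1 : 0;
# The negated forms look for a character outside the class.
my $non_digit        = "2024"     =~ /\D/     ? 1 : 0;  # all digits: no match
my $non_word         = "foo_bar9" =~ /\W/     ? 1 : 0;  # all word chars: no match
my $non_space        = " \t\n"    =~ /\S/     ? 1 : 0;  # all whitespace: no match
my $non_word_found   = "foo bar"  =~ /\W/     ? 1 : 0;  # the space is not a word char

print "word_space_digit=$word_space_digit non_digit=$non_digit non_word=$non_word\n";
print "non_space=$non_space non_word_found=$non_word_found\n";
```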
4. Multipliers
The asterisk (*) is a grouping pattern, or multiplier. It indicates zero or more of the
immediately previous character (or character class).
Two other grouping patterns that work like this are the plus sign (+), meaning one or more of the
immediately previous character, and the question mark (?), meaning zero or one of the immediately
previous character. For example, the regular expression /fo+ba?r/ matches an f followed by one or
more o's followed by a b, followed by an optional a, followed by an r.
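To see the pattern /fo+ba?r/ at work, the sketch below runs it over a few candidate strings. The strings and the %matched hash are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical candidates; the pattern itself is the one from the text.
my @candidates = ("fobar", "foooobr", "fbar", "fobaar");

my %matched;
for my $s (@candidates) {
    $matched{$s} = $s =~ /fo+ba?r/ ? 1 : 0;
}

# "fobar"   : one o, optional a present     -> match
# "foooobr" : several o's, optional a absent -> match
# "fbar"    : no o at all, but + needs one   -> no match
# "fobaar"  : two a's, but ? allows one      -> no match
for my $s (@candidates) {
    print "$s: ", ($matched{$s} ? "match" : "no match"), "\n";
}
```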
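Another grouping pattern is the general multiplier. It consists of a pair of matching curly braces with one or two
numbers inside, as in /x{5,10}/. The immediately preceding character (here, the letter x) must be found within the
indicated number of repetitions (five through ten here).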
We could dispense with *, +, and ? entirely, since they are completely equivalent to {0,}, {1,}, and
{0,1}, respectively. But it's easier to type the equivalent single punctuation character, and more familiar as well.
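The sketch below checks /x{5,10}/ against runs of x of different lengths and compares the short multipliers with their curly-brace equivalents. The test strings and variable names are assumptions for illustration. Note that the pattern is not anchored, so a longer run of x's still matches because it contains a run of five to ten.

```perl
use strict;
use warnings;

my $four_x   = "x" x 4;
my $five_x   = "x" x 5;
my $twelve_x = "x" x 12;

my $four_ok   = $four_x   =~ /x{5,10}/ ? 1 : 0;  # too few x's: no match
my $five_ok   = $five_x   =~ /x{5,10}/ ? 1 : 0;  # exactly the minimum: match
my $twelve_ok = $twelve_x =~ /x{5,10}/ ? 1 : 0;  # unanchored: ten of the twelve suffice

# Compare + and ? with {1,} and {0,1} on a few hypothetical strings.
my $disagreements = 0;
for my $s ("fobar", "foooobr", "fbar", "fobaar") {
    my $short = $s =~ /fo+ba?r/         ? 1 : 0;
    my $long  = $s =~ /fo{1,}ba{0,1}r/  ? 1 : 0;
    $disagreements++ if $short != $long;
}

print "four=$four_ok five=$five_ok twelve=$twelve_ok disagreements=$disagreements\n";
```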
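Multipliers are greedy: each one matches as many characters as it can while still letting the rest of the pattern
match. If two multipliers occur in a single expression, the greedy rule is augmented with "leftmost is greediest."
For example: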
$_ = "a xxx c xxxxxxxx c xxx d";
/a.*c.*d/;
In this case, the first ".*" in the regular expression matches all characters up to the second c, even
though matching only the characters up to the first c would still allow the entire regular expression to
match. Right now, this doesn't make any difference (the pattern would match either way), but later when
we can look at parts of the regular expression that matched, it'll matter quite a bit.
We can force any multiplier to be nongreedy (or lazy) by following it with a question mark:
$_ = "a xxx c xxxxxxxx c xxx d";
/a.*?c.*d/;
Here, the a.*?c now matches the fewest characters between the a and c, not the most characters. This
means the leftmost c is matched, not the rightmost. You can put such a question-mark modifier after any
of the multipliers (?, +, *, and {m,n}).
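The difference between the greedy and lazy forms becomes visible once the first ".*" is wrapped in parentheses and inspected through $1. The added parentheses and the variable names are assumptions of this sketch; the string and the rest of the patterns are the ones used above.

```perl
use strict;
use warnings;

my $text = "a xxx c xxxxxxxx c xxx d";

my ($greedy, $lazy);

# Greedy: the first .* runs up to the second (rightmost usable) c.
if ($text =~ /a(.*)c.*d/) {
    $greedy = $1;
}

# Lazy: the first .*? stops at the first c.
if ($text =~ /a(.*?)c.*d/) {
    $lazy = $1;
}

# Brackets make the leading and trailing spaces visible.
print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```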
What if the string and regular expression were slightly altered, say, to:
$_ = "a xxx ce xxxxxxxx ci xxx d";
/a.*ce.*d/;
Here, if the .* matches the most characters possible before the next c, the next regular expression
character (e) doesn't match the next character of the string (i). So we get automatic
backtracking: the multiplier is unwound and retried, stopping at someplace earlier (here, at the
earlier c, next to the e).[2] A complex regular expression may involve many such levels of backtracking,
leading to long execution times. In this example, making that match lazy (with a trailing "?") will actually
simplify the work that Perl has to perform, so you may want to consider that.
[2] Well, technically there was a lot of backtracking of the * operator to find the c's in the
first place. But that's a little trickier to describe, and it works on the same principle.
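As a rough illustration, the sketch below wraps the first ".*" in parentheses (an addition made here, along with the variable names) to show where the multiplier ends up after backtracking. The greedy and lazy forms settle on the same text for this string; they differ only in how much work it takes to get there.

```perl
use strict;
use warnings;

my $text = "a xxx ce xxxxxxxx ci xxx d";

my ($greedy, $lazy);

# Greedy: .* first overshoots, fails at "ci", and backtracks to the earlier "ce".
if ($text =~ /a(.*)ce.*d/) {
    $greedy = $1;
}

# Lazy: .*? grows one character at a time and stops at the first "ce".
if ($text =~ /a(.*?)ce.*d/) {
    $lazy = $1;
}

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```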
5. Parentheses as memory
A pair of parentheses around any part of a pattern causes the part of the string matched by that part to be
memorized. To recall a memorized part of a string, you must include a backslash followed by an integer. This pattern
construct represents the same sequence of characters matched earlier in the same-numbered pair of
parentheses (counting from one). For example,
/fred(.)barney\1/;
matches a string consisting of fred, followed by any single non-newline character, followed by
barney, followed by that same single character. So, it matches fredxbarneyx, but not
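fredxbarneyy. Compare that with
/fred.barney./;
in which the two unspecified characters can be the same or different. A pattern can have more than one pair of
memory parentheses:
/a(.)b(.)c\2d\1/; # it matches axbycydx, for example
The memorized part can also be longer than a single character. For example,
/a(.*)b\1c/;
matches an a, followed by any number of characters (even zero), followed by b, followed by that same
sequence of characters, followed by c. So, it would match aFREDbFREDc, or even abc, but not aXXbXXXc.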
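The backreference examples above can be checked with a short sketch. The patterns and test strings are the ones from the text; the variable names are assumptions for illustration.

```perl
use strict;
use warnings;

# 1 means the pattern matched, 0 means it did not.
my $same_char   = "fredxbarneyx" =~ /fred(.)barney\1/  ? 1 : 0;  # x ... x
my $diff_char   = "fredxbarneyy" =~ /fred(.)barney\1/  ? 1 : 0;  # x ... y
my $two_dots    = "fredxbarneyy" =~ /fred.barney./     ? 1 : 0;  # dots are independent
my $two_groups  = "axbycydx"     =~ /a(.)b(.)c\2d\1/   ? 1 : 0;  # \2 is y, \1 is x
my $long_group  = "aFREDbFREDc"  =~ /a(.*)b\1c/        ? 1 : 0;  # \1 is FRED
my $empty_group = "abc"          =~ /a(.*)b\1c/        ? 1 : 0;  # \1 is empty
my $mismatch    = "aXXbXXXc"     =~ /a(.*)b\1c/        ? 1 : 0;  # XX vs XXX

print "same_char=$same_char diff_char=$diff_char two_dots=$two_dots\n";
print "two_groups=$two_groups long_group=$long_group ",
      "empty_group=$empty_group mismatch=$mismatch\n";
```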