Regular expressions define text patterns for pattern matching and pattern generation.
grep: global regular expression print
grep [options] pattern [files]
- -v: output the lines that do not match the pattern
- -c: output the number of matching lines (not the number of individual matches)
default: output every line that contains a match for the pattern
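The selection rules above can be modelled in a few lines of Python. This is only a sketch: `grep` here is a hypothetical helper, not the real program, and it uses Python's `re` syntax rather than grep's.

```python
import re

def grep(pattern, lines, invert=False, count=False):
    """Rough model of grep's default, -v, and -c behaviour (hypothetical helper)."""
    regex = re.compile(pattern)
    # A line is selected when it contains a match; invert (-v) flips the selection.
    selected = [line for line in lines if bool(regex.search(line)) != invert]
    # count (-c) reports the number of selected lines, not the number of matches.
    return len(selected) if count else selected

lines = ["apple pie", "banana", "apple apple", "cherry"]
print(grep("apple", lines))               # ['apple pie', 'apple apple']
print(grep("apple", lines, invert=True))  # ['banana', 'cherry']
print(grep("apple", lines, count=True))   # 2
```

The line `apple apple` contains two matches but is counted once, which is the line-based counting that `-c` performs.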
metacharacters:
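- `.` any single character
- `*` zero or more occurrences of the preceding item
  - placed after an item, it means that item occurs 0 to n times; on its own it does not stand for "any characters" (that is `.*`)
- `^` beginning of line
- `$` end of line
- `[…]` character class (matches any one of the characters in the class)
- `\` escapes a metacharacter so that it is matched literally
- `\<`, `\>` match at the beginning or end of a word
- `[^…]`: matches any single character outside the given set
  - e.g. find the lines that contain at least one character other than a lowercase letter (lines with no lowercase letters at all would be `grep -v '[a-z]'`)
  - grep '[^a-z]' /etc/inittab
- `[:space:]`: whitespace characters
- `[:punct:]`: all punctuation characters
- `[:lower:]`: all lowercase letters
- `[:upper:]`: all uppercase letters
- `[:alpha:]`: uppercase and lowercase letters
- `[:digit:]`: digits
- `[:alnum:]`: digits plus uppercase and lowercase letters; these classes are written inside a bracket expression, e.g. `[[:alnum:]]`
- `?` 0 or 1 times (egrep)
- `+` 1 to n times (egrep)
- `a|b` a or b (egrep)
- `{m,n}` the preceding item occurs m to n times (egrep)
- `()` groups a sub-expression that can be referred to later (egrep)
- `\n` back-reference to the text matched by the nth sub-expression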
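A quick way to check what each metacharacter does is to try it on small strings. The sketch below uses Python's `re` module as a stand-in for grep, which is an assumption with two known differences: Python writes the word boundary as `\b` instead of `\<` and `\>`, and it has no `[:name:]` classes. The test strings are made up for illustration.

```python
import re

# (pattern, text, whether a match is expected)
checks = [
    (r"h.t",      "hot",     True),   # . matches any single character
    (r"ab*c",     "ac",      True),   # * allows zero repetitions of b
    (r"ab*c",     "abbbc",   True),   # ... or many
    (r"^start",   "restart", False),  # ^ anchors at the beginning of the line
    (r"end$",     "the end", True),   # $ anchors at the end of the line
    (r"[^a-z]",   "abc",     False),  # negated class needs a non-lowercase character
    (r"[^a-z]",   "abC",     True),
    (r"colou?r",  "color",   True),   # ? means zero or one u
    (r"go+gle",   "ggle",    False),  # + needs at least one o
    (r"cat|dog",  "hotdog",  True),   # alternation
    (r"^a{2,3}$", "aaaa",    False),  # {m,n} bounds the repeat count
    (r"(ab)\1",   "abab",    True),   # \1 repeats what group 1 matched
    (r"(a|b)\1",  "ab",      False),  # ... the same text, not the same pattern
    (r"\bcat\b",  "concat",  False),  # \b plays the role of \< and \>
]

for pattern, text, expected in checks:
    got = bool(re.search(pattern, text))
    print(f"{pattern:10} {text:8} -> {got} (expected {expected})")
```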
grep "expression" | head -n
- on macOS there is no error message: once head exits, grep receives a signal (SIGPIPE) on its next write and stops without going further
- on Linux, grep goes on first and only then receives the signal, so it prints an error message
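The underlying mechanism is a write to a pipe whose reader has gone away. A process that leaves SIGPIPE at its default is terminated quietly, while one that ignores SIGPIPE sees the write fail and can report it. The sketch below reproduces the second case inside one Python process on a POSIX system; `write_after_reader_exit` is a hypothetical helper, and it does not show what any particular grep build does.

```python
import os

def write_after_reader_exit():
    """Write to a pipe whose read end is already closed (hypothetical helper)."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the reader is gone, like head after printing its lines
    try:
        os.write(write_fd, b"one more matching line\n")
        return "write succeeded"
    except BrokenPipeError:
        # Python ignores SIGPIPE at startup, so the failed write surfaces as
        # an EPIPE error instead of terminating the process.
        return "broken pipe"
    finally:
        os.close(write_fd)

print(write_after_reader_exit())
```

A writer that takes the error path is in a position to print a "broken pipe" message, while a writer killed by the signal prints nothing.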
when searching multiple files, grep by default outputs the file name together with each matching line
- -l: print only the names of the files that contain a match, not the matching lines
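The difference between the default multi-file output and `-l` can be sketched as follows. `grep_files` is a hypothetical helper, the file names and contents are invented, and the sketch prints only the base name, whereas grep prints the path as it was given on the command line.

```python
import re
import tempfile
from pathlib import Path

def grep_files(pattern, paths, names_only=False):
    """Rough model of multi-file grep output and of -l (hypothetical helper)."""
    regex = re.compile(pattern)
    output = []
    for path in paths:
        lines = Path(path).read_text().splitlines()
        matches = [line for line in lines if regex.search(line)]
        if names_only:
            # -l: list each file that has at least one match, once.
            if matches:
                output.append(Path(path).name)
        else:
            # Default with several files: prefix every matching line.
            output.extend(f"{Path(path).name}:{line}" for line in matches)
    return output

with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp, "a.txt")
    b = Path(tmp, "b.txt")
    a.write_text("error: disk full\nok\nerror: timeout\n")
    b.write_text("ok\nok\n")
    default_output = grep_files("error", [a, b])
    names_output = grep_files("error", [a, b], names_only=True)

print(default_output)  # ['a.txt:error: disk full', 'a.txt:error: timeout']
print(names_output)    # ['a.txt']
```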
| POSIX | Non-standard | Perl/Tcl | Vim | ASCII | Description |
|---|---|---|---|---|---|
| `[:alnum:]` | | | | `[A-Za-z0-9]` | Alphanumeric characters |
| | `[:word:]` | `\w` | `\w` | `[A-Za-z0-9_]` | Alphanumeric characters plus "_" |
| | | `\W` | `\W` | `[^A-Za-z0-9_]` | Non-word characters |
| `[:alpha:]` | | | `\a` | `[A-Za-z]` | Alphabetic characters |
| `[:blank:]` | | | `\s` | `[ \t]` | Space and tab |
| | | `\b` | `\< \>` | `(?<=\W)(?=\w)\|(?<=\w)(?=\W)` | Word boundaries |
| `[:cntrl:]` | | | | `[\x00-\x1F\x7F]` | Control characters |
| `[:digit:]` | | `\d` | `\d` | `[0-9]` | Digits |
| | | `\D` | `\D` | `[^0-9]` | Non-digits |
| `[:graph:]` | | | | `[\x21-\x7E]` | Visible characters |
| `[:lower:]` | | | `\l` | `[a-z]` | Lowercase letters |
| `[:print:]` | | | `\p` | `[\x20-\x7E]` | Visible characters and the space character |
| `[:punct:]` | | | | `` [][!"#$%&'()*+,./:;<=>?@\^_`{\|}~-] `` | Punctuation characters |
| `[:space:]` | | `\s` | `\_s` | `[ \t\r\n\v\f]` | Whitespace characters |
| | | `\S` | `\S` | `[^ \t\r\n\v\f]` | Non-whitespace characters |
| `[:upper:]` | | | `\u` | `[A-Z]` | Uppercase letters |
| `[:xdigit:]` | | | `\x` | `[A-Fa-f0-9]` | Hexadecimal digits |
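The ASCII column can be used to translate a POSIX-style pattern for an engine that lacks `[:name:]` classes. The sketch below does this for Python's `re`, which is an assumption about the target engine. `POSIX_ASCII` and `expand_posix_classes` are hypothetical names, only some of the classes are included, and the substitution is purely textual, so it does not check that a class really sits inside a bracket expression.

```python
import re

# ASCII equivalents copied from the table, without the surrounding brackets.
POSIX_ASCII = {
    "alnum": "A-Za-z0-9",
    "alpha": "A-Za-z",
    "blank": r" \t",
    "digit": "0-9",
    "lower": "a-z",
    "space": r" \t\r\n\v\f",
    "upper": "A-Z",
    "xdigit": "A-Fa-f0-9",
}

def expand_posix_classes(pattern):
    """Replace each [:name:] with its ASCII range (hypothetical helper)."""
    return re.sub(r"\[:(\w+):\]", lambda m: POSIX_ASCII[m.group(1)], pattern)

pattern = expand_posix_classes(r"^[[:upper:]][[:lower:][:digit:]]+$")
print(pattern)                              # ^[A-Z][a-z0-9]+$
print(bool(re.search(pattern, "Route66")))  # True
print(bool(re.search(pattern, "route66")))  # False
```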
Metacharacters, with descriptions and Perl examples[37]:
- `.`: Normally matches any character except a newline. Within square brackets the dot is literal.

      $string1 = "Hello World\n";
      if ($string1 =~ m/...../) {
          print "$string1 has length >= 5.\n";
      }

  Output:

      Hello World
       has length >= 5.

- `( )`: Groups a series of pattern elements to a single element. When you match a pattern within parentheses, you can use any of `$1`, `$2`, … later to refer to the previously matched pattern.

      $string1 = "Hello World\n";
      if ($string1 =~ m/(H..).(o..)/) {
          print "We matched '$1' and '$2'.\n";
      }

  Output:

      We matched 'Hel' and 'o W'.

- `+`: Matches the preceding pattern element one or more times.

      $string1 = "Hello World\n";
      if ($string1 =~ m/l+/) {
          print "There are one or more consecutive letter \"l\"'s in $string1.\n";
      }

  Output:

      There are one or more consecutive letter "l"'s in Hello World
      .

- `?`: Matches the preceding pattern element zero or one time.

      $string1 = "Hello World\n";
      if ($string1 =~ m/H.?e/) {
          print "There is an 'H' and a 'e' separated by ";
          print "0-1 characters (e.g., He Hue Hee).\n";
      }

  Output:

      There is an 'H' and a 'e' separated by 0-1 characters (e.g., He Hue Hee).

- `?`: Modifies the `*`, `+`, `?` or `{M,N}`'d regex that comes before to match as few times as possible.

      $string1 = "Hello World\n";
      if ($string1 =~ m/(l.+?o)/) {
          print "The non-greedy match with 'l' followed by one or\n";
          print "more characters is 'llo' rather than 'llo Wo'.\n";
      }

  Output:

      The non-greedy match with 'l' followed by one or
      more characters is 'llo' rather than 'llo Wo'.

- `*`: Matches the preceding pattern element zero or more times.

      $string1 = "Hello World\n";
      if ($string1 =~ m/el*o/) {
          print "There is an 'e' followed by zero to many ";
          print "'l' followed by 'o' (e.g., eo, elo, ello, elllo).\n";
      }

  Output:

      There is an 'e' followed by zero to many 'l' followed by 'o' (e.g., eo, elo, ello, elllo).

- `{M,N}`: Denotes the minimum M and the maximum N match count. N can be omitted and M can be 0: `{M}` matches "exactly" M times; `{M,}` matches "at least" M times; `{0,N}` matches "at most" N times. `x*` `y+` `z?` is thus equivalent to `x{0,}` `y{1,}` `z{0,1}`.

      $string1 = "Hello World\n";
      if ($string1 =~ m/l{1,2}/) {
          print "There exists a substring with at least 1 ";
          print "and at most 2 l's in $string1\n";
      }

  Output:

      There exists a substring with at least 1 and at most 2 l's in Hello World

- `[…]`: Denotes a set of possible character matches.

      $string1 = "Hello World\n";
      if ($string1 =~ m/[aeiou]+/) {
          print "$string1 contains one or more vowels.\n";
      }

  Output:

      Hello World
       contains one or more vowels.

- `|`: Separates alternate possibilities.

      $string1 = "Hello World\n";
      if ($string1 =~ m/(Hello|Hi|Pogo)/) {
          print "$string1 contains at least one of Hello, Hi, or Pogo.";
      }

  Output:

      Hello World
       contains at least one of Hello, Hi, or Pogo.

- `\b`: Matches a zero-width boundary between a word-class character (see next) and either a non-word class character or an edge; same as `(^\w|\w$|\W\w|\w\W)`.

      $string1 = "Hello World\n";
      if ($string1 =~ m/llo\b/) {
          print "There is a word that ends with 'llo'.\n";
      }

  Output:

      There is a word that ends with 'llo'.

- `\w`: Matches an alphanumeric character, including "_"; same as `[A-Za-z0-9_]` in ASCII. In Unicode[33] it also matches letters and decimal digits from other scripts.

      $string1 = "Hello World\n";
      if ($string1 =~ m/\w/) {
          print "There is at least one alphanumeric ";
          print "character in $string1 (A-Z, a-z, 0-9, _).\n";
      }

  Output:

      There is at least one alphanumeric character in Hello World
       (A-Z, a-z, 0-9, _).

- `\W`: Matches a non-alphanumeric character, excluding "_"; same as `[^A-Za-z0-9_]` in ASCII, and the complement of the `\w` set in Unicode.

      $string1 = "Hello World\n";
      if ($string1 =~ m/\W/) {
          print "The space between Hello and ";
          print "World is not alphanumeric.\n";
      }

  Output:

      The space between Hello and World is not alphanumeric.

- `\s`: Matches a whitespace character, which in ASCII are tab, line feed, form feed, carriage return, and space; in Unicode, also matches no-break spaces, next line, and the variable-width spaces (amongst others).

      $string1 = "Hello World\n";
      if ($string1 =~ m/\s.*\s/) {
          print "In $string1 there are TWO whitespace characters, which may";
          print " be separated by other characters.\n";
      }

  Output:

      In Hello World
       there are TWO whitespace characters, which may be separated by other characters.

- `\S`: Matches anything BUT a whitespace.

      $string1 = "Hello World\n";
      if ($string1 =~ m/\S.*\S/) {
          print "In $string1 there are TWO non-whitespace characters, which";
          print " may be separated by other characters.\n";
      }

  Output:

      In Hello World
       there are TWO non-whitespace characters, which may be separated by other characters.

- `\d`: Matches a digit; same as `[0-9]` in ASCII; in Unicode, same as the `\p{Digit}` or `\p{GC=Decimal_Number}` property, which is itself the same as the `\p{Numeric_Type=Decimal}` property.

      $string1 = "99 bottles of beer on the wall.";
      if ($string1 =~ m/(\d+)/) {
          print "$1 is the first number in '$string1'\n";
      }

  Output:

      99 is the first number in '99 bottles of beer on the wall.'

- `\D`: Matches a non-digit; same as `[^0-9]` in ASCII or `\P{Digit}` in Unicode.

      $string1 = "Hello World\n";
      if ($string1 =~ m/\D/) {
          print "There is at least one character in $string1";
          print " that is not a digit.\n";
      }

  Output:

      There is at least one character in Hello World
       that is not a digit.

- `^`: Matches the beginning of a line or string.

      $string1 = "Hello World\n";
      if ($string1 =~ m/^He/) {
          print "$string1 starts with the characters 'He'.\n";
      }

  Output:

      Hello World
       starts with the characters 'He'.

- `$`: Matches the end of a line or string.

      $string1 = "Hello World\n";
      if ($string1 =~ m/rld$/) {
          print "$string1 is a line or string ";
          print "that ends with 'rld'.\n";
      }

  Output:

      Hello World
       is a line or string that ends with 'rld'.

- `\A`: Matches the beginning of a string (but not an internal line).

      $string1 = "Hello\nWorld\n";
      if ($string1 =~ m/\AH/) {
          print "$string1 is a string ";
          print "that starts with 'H'.\n";
      }

  Output:

      Hello
      World
       is a string that starts with 'H'.

- `\z`: Matches the end of a string (but not an internal line).[38]

      $string1 = "Hello\nWorld\n";
      if ($string1 =~ m/d\n\z/) {
          print "$string1 is a string ";
          print "that ends with 'd\\n'.\n";
      }

  Output:

      Hello
      World
       is a string that ends with 'd\n'.

- `[^…]`: Matches every character except the ones inside brackets.

      $string1 = "Hello World\n";
      if ($string1 =~ m/[^abc]/) {
          print "$string1 contains a character other than ";
          print "a, b, and c.\n";
      }

  Output:

      Hello World
       contains a character other than a, b, and c.
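Three of the claims in the examples above are easy to check mechanically: the `{M,N}` spellings of `*`, `+` and `?`, the greedy versus non-greedy match, and the difference between line anchors and string anchors. The sketch below uses Python's `re` rather than Perl, which is an assumption. The main difference that matters here is that this sketch uses Python's `\Z` in the role of Perl's `\z`. The helper `spans` and the sample strings are made up for illustration.

```python
import re

def spans(pattern, text):
    """Return the (start, end) span of every match of pattern in text."""
    return [m.span() for m in re.finditer(pattern, text)]

# 1. Shorthand quantifiers versus their {M,N} spellings.
pairs = [(r"x*", r"x{0,}"), (r"y+", r"y{1,}"), (r"z?", r"z{0,1}")]
samples = ["", "x", "xxx", "yy", "xyz", "zz", "yxxz"]
equivalent = all(spans(a, s) == spans(b, s) for a, b in pairs for s in samples)

# 2. Greedy versus non-greedy on the string used in the examples.
greedy = re.search(r"l.+o", "Hello World").group()
lazy = re.search(r"l.+?o", "Hello World").group()

# 3. Line anchors versus string anchors on a two-line string.
text = "Hello\nWorld\n"
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
string_start = re.findall(r"\A\w+", text, re.MULTILINE)
# \Z only matches at the very end, so the final newline must be spelled out.
strict_end = (re.search(r"d\Z", text) is None
              and re.search(r"d\n\Z", text) is not None)

print(equivalent)    # True
print(greedy, lazy)  # llo Wo llo
print(line_starts)   # ['Hello', 'World']
print(string_start)  # ['Hello']
print(strict_end)    # True
```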
Source: Wikipedia (https://en.wikipedia.org/wiki/Regular_expression)