Command: Regular Expression

Regular expressions define text patterns used for pattern matching and pattern generation.

global regular expression print (grep)

grep [options] pattern [files]

  • -v  print the lines that do not match the pattern
  • -c  print the number of matching lines (a line with several matches counts once)

default: print every line that contains a match for the pattern
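
A quick way to see the default behaviour, -v and -c side by side. This is a minimal sketch assuming a POSIX shell and a standard grep; the scratch directory and the file sample.txt are made up for illustration.

```shell
#!/bin/sh
# Work in a scratch directory; sample.txt is a made-up example file.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'apple pie\nbanana split\napple tart\ncherry\n' > sample.txt

# Default: print every line that contains a match.
grep 'apple' sample.txt       # apple pie, apple tart

# -v: print the lines that do NOT match.
grep -v 'apple' sample.txt    # banana split, cherry

# -c: count matching LINES; "banana split" has several a's but counts once.
grep -c 'a' sample.txt        # 3
```

Note that -c reports 3, not the total number of a's, because it counts lines.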

metacharacters:

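  • .                  any single character
  • *                  zero or more occurrences of the preceding item
    • placed after a character (or a group), it repeats that item 0 to n times, e.g. ab* matches a, ab, abb, …
  • ^                  beginning of line
  • $                  end of line
  • […]                character class (matches any one of the characters inside the brackets)
  • \                  escape: makes the next metacharacter literal, e.g. \. matches a period
  • \<, \>             match the beginning or end of a word
  • [^…]               matches any single character not in the specified set
    • e.g. grep '[^a-z]' /etc/inittab prints lines that contain at least one character that is not a lowercase letter; to find lines with no lowercase letters at all, use grep -v '[a-z]' /etc/inittab
  • [:space:]          whitespace characters
  • [:punct:]          the set of all punctuation characters
  • [:lower:]          lowercase letters
  • [:upper:]          uppercase letters
  • [:alpha:]          uppercase and lowercase letters
  • [:digit:]          digits
  • [:alnum:]          digits and uppercase and lowercase letters (these classes are used inside a bracket expression, e.g. [[:alnum:]])
  • ?                  0 or 1 times (egrep)
  • +                  1 to n times (egrep)
  • a|b                a or b (egrep)
  • {m,n}              m to n occurrences of the preceding item (egrep)
  • ()                 groups a sub-expression that can be referenced later (egrep)
  • \n                 back-reference: matches the same text as the nth parenthesized sub-expression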
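
Most of the metacharacters above can be tried against one small file. This sketch assumes GNU or BSD grep (both accept \< and \>) and sets LC_ALL=C so that [a-z] means exactly the ASCII lowercase letters; words.txt and its lines are made up.

```shell
#!/bin/sh
# LC_ALL=C makes ranges such as [a-z] byte-based and predictable.
export LC_ALL=C
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'cat\ncart\ncaat\nthe cat sat\nconcatenate\nABC 123\nhello hello\n' > words.txt

grep 'c.t' words.txt              # .    : cat, the cat sat, concatenate
grep '^ca*t$' words.txt           # *    : cat, caat
grep '\<cat\>' words.txt          # \< \>: cat, the cat sat (not concatenate)
grep '^[^a-z]*$' words.txt        # [^]  : ABC 123 (no lowercase letter at all)
grep -E '^car?t$' words.txt       # ?    : cat, cart
grep -E '^ca+t$' words.txt        # +    : cat, caat
grep -E '^ca{2}t$' words.txt      # {m}  : caat
grep -E '^(cat|cart)$' words.txt  # ( | ): cat, cart
grep '^\([a-z]*\) \1$' words.txt  # \1   : hello hello (one word repeated)
```

The last line uses basic-grep syntax, where grouping is written \( \); with grep -E the same pattern would be written with plain ( ).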

grep "expression" | head -n

  • on macOS: no error message. Once head exits, grep receives a signal (SIGPIPE) on its next write, so it stops without going further.
  • on Linux: grep continues first and only then receives the signal, so it prints an error message.
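
The grep | head situation can be reproduced with a pipeline that produces far more matches than head wants. A minimal sketch, assuming seq is installed; whether an error message also appears still depends on the platform, as noted above.

```shell
#!/bin/sh
# seq emits 100000 lines, tens of thousands of which contain a 7, so grep
# is still writing when head exits after its third line.
# The braces report grep's own exit status on stderr.
seq 1 100000 | { grep '7'; echo "grep exit status: $?" >&2; } | head -n 3
# stdout: 7, 17 and 27, one per line
# stderr: the status should be nonzero, since grep was stopped before it finished
```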

When searching multiple files, grep by default prints the file name followed by each matching line.

  • -l  print only the names of files that contain a match (not the matching lines)
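
The difference between the default multi-file output and -l is easiest to see with a few tiny files. A sketch with made-up files a.txt, b.txt and c.txt; it also uses -h (suppress file names), an extra option not listed above that is available in GNU and BSD grep.

```shell
#!/bin/sh
# Scratch directory with three made-up sample files.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'red apple\ngreen pear\n' > a.txt
printf 'yellow banana\n' > b.txt
printf 'apple juice\n' > c.txt

grep 'apple' a.txt b.txt c.txt      # a.txt:red apple, c.txt:apple juice
grep -l 'apple' a.txt b.txt c.txt   # a.txt, c.txt (file names only)
grep -h 'apple' a.txt b.txt c.txt   # red apple, apple juice (no file names)
```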

character classes:

POSIX       Non-standard  Perl/Tcl  Vim    ASCII                                Description
[:alnum:]   -             -         -      [A-Za-z0-9]                          Alphanumeric characters
-           [:word:]      \w        \w     [A-Za-z0-9_]                         Alphanumeric characters plus "_"
-           -             \W        \W     [^A-Za-z0-9_]                        Non-word characters
[:alpha:]   -             -         \a     [A-Za-z]                             Alphabetic characters
[:blank:]   -             -         \s     [ \t]                                Space and tab
-           -             \b        \< \>  (?<=\W)(?=\w)|(?<=\w)(?=\W)          Word boundaries
[:cntrl:]   -             -         -      [\x00-\x1F\x7F]                      Control characters
[:digit:]   -             \d        \d     [0-9]                                Digits
-           -             \D        \D     [^0-9]                               Non-digits
[:graph:]   -             -         -      [\x21-\x7E]                          Visible characters
[:lower:]   -             -         \l     [a-z]                                Lowercase letters
[:print:]   -             -         \p     [\x20-\x7E]                          Visible characters and the space character
[:punct:]   -             -         -      [][!"#$%&'()*+,./:;<=>?@\^_`{|}~-]   Punctuation characters
[:space:]   -             \s        \_s    [ \t\r\n\v\f]                        Whitespace characters
-           -             \S        \S     [^ \t\r\n\v\f]                       Non-whitespace characters
[:upper:]   -             -         \u     [A-Z]                                Uppercase letters
[:xdigit:]  -             -         \x     [A-Fa-f0-9]                          Hexadecimal digits
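
In grep, the POSIX class names from this table only work inside a bracket expression, so [:digit:] is written [[:digit:]]. A sketch with a made-up file classes.txt; LC_ALL=C is set so the classes mean their ASCII versions, and [[:alnum:]_] is used as a stand-in for \w.

```shell
#!/bin/sh
# LC_ALL=C keeps the classes equal to the ASCII column of the table.
export LC_ALL=C
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'abc\nABC\n123\nx_y\nhi there\n0xFF\n' > classes.txt

grep '^[[:digit:]]*$' classes.txt        # 123
grep '^[[:upper:]]*$' classes.txt        # ABC
grep '[[:space:]]' classes.txt           # hi there
grep '^[[:alnum:]_]*$' classes.txt       # abc, ABC, 123, x_y, 0xFF (like \w)
grep -E '^0x[[:xdigit:]]+$' classes.txt  # 0xFF
```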

 

Metacharacter(s) | Description | Example[37]
. Normally matches any character except a newline.
Within square brackets the dot is literal.
$string1 = "Hello World\n";
if ($string1 =~ m/...../) {
  print "$string1 has length >= 5.\n";
}

Output:

Hello World
 has length >= 5.
( ) Groups a series of pattern elements into a single element.
When you match a pattern within parentheses, you can use any of $1, $2, … later to refer to the previously matched pattern.
$string1 = "Hello World\n";
if ($string1 =~ m/(H..).(o..)/) {
  print "We matched '$1' and '$2'.\n";
}

Output:

We matched 'Hel' and 'o W'.
+ Matches the preceding pattern element one or more times.
$string1 = "Hello World\n";
if ($string1 =~ m/l+/) {
  print "There are one or more consecutive letter \"l\"'s in $string1.\n";
}

Output:

There are one or more consecutive letter "l"'s in Hello World.
? Matches the preceding pattern element zero or one time.
$string1 = "Hello World\n";
if ($string1 =~ m/H.?e/) {
  print "There is an 'H' and a 'e' separated by ";
  print "0-1 characters (e.g., He Hue Hee).\n";
}

Output:

There is an 'H' and a 'e' separated by 0-1 characters (e.g., He Hue Hee).
? Modifies the *, +, ? or {M,N}'d regex that comes before to match as few times as possible.
$string1 = "Hello World\n";
if ($string1 =~ m/(l.+?o)/) {
  print "The non-greedy match with 'l' followed by one or\n";
  print "more characters is 'llo' rather than 'llo Wo'.\n";
}

Output:

The non-greedy match with 'l' followed by one or
more characters is 'llo' rather than 'llo Wo'.
* Matches the preceding pattern element zero or more times.
$string1 = "Hello World\n";
if ($string1 =~ m/el*o/) {
  print "There is an 'e' followed by zero to many ";
  print "'l' followed by 'o' (e.g., eo, elo, ello, elllo).\n";
}

Output:

There is an 'e' followed by zero to many 'l' followed by 'o' (e.g., eo, elo, ello, elllo).
{M,N} Denotes the minimum M and the maximum N match count.
N can be omitted and M can be 0: {M} matches “exactly” M times; {M,} matches “at least” M times; {0,N} matches “at most” N times.
x* y+ z? is thus equivalent to x{0,} y{1,} z{0,1}.
$string1 = "Hello World\n";
if ($string1 =~ m/l{1,2}/) {
  print "There exists a substring with at least 1 ";
  print "and at most 2 l's in $string1\n";
}

Output:

There exists a substring with at least 1 and at most 2 l's in Hello World
[…] Denotes a set of possible character matches.
$string1 = "Hello World\n";
if ($string1 =~ m/[aeiou]+/) {
  print "$string1 contains one or more vowels.\n";
}

Output:

Hello World
 contains one or more vowels.
| Separates alternate possibilities.
$string1 = "Hello World\n";
if ($string1 =~ m/(Hello|Hi|Pogo)/) {
  print "$string1 contains at least one of Hello, Hi, or Pogo.";
}

Output:

Hello World
 contains at least one of Hello, Hi, or Pogo.
\b Matches a zero-width boundary between a word-class character (see next) and either a non-word class character or an edge; same as (^\w|\w$|\W\w|\w\W).
$string1 = "Hello World\n";
if ($string1 =~ m/llo\b/) {
  print "There is a word that ends with 'llo'.\n";
}

Output:

There is a word that ends with 'llo'.
\w Matches an alphanumeric character, including “_”;
same as [A-Za-z0-9_] in ASCII, and

[\p{Alphabetic}\p{GC=Mark}\p{GC=Decimal_Number}\p{GC=Connector_Punctuation}]

in Unicode,[33] where the Alphabetic property contains more than Latin letters, and the Decimal_Number property contains more than Arabic digits.

$string1 = "Hello World\n";
if ($string1 =~ m/\w/) {
  print "There is at least one alphanumeric ";
  print "character in $string1 (A-Z, a-z, 0-9, _).\n";
}

Output:

There is at least one alphanumeric character in Hello World
 (A-Z, a-z, 0-9, _).
\W Matches a non-alphanumeric character, excluding “_”;
same as [^A-Za-z0-9_] in ASCII, and

[^\p{Alphabetic}\p{GC=Mark}\p{GC=Decimal_Number}\p{GC=Connector_Punctuation}]

in Unicode.

$string1 = "Hello World\n";
if ($string1 =~ m/\W/) {
  print "The space between Hello and ";
  print "World is not alphanumeric.\n";
}

Output:

The space between Hello and World is not alphanumeric.
\s Matches a whitespace character,
which in ASCII are tab, line feed, form feed, carriage return, and space;
in Unicode, also matches no-break spaces, next line, and the variable-width spaces (amongst others).
$string1 = "Hello World\n";
if ($string1 =~ m/\s.*\s/) {
  print "In $string1 there are TWO whitespace characters, which may";
  print " be separated by other characters.\n";
}

Output:

In Hello World
 there are TWO whitespace characters, which may be separated by other characters.
\S Matches anything BUT a whitespace.
$string1 = "Hello World\n";
if ($string1 =~ m/\S.*\S/) {
  print "In $string1 there are TWO non-whitespace characters, which";
  print " may be separated by other characters.\n";
}

Output:

In Hello World
 there are TWO non-whitespace characters, which may be separated by other characters.
\d Matches a digit;
same as [0-9] in ASCII;
in Unicode, same as the \p{Digit} or \p{GC=Decimal_Number} property, which is itself the same as the \p{Numeric_Type=Decimal} property.
$string1 = "99 bottles of beer on the wall.";
if ($string1 =~ m/(\d+)/) {
  print "$1 is the first number in '$string1'\n";
}

Output:

99 is the first number in '99 bottles of beer on the wall.'
\D Matches a non-digit;
same as [^0-9] in ASCII or \P{Digit} in Unicode.
$string1 = "Hello World\n";
if ($string1 =~ m/\D/) {
  print "There is at least one character in $string1";
  print " that is not a digit.\n";
}

Output:

There is at least one character in Hello World
 that is not a digit.
^ Matches the beginning of a line or string.
$string1 = "Hello World\n";
if ($string1 =~ m/^He/) {
  print "$string1 starts with the characters 'He'.\n";
}

Output:

Hello World
 starts with the characters 'He'.
$ Matches the end of a line or string.
$string1 = "Hello World\n";
if ($string1 =~ m/rld$/) {
  print "$string1 is a line or string ";
  print "that ends with 'rld'.\n";
}

Output:

Hello World
 is a line or string that ends with 'rld'.
\A Matches the beginning of a string (but not an internal line).
$string1 = "Hello\nWorld\n";
if ($string1 =~ m/\AH/) {
  print "$string1 is a string ";
  print "that starts with 'H'.\n";
}

Output:

Hello
World
 is a string that starts with 'H'.
\z Matches the end of a string (but not an internal line).[38]
$string1 = "Hello\nWorld\n";
if ($string1 =~ m/d\n\z/) {
  print "$string1 is a string ";
  print "that ends with 'd\\n'.\n";
}

Output:

Hello
World
 is a string that ends with 'd\n'.
[^…] Matches every character except the ones inside brackets.
$string1 = "Hello World\n";
if ($string1 =~ m/[^abc]/) {
  print "$string1 contains a character other than ";
  print "a, b, and c.\n";
}

Output:

Hello World
 contains a character other than a, b, and c.
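
Two rows of this table can be cross-checked with grep -E instead of Perl. A sketch under the assumption of GNU or BSD grep (both support -o); q.txt, short.txt and long.txt are made-up scratch files. POSIX grep has no non-greedy ? modifier and reports the leftmost-longest match, which is exactly the 'llo Wo' that the Perl example avoids.

```shell
#!/bin/sh
dir=$(mktemp -d) && cd "$dir" || exit 1

# 1) No lazy quantifiers in POSIX grep: -o prints the leftmost-longest match.
echo 'Hello World' | grep -oE 'l.+o'        # llo Wo

# 2) x* y+ z? behaves like x{0,} y{1,} z{0,1}.
printf 'ab\nabbb\nabc\nb\nac\n' > q.txt
grep -E '^a*b+c?$' q.txt > short.txt
grep -E '^a{0,}b{1,}c{0,1}$' q.txt > long.txt
cmp -s short.txt long.txt && echo 'same lines matched'
cat short.txt                               # ab, abbb, abc, b
```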

From Wikipedia (https://en.wikipedia.org/wiki/Regular_expression)
