Regular expression syntax
Python uses the re module to work with regular expressions (regex).
To get started with regular expressions, you need to import the re module.
This section uses the search function for all examples. The rest of the functions of the re module are covered in the next chapter.
The syntax of the search function is:
match = re.search(pattern, string, flags=0)
The search function has three parameters:
pattern - regular expression
string - string in which the pattern is searched for
flags - change regex behavior (covered in the next chapter)
If a match is found, the function returns a special Match object. If there is no match, the function returns None.
An important feature of the search function is that it looks only for the first
match. That is, if several substrings in a line match the regular expression,
search returns only the first match found.
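The first-match behavior can be checked with a short sketch. The sample line and the variable names occurrences and first are only illustrative; the start method used here returns the index at which the match begins.

```python
import re

# Sample interface line (two leading spaces, as in device output)
int_line = '  MTU 1500 bytes, BW 10000 Kbit, DLY 1000 usec,'

# '00' occurs several times in the line (inside 1500, 10000 and 1000)
occurrences = int_line.count('00')

# search stops at the first occurrence, the one inside '1500'
first = re.search('00', int_line)

print(occurrences)
print(first.start())
```

Even though the substring occurs more than once, search reports a single match. Functions that return all matches belong to the rest of the re module.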
The simplest example of a regex is a substring:
In : import re
In : int_line = '  MTU 1500 bytes, BW 10000 Kbit, DLY 1000 usec,'
In : match = re.search('MTU', int_line)
In this example:
first the re module is imported
then the string int_line is defined
in line 3 the search pattern and the string int_line, in which the match is searched for, are passed to the search function
In this case we are simply checking whether the string int_line contains the substring 'MTU'.
If it does, the match variable will contain a special Match object:
In : print(match)
<re.Match object; span=(2, 5), match='MTU'>
The Match object has several methods that return different information about
the match. For example, the group method shows which part of the string
matched the regular expression.
In this case, it is just the substring 'MTU':
In : match.group()
Out: 'MTU'
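Besides group, a Match object also provides the span, start and end methods. The sketch below is only an illustration of how they relate to each other, using the same sample line with two leading spaces.

```python
import re

int_line = '  MTU 1500 bytes, BW 10000 Kbit, DLY 1000 usec,'
match = re.search('MTU', int_line)

# Text that matched the pattern
print(match.group())

# Position of the match as a (start, end) tuple; end is not included
print(match.span())

# start() and end() return the same two indexes separately,
# so they can be used to slice the original string
matched_slice = int_line[match.start():match.end()]
print(matched_slice)
```

The slice built from start and end is the same text that group returns.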
If there is no match,
the match variable will be None:
In : int_line = '  MTU 1500 bytes, BW 10000 Kbit, DLY 1000 usec,'
In : match = re.search('MU', int_line)
In : print(match)
None
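Because search may return None, calling group on the result without a check raises AttributeError when nothing matches. A minimal sketch of the usual check follows; find_substring is a hypothetical helper written for this illustration, not part of the re module.

```python
import re

int_line = '  MTU 1500 bytes, BW 10000 Kbit, DLY 1000 usec,'


def find_substring(pattern, line):
    """Hypothetical helper: return the matched text or None."""
    match = re.search(pattern, line)
    # A Match object is always truthy, None is falsy
    if match:
        return match.group()
    return None


found = find_substring('MTU', int_line)
missing = find_substring('MU', int_line)
print(found)
print(missing)
```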
The full potential of regular expressions is revealed when special characters are used.
For example, \d means a digit, and + means that the previous
character is repeated one or more times. Combined as \d+, they form an expression
that means one or more digits.
Patterns that contain a backslash are best written as raw strings, for example r'\d+', so that Python passes the backslash to the regex engine unchanged.
Using this expression, you can get the part of the string that describes bandwidth:
In : int_line = '  MTU 1500 bytes, BW 10000 Kbit, DLY 1000 usec,'
In : match = re.search(r'BW \d+', int_line)
In : match.group()
Out: 'BW 10000'
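The same idea works for other fields of the line. The sketch below is an illustration only: the names bw, dly and no_digits and the string 'BW unknown' are made up for the example. It also shows that + requires at least one digit.

```python
import re

int_line = '  MTU 1500 bytes, BW 10000 Kbit, DLY 1000 usec,'

# Raw strings (r'...') keep the backslash in \d intact for the regex engine
bw = re.search(r'BW \d+', int_line)
dly = re.search(r'DLY \d+', int_line)
print(bw.group())
print(dly.group())

# + means "one or more", so text with no digit after 'BW ' does not match
no_digits = re.search(r'BW \d+', 'BW unknown')
print(no_digits)
```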
Regular expressions are particularly useful for extracting specific substrings from a string. For example, suppose you need to get the VLAN, MAC address and ports from the following log message:
In : log2 = 'Oct 3 12:49:15.941: %SW_MATM-4-MACFLAP_NOTIF: Host f04d.a206.7fd6 in vlan 1 is flapping between port Gi0/5 and port Gi0/16'
This can be done with regex:
In : re.search(r'Host (\S+) in vlan (\d+) is flapping between port (\S+) and port (\S+)', log2).groups()
Out: ('f04d.a206.7fd6', '1', 'Gi0/5', 'Gi0/16')
The groups method returns only those parts of the original string that are enclosed in
parentheses. Thus, by placing part of an expression in parentheses, you can specify
which parts of the line you want to capture.
\d+ has been used before: it describes one or more digits.
\S+ describes one or more characters other than whitespace (space, tab, etc.).
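The captured parts can also be unpacked into variables or requested one at a time with group and a group number. The following is a sketch under the assumption that the log line has exactly the format shown above; the variable names are illustrative.

```python
import re

log2 = ('Oct 3 12:49:15.941: %SW_MATM-4-MACFLAP_NOTIF: Host f04d.a206.7fd6 '
        'in vlan 1 is flapping between port Gi0/5 and port Gi0/16')
regex = r'Host (\S+) in vlan (\d+) is flapping between port (\S+) and port (\S+)'

match = re.search(regex, log2)
if match:
    # groups() returns a tuple with the text of every parenthesized group
    mac, vlan, port1, port2 = match.groups()
    print(mac, vlan, port1, port2)

    # group(N) returns a single group; numbering starts at 1
    print(match.group(2))

    # group() without arguments (same as group(0)) returns the whole match
    print(match.group())
```

Note that captured values are always strings, so the VLAN number needs int(vlan) if it is to be used as a number.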
The following subsections deal with special characters that are used in regular expressions.
If you know what special characters mean in regular expressions, you can
skip the following subsection and immediately switch to subsection about