Special symbols#
.- any character except new line character^- beginning of line$- end of line[abc]- any symbol in square brackets[^abc]- any symbol except those in square bracketsa|b- element a or b(regex)- expression is treated as one element. In addition, substring that matches an expression is memorized
.#
Dot represents any symbol.
Most often, a dot is used with repetition symbols + and * to indicate
that any character can be found between certain expressions.
For example, using expression Interface.+Port ID.+ you can describe a line
with interfaces in the output “sh cdp neighbors detail”:
In [1]: cdp = '''
...: SW1#show cdp neighbors detail
...: -------------------------
...: Device ID: SW2
...: Entry address(es):
...: IP address: 10.1.1.2
...: Platform: cisco WS-C2960-8TC-L, Capabilities: Switch IGMP
...: Interface: GigabitEthernet1/0/16, Port ID (outgoing port): GigabitEthernet0/1
...: Holdtime : 164 sec
...: '''
In [2]: re.search('Interface.+Port ID.+', cdp).group()
Out[2]: 'Interface: GigabitEthernet1/0/16, Port ID (outgoing port): GigabitEthernet0/1'
The result was only one string as the dot represents any character except line
feed character. In addition, repetition characters + and * by default
capture the longest string possible. This aspect is addressed in subsection “Greedy qualifiers”.
^#
Character ^ means the beginning of line. Expression ^\d+ corresponds to substring:
In [3]: line = "100 aa12.35fe.a5d3 FastEthernet0/1"
In [4]: re.search('^\d+', line).group()
Out[4]: '100'
Characters from beginning of line to pound sign (including pound):
In [5]: prompt = 'SW1#show cdp neighbors detail'
In [6]: re.search('^.+#', prompt).group()
Out[6]: 'SW1#'
$#
Symbol $ represents the end of a line.
Expression \S+$ describes any characters except whitespace at the end of line:
In [7]: line = "100 aa12.35fe.a5d3 FastEthernet0/1"
In [8]: re.search('\S+$', line).group()
Out[8]: 'FastEthernet0/1'
[]#
Symbols that are listed in square brackets mean that any of these symbols will be a match. Thus, different registers can be described:
In [9]: line = "100 aa12.35fe.a5d3 FastEthernet0/1"
In [10]: re.search('[Ff]ast', line).group()
Out[10]: 'Fast'
In [11]: re.search('[Ff]ast[Ee]thernet', line).group()
Out[11]: 'FastEthernet'
Using square brackets, you can specify which characters may meet at a
specific position. For example, expression ^.+[>#] describes characters
from the beginning of a line to # or > sign (including them).
This expression can be used to get the name of device:
In [12]: commands = ['SW1#show cdp neighbors detail',
...: 'SW1>sh ip int br',
...: 'r1-london-core# sh ip route']
...:
In [13]: for line in commands:
...: match = re.search('^.+[>#]', line)
...: if match:
...: print(match.group())
...:
SW1#
SW1>
r1-london-core#
You can specify character ranges in square brackets. For example, any number from 0 to 9:
In [14]: line = "100 aa12.35fe.a5d3 FastEthernet0/1"
In [15]: re.search('[0-9]+', line).group()
Out[15]: '100'
Letters:
In [16]: line = "100 aa12.35fe.a5d3 FastEthernet0/1"
In [17]: re.search('[a-z]+', line).group()
Out[17]: 'aa'
In [18]: re.search('[A-Z]+', line).group()
Out[18]: 'F'
Several ranges may be indicated in square brackets:
In [19]: line = "100 aa12.35fe.a5d3 FastEthernet0/1"
In [20]: re.search('[a-f0-9]+\.[a-f0-9]+\.[a-f0-9]+', line).group()
Out[20]: 'aa12.35fe.a5d3'
Expression [a-f0-9]+\.[a-f0-9]+\.[a-f0-9]+ describes three groups of
symbols separated by a dot. Characters in each group can be letters a-f
or digits 0-9. This expression describes MAC address.
Another feature of square brackets is that the special symbols within square brackets lose their special meaning and are simply a symbol. For example, a dot inside square brackets will denote a dot, not any symbol.
Expression [a-f0-9]+[./][a-f0-9]+ describes three groups of symbols:
letters a-f or digits 0-9
dot or slash
letters a-f or digits 0-9
For line string the match will be a such substring:
In [21]: line = "100 aa12.35fe.a5d3 FastEthernet0/1"
In [22]: re.search('[a-f0-9]+[./][a-f0-9]+', line).group()
Out[22]: 'aa12.35fe'
If first symbol in square brackets is ^, match will be any symbol except those in brackets.
In [23]: line = 'FastEthernet0/0 15.0.15.1 YES manual up up'
In [24]: re.search('[^a-zA-Z]+', line).group()
Out[24]: '0/0 15.0.15.1 '
In this case, expression describes everything except letters.
|#
Pipe symbol works like ‘or’:
In [25]: line = "100 aa12.35fe.a5d3 FastEthernet0/1"
In [26]: re.search('Fast|0/1', line).group()
Out[26]: 'Fast'
Note how | works - Fast и 0/1 are treated as an whole expression.
So in the end, expression means that we’re looking for Fast or 0/1.
()#
Parentheses are used to group expressions. As in mathematical expressions, parentheses can be used to indicate which elements the operation is applied to.
For example, expression [0-9]([a-f]|[0-9])[0-9] describes three characters: digit, then a letter or digit and digit:
In [27]: line = "100 aa12.35fe.a5d3 FastEthernet0/1"
In [28]: re.search('[0-9]([a-f]|[0-9])[0-9]', line).group()
Out[28]: '100'
Parentheses allow to indicate which expression is a one entity. This is particularly useful when using repetition symbols:
In [29]: line = 'FastEthernet0/0 15.0.15.1 YES manual up up'
In [30]: re.search('([0-9]+\.)+[0-9]+', line).group()
Out[30]: '15.0.15.1'
Parentheses not only allow you to group expressions. String that matches
expression in parentheses is memorized. It can be obtained separately by special methods
groups and group(n). This is covered in subsection “Grouping”.