Special symbols

  • . - any character except a newline

  • ^ - beginning of line

  • $ - end of line

  • [abc] - any symbol in square brackets

  • [^abc] - any symbol except those in square brackets

  • a|b - element a or b

  • (regex) - the expression is treated as one element. In addition, the substring that matches the expression is remembered

.

A dot matches any character except a newline. Most often it is combined with the repetition symbols + and * to indicate that any characters may appear between certain expressions.

For example, the expression Interface.+Port ID.+ describes the line with interfaces in the output of “sh cdp neighbors detail”:

In [1]: cdp = '''
   ...: SW1#show cdp neighbors detail
   ...: -------------------------
   ...: Device ID: SW2
   ...: Entry address(es):
   ...:   IP address: 10.1.1.2
   ...: Platform: cisco WS-C2960-8TC-L,  Capabilities: Switch IGMP
   ...: Interface: GigabitEthernet1/0/16,  Port ID (outgoing port): GigabitEthernet0/1
   ...: Holdtime : 164 sec
   ...: '''

In [2]: re.search('Interface.+Port ID.+', cdp).group()
Out[2]: 'Interface: GigabitEthernet1/0/16,  Port ID (outgoing port): GigabitEthernet0/1'

The match is a single line because the dot matches any character except a newline. In addition, the repetition symbols + and * by default capture the longest string possible. This aspect is addressed in subsection “Greedy qualifiers”.
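The newline behaviour can be checked with a minimal sketch. The two-line `sample` string is a shortened, made-up excerpt of the CDP output, and the variable names are illustrative assumptions. The sketch also uses the standard `re.DOTALL` flag, which lets the dot match a newline as well.

```python
import re

# Hypothetical two-line sample, shortened from the CDP output above
sample = (
    "Interface: GigabitEthernet1/0/16,  Port ID (outgoing port): GigabitEthernet0/1\n"
    "Holdtime : 164 sec"
)

# By default the dot stops at the newline, so the match ends with the first line
one_line = re.search(r'Interface.+', sample).group()

# With re.DOTALL the dot also matches '\n', so the greedy .+ runs to the end of the text
all_text = re.search(r'Interface.+', sample, re.DOTALL).group()

print('\n' in one_line)    # False
print(all_text == sample)  # True
```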

^

The character ^ means the beginning of a line. For example, the expression ^\d+ matches the digits at the start of this string:

In [3]: line = "100     aa12.35fe.a5d3    FastEthernet0/1"

In [4]: re.search(r'^\d+', line).group()
Out[4]: '100'

Characters from the beginning of the line up to the pound sign (including the pound sign):

In [5]: prompt = 'SW1#show cdp neighbors detail'

In [6]: re.search('^.+#', prompt).group()
Out[6]: 'SW1#'
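In Python's `re` module, ^ without flags matches only at the start of the whole string. With the standard `re.MULTILINE` flag it also matches right after every newline. A small sketch, assuming a made-up two-line MAC table (`table` and the second row are illustrative) and using `re.findall` to collect all matches:

```python
import re

# Hypothetical two-line MAC address table
table = (
    "100     aa12.35fe.a5d3    FastEthernet0/1\n"
    "200     aa12.35fe.a5d4    FastEthernet0/2"
)

# Without flags ^ matches only at the very start of the string
first_only = re.findall(r'^\d+', table)

# With re.MULTILINE ^ also matches at the start of every line
every_line = re.findall(r'^\d+', table, re.MULTILINE)

print(first_only)  # ['100']
print(every_line)  # ['100', '200']
```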

$

Symbol $ represents the end of a line.

The expression \S+$ describes one or more non-whitespace characters at the end of the line:

In [7]: line = "100     aa12.35fe.a5d3    FastEthernet0/1"

In [8]: re.search(r'\S+$', line).group()
Out[8]: 'FastEthernet0/1'
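Because $ anchors the match to the very end, trailing whitespace is enough to make \S+$ fail. The sketch below applies the expression to two lines. The second line and its trailing spaces are made up for illustration. It also checks the result before calling group(), since `re.search` returns None when nothing matches.

```python
import re

lines = [
    "100     aa12.35fe.a5d3    FastEthernet0/1",
    "200     aa12.35fe.a5d4    FastEthernet0/2  ",  # hypothetical line with trailing spaces
]

interfaces = []
for line in lines:
    match = re.search(r'\S+$', line)
    # search() returns None when nothing matches, so check before calling group()
    interfaces.append(match.group() if match else None)

print(interfaces)  # ['FastEthernet0/1', None]
```

Stripping the line first, for example with `line.rstrip()`, is one way to make the second line match too.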

[]

Listing symbols in square brackets means that any one of them is a match at that position. This way, both letter cases can be described:

In [9]: line = "100     aa12.35fe.a5d3    FastEthernet0/1"

In [10]: re.search('[Ff]ast', line).group()
Out[10]: 'Fast'

In [11]: re.search('[Ff]ast[Ee]thernet', line).group()
Out[11]: 'FastEthernet'

Using square brackets, you can specify which characters may occur at a specific position. For example, the expression ^.+[>#] describes characters from the beginning of a line to a # or > sign (including the sign). This expression can be used to get the device name together with the prompt symbol:

In [12]: commands = ['SW1#show cdp neighbors detail',
    ...:             'SW1>sh ip int br',
    ...:             'r1-london-core# sh ip route']
    ...:

In [13]: for line in commands:
    ...:     match = re.search('^.+[>#]', line)
    ...:     if match:
    ...:         print(match.group())
    ...:
SW1#
SW1>
r1-london-core#

You can specify character ranges in square brackets. For example, any digit from 0 to 9:

In [14]: line = "100     aa12.35fe.a5d3    FastEthernet0/1"

In [15]: re.search('[0-9]+', line).group()
Out[15]: '100'

Letters:

In [16]: line = "100     aa12.35fe.a5d3    FastEthernet0/1"

In [17]: re.search('[a-z]+', line).group()
Out[17]: 'aa'

In [18]: re.search('[A-Z]+', line).group()
Out[18]: 'F'

Several ranges may be indicated in square brackets:

In [19]: line = "100     aa12.35fe.a5d3    FastEthernet0/1"

In [20]: re.search(r'[a-f0-9]+\.[a-f0-9]+\.[a-f0-9]+', line).group()
Out[20]: 'aa12.35fe.a5d3'

The expression [a-f0-9]+\.[a-f0-9]+\.[a-f0-9]+ describes three groups of symbols separated by dots. Characters in each group can be letters a-f or digits 0-9. This expression describes a MAC address in dotted notation.
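As a rough sketch, the same expression can pull MAC addresses out of several lines at once. The table rows below are invented for illustration, and `mac_regex`, `table_lines` and `macs` are hypothetical names:

```python
import re

# Hypothetical lines resembling a MAC address table
table_lines = [
    "Vlan    Mac Address       Type        Ports",
    "100     aa12.35fe.a5d3    DYNAMIC     FastEthernet0/1",
    "200     a1b2.ac10.7000    DYNAMIC     FastEthernet0/2",
]

mac_regex = r'[a-f0-9]+\.[a-f0-9]+\.[a-f0-9]+'

macs = []
for line in table_lines:
    match = re.search(mac_regex, line)
    # The header line contains no dots, so it produces no match and is skipped
    if match:
        macs.append(match.group())

print(macs)  # ['aa12.35fe.a5d3', 'a1b2.ac10.7000']
```

Note that + does not limit the group length, so this expression would also accept groups shorter or longer than four characters.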

Another feature of square brackets is that most special symbols inside them lose their special meaning and become ordinary characters. For example, a dot inside square brackets denotes a literal dot, not any symbol.

The expression [a-f0-9]+[./][a-f0-9]+ describes three groups of symbols:

  1. letters a-f or digits 0-9

  2. dot or slash

  3. letters a-f or digits 0-9

For the string line, the match is the following substring:

In [21]: line = "100     aa12.35fe.a5d3    FastEthernet0/1"

In [22]: re.search('[a-f0-9]+[./][a-f0-9]+', line).group()
Out[22]: 'aa12.35fe'
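The difference between a dot outside and inside square brackets can be seen in a short sketch. The string `aa12x35fe` is a made-up counterexample, and the variable names are illustrative:

```python
import re

dotted = "aa12.35fe"
other = "aa12x35fe"   # hypothetical string with "x" where the dot used to be

# Outside brackets the dot matches any character, including "x"
any_char = re.search(r'12.35', other).group()

# Inside brackets the dot is literal, so "x" is rejected
literal_miss = re.search(r'12[.]35', other)
literal_hit = re.search(r'12[.]35', dotted).group()

print(any_char)      # 12x35
print(literal_miss)  # None
print(literal_hit)   # 12.35
```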

If the first symbol in square brackets is ^, the match is any symbol except those in the brackets:

In [23]: line = 'FastEthernet0/0    15.0.15.1       YES manual up         up'

In [24]: re.search('[^a-zA-Z]+', line).group()
Out[24]: '0/0    15.0.15.1       '

In this case, the expression describes everything except letters.
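A negated set is also a convenient way to stop a match before a delimiter. As a sketch, the expression ^[^>#]+ (an illustrative variation, not one of the expressions used earlier) takes everything from the start of the line up to, but not including, the first > or # sign:

```python
import re

commands = [
    'SW1#show cdp neighbors detail',
    'SW1>sh ip int br',
    'r1-london-core# sh ip route',
]

hostnames = []
for line in commands:
    # One or more characters that are neither ">" nor "#", starting at the line start
    match = re.search(r'^[^>#]+', line)
    if match:
        hostnames.append(match.group())

print(hostnames)  # ['SW1', 'SW1', 'r1-london-core']
```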

|

The pipe symbol works like ‘or’:

In [25]: line = "100     aa12.35fe.a5d3    FastEthernet0/1"

In [26]: re.search('Fast|0/1', line).group()
Out[26]: 'Fast'

Note how | works: Fast and 0/1 are each treated as a whole expression. So the expression means that we’re looking for either Fast or 0/1.
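A short sketch of this behaviour, with illustrative variable names. Swapping the alternatives does not change the result here, because the search returns the match that starts earliest in the string. Parentheses limit the alternation to part of the expression.

```python
import re

line = "100     aa12.35fe.a5d3    FastEthernet0/1"

# The whole left side or the whole right side: "Fast" OR "0/1"
first = re.search(r'Fast|0/1', line).group()

# "Fast" occurs earlier in the line than "0/1", so it still wins
swapped = re.search(r'0/1|Fast', line).group()

# Parentheses restrict the choice to the prefix: "FastEthernet" or "GigabitEthernet"
grouped = re.search(r'(Fast|Gigabit)Ethernet', line).group()

print(first)    # Fast
print(swapped)  # Fast
print(grouped)  # FastEthernet
```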

()

Parentheses are used to group expressions. As in mathematical expressions, parentheses can be used to indicate which elements the operation is applied to.

For example, the expression [0-9]([a-f]|[0-9])[0-9] describes three characters: a digit, then a letter a-f or a digit, then another digit:

In [27]: line = "100     aa12.35fe.a5d3    FastEthernet0/1"

In [28]: re.search('[0-9]([a-f]|[0-9])[0-9]', line).group()
Out[28]: '100'

Parentheses indicate which expression is a single entity. This is particularly useful with repetition symbols:

In [29]: line = 'FastEthernet0/0    15.0.15.1       YES manual up         up'

In [30]: re.search(r'([0-9]+\.)+[0-9]+', line).group()
Out[30]: '15.0.15.1'

Parentheses do more than group expressions: the string that matches the expression in parentheses is remembered. It can be obtained separately with the match methods groups() and group(n). This is covered in subsection “Grouping”.
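As a brief preview, here is a minimal sketch of retrieving remembered substrings. The expression (\S+)\s+([0-9.]+) is an illustrative assumption chosen for this line, not an expression from the “Grouping” subsection:

```python
import re

line = 'FastEthernet0/0    15.0.15.1       YES manual up         up'

# Group 1: interface name, group 2: the IP address that follows it
match = re.search(r'(\S+)\s+([0-9.]+)', line)

interface = match.group(1)
address = match.group(2)
both = match.groups()

print(interface)  # FastEthernet0/0
print(address)    # 15.0.15.1
print(both)       # ('FastEthernet0/0', '15.0.15.1')
```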