Match object
In the re module, several functions return a Match object if a match is found:
- search
- match
- finditer (returns an iterator of Match objects)
This subsection covers the methods of the Match object.
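A minimal sketch of how these three functions behave, using a sample string invented for illustration. Note that search and match return None when nothing matches, so the result is usually checked before Match methods are called:

```python
import re

# Sample line invented for illustration
line = "vlan 10 is active, vlan 20 is suspended"

# search scans the whole string and returns the first match (or None)
found = re.search(r"vlan (\d+)", line)

# match succeeds only if the pattern matches at the start of the string
at_start = re.match(r"is active", line)

# finditer yields one Match object per non-overlapping match
vlans = [m.group(1) for m in re.finditer(r"vlan (\d+)", line)]

if found:
    print(found.group(1))  # 10
print(at_start)            # None
print(vlans)               # ['10', '20']
```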
Example of a Match object:
In [1]: log = 'Jun  3 14:39:05.941: %SW_MATM-4-MACFLAP_NOTIF: Host f03a.b216.7ad7 in vlan 10 is flapping between port Gi0/5 and port Gi0/15'
In [2]: match = re.search(r'Host (\S+) in vlan (\d+) .* port (\S+) and port (\S+)', log)
In [3]: match
Out[3]: <_sre.SRE_Match object; span=(47, 124), match='Host f03a.b216.7ad7 in vlan 10 is flapping betwee>'
The Out[3] output only displays summary information about the object. Do not rely on what is shown in the match field, because the displayed string is cut off after a fixed number of characters.
group
The group method returns the substring matched by the whole expression or by one of its groups.
If the method is called without arguments, the whole matched substring is returned:
In [4]: match.group()
Out[4]: 'Host f03a.b216.7ad7 in vlan 10 is flapping between port Gi0/5 and port Gi0/15'
Group 0 returns the same result:
In [5]: match.group(0)
Out[5]: 'Host f03a.b216.7ad7 in vlan 10 is flapping between port Gi0/5 and port Gi0/15'
Other numbers return only the contents of the corresponding group:
In [6]: match.group(1)
Out[6]: 'f03a.b216.7ad7'
In [7]: match.group(2)
Out[7]: '10'
In [8]: match.group(3)
Out[8]: 'Gi0/5'
In [9]: match.group(4)
Out[9]: 'Gi0/15'
If you call the group method with a number larger than the number of groups in the expression, an IndexError is raised:
In [10]: match.group(5)
-----------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-18-9df93fa7b44b> in <module>()
----> 1 match.group(5)
IndexError: no such group
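One way to avoid this error is sketched below. The helper name safe_group is hypothetical and not part of the re module; the number of groups in the pattern is read from the groups attribute of the compiled pattern, available as match.re:

```python
import re

log = "Host f03a.b216.7ad7 in vlan 10 is flapping between port Gi0/5 and port Gi0/15"
match = re.search(r"Host (\S+) in vlan (\d+)", log)


def safe_group(m, number, default=None):
    """Hypothetical helper: return group `number`, or `default` if there is no such group."""
    try:
        return m.group(number)
    except IndexError:
        return default


print(match.re.groups)       # 2, the number of groups in the pattern
print(safe_group(match, 2))  # 10
print(safe_group(match, 5))  # None
```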
If you call the method with several group numbers, the result is a tuple of the corresponding substrings:
In [11]: match.group(1, 2, 3)
Out[11]: ('f03a.b216.7ad7', '10', 'Gi0/5')
A group can match zero characters, in which case its value is an empty string:
In [12]: log = 'Jun 3 14:39:05.941: %SW_MATM-4-MACFLAP_NOTIF: Host f03a.b216.7ad7 in vlan 10 is flapping between port Gi0/5 and port Gi0/15'
In [13]: match = re.search(r'Host (\S+) in vlan (\D*)', log)
In [14]: match.group(2)
Out[14]: ''
If a group is repeated in the expression and matches more than once, the method returns only the last match:
In [15]: log = 'Jun 3 14:39:05.941: %SW_MATM-4-MACFLAP_NOTIF: Host f03a.b216.7ad7 in vlan 10 is flapping between port Gi0/5 and port Gi0/15'
In [16]: match = re.search(r'Host (\w{4}\.)+', log)
In [17]: match.group(1)
Out[17]: 'b216.'
This happens because the expression in parentheses describes four word characters (letters, digits or underscore) followed by a dot, and the plus after the parentheses repeats the whole group. Both the first and the second part of the MAC address match the expression in parentheses, but only the last repetition is remembered and returned.
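If every repetition is needed, one common approach is to make the repeated group non-capturing and wrap it in an outer capturing group. A minimal sketch on a shortened version of the log line:

```python
import re

log = "Host f03a.b216.7ad7 in vlan 10"

# The repeated capturing group keeps only its last repetition
last_only = re.search(r"Host (\w{4}\.)+", log)
print(last_only.group(1))  # b216.

# An outer group around a non-capturing repeated group keeps all repetitions
whole = re.search(r"Host ((?:\w{4}\.)+)", log)
print(whole.group(1))      # f03a.b216.
```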
If named groups are used in the expression, a group name can be passed to the group method to get the corresponding substring:
In [18]: log = 'Jun 3 14:39:05.941: %SW_MATM-4-MACFLAP_NOTIF: Host f03a.b216.7ad7 in vlan 10 is flapping between port Gi0/5 and port Gi0/15'
In [19]: match = re.search(r'Host (?P<mac>\S+) '
...: r'in vlan (?P<vlan>\d+) .* '
...: r'port (?P<int1>\S+) '
...: r'and port (?P<int2>\S+)',
...: log)
...:
In [20]: match.group('mac')
Out[20]: 'f03a.b216.7ad7'
In [21]: match.group('int2')
Out[21]: 'Gi0/15'
Named groups are still available by number:
In [22]: match.group(3)
Out[22]: 'Gi0/5'
In [23]: match.group(4)
Out[23]: 'Gi0/15'
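A short sketch of a few related ways to reach named groups, on a shortened pattern. It assumes Python 3.6 or later, where a Match object also supports subscription; the groupindex attribute of the compiled pattern maps group names to their numbers:

```python
import re

log = "Host f03a.b216.7ad7 in vlan 10 is flapping between port Gi0/5 and port Gi0/15"
match = re.search(r"Host (?P<mac>\S+) in vlan (?P<vlan>\d+)", log)

# Several names can be passed at once, just like several numbers
print(match.group("mac", "vlan"))   # ('f03a.b216.7ad7', '10')

# Subscription is equivalent to calling group with one argument
print(match["vlan"])                # 10

# Mapping from group name to group number
print(match.re.groupindex["vlan"])  # 2
```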
groups
The groups method returns a tuple of strings whose elements are the substrings captured by the respective groups:
In [24]: log = 'Jun 3 14:39:05.941: %SW_MATM-4-MACFLAP_NOTIF: Host f03a.b216.7ad7 in vlan 10 is flapping between port Gi0/5 and port Gi0/15'
In [25]: match = re.search(r'Host (\S+) '
...: r'in vlan (\d+) .* '
...: r'port (\S+) '
...: r'and port (\S+)',
...: log)
...:
In [26]: match.groups()
Out[26]: ('f03a.b216.7ad7', '10', 'Gi0/5', 'Gi0/15')
The groups method has an optional parameter, default. Its value is returned for any group that did not take part in the match, which can happen when the group is optional.
For example, with this line both the first and the second group match:
In [26]: line = '100 aab1.a1a1.a5d3 FastEthernet0/1'
In [27]: match = re.search(r'(\d+) +(\w+)?', line)
In [28]: match.groups()
Out[28]: ('100', 'aab1')
If nothing follows the space in the line, nothing gets into the second group. The match still succeeds because the regex marks the group as optional:
In [30]: line = '100 '
In [31]: match = re.search(r'(\d+) +(\w+)?', line)
In [32]: match.groups()
Out[32]: ('100', None)
Accordingly, for the second group the value is None.
If the groups method is given a default value, it is returned instead of None:
In [33]: line = '100 '
In [34]: match = re.search(r'(\d+) +(\w+)?', line)
In [35]: match.groups(default=0)
Out[35]: ('100', 0)
In [36]: match.groups(default='No match')
Out[36]: ('100', 'No match')
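The default value only replaces groups that did not take part in the match. A group that matched an empty string keeps the empty string. A minimal sketch contrasting the two cases on the same line:

```python
import re

line = "100 "

# Optional group that did not participate: None, replaced by default
optional = re.search(r"(\d+) +(\w+)?", line)
print(optional.groups())             # ('100', None)
print(optional.groups(default="-"))  # ('100', '-')

# Group that matched zero characters: the value is '', default is not used
empty = re.search(r"(\d+) +(\w*)", line)
print(empty.groups(default="-"))     # ('100', '')
```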
groupdict
The groupdict method returns a dictionary in which the keys are group names and the values are the corresponding substrings:
In [37]: log = 'Jun 3 14:39:05.941: %SW_MATM-4-MACFLAP_NOTIF: Host f03a.b216.7ad7 in vlan 10 is flapping between port Gi0/5 and port Gi0/15'
In [38]: match = re.search(r'Host (?P<mac>\S+) '
...: r'in vlan (?P<vlan>\d+) .* '
...: r'port (?P<int1>\S+) '
...: r'and port (?P<int2>\S+)',
...: log)
...:
In [39]: match.groupdict()
Out[39]: {'int1': 'Gi0/5', 'int2': 'Gi0/15', 'mac': 'f03a.b216.7ad7', 'vlan': '10'}
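The groupdict method combines naturally with finditer when each line of the input should become a dictionary. The sketch below uses a two-line table and group names invented for illustration:

```python
import re

# Sample table invented for illustration
table = (
    "100 aab1.a1a1.a5d3 FastEthernet0/1\n"
    "200 aab2.a2a2.a6d4 FastEthernet0/2\n"
)

regex = re.compile(r"(?P<vlan>\d+) +(?P<mac>[0-9a-f.]+) +(?P<intf>\S+)")

# One dictionary per match, keyed by group name
entries = [m.groupdict() for m in regex.finditer(table)]

for entry in entries:
    print(entry)
```

Like groups, groupdict accepts a default argument that is used for named groups that did not take part in the match.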
start, end
The start and end methods return the indexes of the beginning and end of the match.
If the methods are called without arguments, they return indexes for the whole match:
In [40]: line = '  10     aab1.a1a1.a5d3    FastEthernet0/1 '
In [41]: match = re.search(r'(\d+) +([0-9a-f.]+) +(\S+)', line)
In [42]: match.start()
Out[42]: 2
In [43]: match.end()
Out[43]: 42
In [45]: line[match.start():match.end()]
Out[45]: '10     aab1.a1a1.a5d3    FastEthernet0/1'
You can pass the number or name of a group to these methods. They then return the indexes for that group:
In [46]: match.start(2)
Out[46]: 9
In [47]: match.end(2)
Out[47]: 23
In [48]: line[match.start(2):match.end(2)]
Out[48]: 'aab1.a1a1.a5d3'
Similarly for named groups:
In [49]: log = 'Jun  3 14:39:05.941: %SW_MATM-4-MACFLAP_NOTIF: Host f03a.b216.7ad7 in vlan 10 is flapping between port Gi0/5 and port Gi0/15'
In [50]: match = re.search(r'Host (?P<mac>\S+) '
...: r'in vlan (?P<vlan>\d+) .* '
...: r'port (?P<int1>\S+) '
...: r'and port (?P<int2>\S+)',
...: log)
...:
In [51]: match.start('mac')
Out[51]: 52
In [52]: match.end('mac')
Out[52]: 66
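A small sketch of a typical use of these indexes: cutting the original string into the parts before and after the match. The sample strings are invented for illustration. For a group that exists in the pattern but did not take part in the match, start and end return -1:

```python
import re

line = "vlan 10 is flapping"
match = re.search(r"\d+", line)

# Text around the match, taken by slicing with start and end
before = line[:match.start()]
after = line[match.end():]
print(repr(before))  # 'vlan '
print(repr(after))   # ' is flapping'

# The slice between start and end is the same as group()
print(line[match.start():match.end()] == match.group())  # True

# An optional group that did not participate reports -1
optional = re.search(r"(\d+) +(\w+)?", "100 ")
print(optional.start(2), optional.end(2))  # -1 -1
```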
span
The span method returns a tuple with the indexes of the beginning and end of the substring. It works like the start and end methods, but returns a pair of numbers.
Without arguments, span returns indexes for the whole match:
In [53]: line = '  10     aab1.a1a1.a5d3    FastEthernet0/1 '
In [54]: match = re.search(r'(\d+) +([0-9a-f.]+) +(\S+)', line)
In [55]: match.span()
Out[55]: (2, 42)
You can also pass the number of a group:
In [56]: line = '  10     aab1.a1a1.a5d3    FastEthernet0/1 '
In [57]: match = re.search(r'(\d+) +([0-9a-f.]+) +(\S+)', line)
In [58]: match.span(2)
Out[58]: (9, 23)
Similarly for named groups:
In [59]: log = 'Jun  3 14:39:05.941: %SW_MATM-4-MACFLAP_NOTIF: Host f03a.b216.7ad7 in vlan 10 is flapping between port Gi0/5 and port Gi0/15'
In [60]: match = re.search(r'Host (?P<mac>\S+) '
...: r'in vlan (?P<vlan>\d+) .* '
...: r'port (?P<int1>\S+) '
...: r'and port (?P<int2>\S+)',
...: log)
...:
In [64]: match.span('mac')
Out[64]: (52, 66)
In [65]: match.span('vlan')
Out[65]: (75, 77)
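Because span returns both indexes at once, it is convenient for tuple unpacking. The sketch below, on a shortened line invented for illustration, uses the pair to build a marker line that points at the matched group:

```python
import re

line = "Host f03a.b216.7ad7 in vlan 10"
match = re.search(r"Host (?P<mac>\S+) in vlan (?P<vlan>\d+)", line)

# Unpack the pair returned by span
start, end = match.span("mac")

# Line of carets placed under the matched group
marker = " " * start + "^" * (end - start)

print(line)
print(marker)
print(match.span("mac") == (match.start("mac"), match.end("mac")))  # True
```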