Non-capturing group#

By default, everything that fell into the group is remembered. It’s called a capturing group.

Sometimes parentheses are needed to indicate a part of expression that repeats. And, in doing so, you don’t need to remember an expression.

For example, get a MAC address, VLAN and ports from log message:

In [1]: log = 'Jun  3 14:39:05.941: %SW_MATM-4-MACFLAP_NOTIF: Host f03a.b216.7ad7 in vlan 10 is flapping between port Gi0/5 and port Gi0/15'

A regex that describes substrings:

In [2]: match = re.search('((\w{4}\.){2}\w{4}).+vlan (\d+).+port (\S+).+port (\S+)', log)

Expression consists of the following parts:

  • ((\w{4}\.){2}\w{4}) - MAC address gets here

  • \w{4}\. - this part describes 4 letters or digits and a dot

  • (\w{4}\.){2} - here parentheses are used to indicate that 4 letters or digits and a dot are repeated twice

  • \w{4} - then 4 letters or numbers

  • .+vlan (\d+) - VLAN number falls into the group

  • .+port (\S+) - first interface

  • .+port (\S+) - second interface

Method groups returns:

In [3]: match.groups()
Out[3]: ('f03a.b216.7ad7', 'b216.', '10', 'Gi0/5', 'Gi0/15')

The second element is essentially superfluous. It appeared in the output because of parentheses in expression (\w{4}\.){2}.

In this case, you need to disable capture in the group. This is done by adding ?: after the group’s opening parenthesis.

Now the expression looks like this:

In [4]: match = re.search('((?:\w{4}\.){2}\w{4}).+vlan (\d+).+port (\S+).+port (\S+)', log)

Accordingly, groups method returns:

In [5]: match.groups()
Out[5]: ('f03a.b216.7ad7', '10', 'Gi0/5', 'Gi0/15')