Compile function

Python can compile a regular expression in advance and then reuse the compiled object. This is particularly useful when the same regex is used many times in a script.

A compiled expression can speed up processing, although the gain is often small because the re module caches recently used patterns. It is also generally more convenient, because the program separates the creation of a regex from its use. In addition, re.compile returns a RegexObject (shown as re.Pattern in recent Python versions) whose methods accept additional parameters that the module-level functions do not.
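As a minimal sketch of this separation, the pattern below is compiled once and then reused for every line. The sample lines and the names mac_regex, table_lines and results are made up for illustration.

```python
import re

# Compile once, near the top of the script
mac_regex = re.compile(r'\d+ +(\S+) +\w+ +(\S+)')

# Hypothetical sample data in the style of a MAC address table
table_lines = [
    ' 100    a1b2.ac10.7000    DYNAMIC     Gi0/1',
    ' 200    a0d4.cb20.7000    DYNAMIC     Gi0/2',
    'Total Mac Addresses for this criterion: 2',
]

# Reuse the same compiled object for every line
results = []
for line in table_lines:
    match = mac_regex.search(line)
    if match:
        # groups() returns the MAC address and the port
        results.append(match.groups())

print(results)
```

The last line does not match the pattern, so only two tuples are collected.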

To compile a regex, use re.compile:

In [52]: regex = re.compile(r'\d+ +\S+ +\w+ +\S+')

It returns a RegexObject:

In [53]: regex
Out[53]: re.compile(r'\d+ +\S+ +\w+ +\S+', re.UNICODE)

RegexObject has the following methods and attributes:

In [55]: [method for method in dir(regex) if not method.startswith('_')]
Out[55]:
['findall',
 'finditer',
 'flags',
 'fullmatch',
 'groupindex',
 'groups',
 'match',
 'pattern',
 'scanner',
 'search',
 'split',
 'sub',
 'subn']

Note that RegexObject has search, match, finditer and findall methods. These are the same functions that are available at module level, but now they are called as methods of the object.

An example of using the search method:
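To illustrate the correspondence, the sketch below runs the same search both ways. The variable names module_match and method_match are chosen here for illustration only.

```python
import re

pattern = r'\d+ +\S+ +\w+ +\S+'
line = ' 100    a1b2.ac10.7000    DYNAMIC     Gi0/1'

regex = re.compile(pattern)

# Module-level function: the pattern is passed on every call
module_match = re.search(pattern, line)

# Method of the compiled object: only the string is passed
method_match = regex.search(line)

# Both calls find the same text at the same position
print(module_match.group() == method_match.group())
print(module_match.span() == method_match.span())
```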

In [67]: line = ' 100    a1b2.ac10.7000    DYNAMIC     Gi0/1'

In [68]: match = regex.search(line)

Here search is called as a method of the regex object. The result is a Match object:

In [69]: match
Out[69]: <_sre.SRE_Match object; span=(1, 43), match='100    a1b2.ac10.7000    DYNAMIC     Gi0/1'>

In [70]: match.group()
Out[70]: '100    a1b2.ac10.7000    DYNAMIC     Gi0/1'

An example of compiling a regex and using it to parse a log file (parse_log_compile.py file):

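import re

# Compile the pattern once; adjacent raw strings are joined into one regex
regex = re.compile(r'Host \S+ '
                   r'in vlan (\d+) '
                   r'is flapping between port '
                   r'(\S+) and port (\S+)')

# A set keeps each port only once
ports = set()

with open('log.txt') as f:
    for m in regex.finditer(f.read()):
        # group 1 is the VLAN, groups 2 and 3 are the two ports
        vlan = m.group(1)
        ports.add(m.group(2))
        ports.add(m.group(3))

print('Loop between ports {} in VLAN {}'.format(', '.join(ports), vlan))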

This is a modified version of the finditer example. The regex is now defined with re.compile:

regex = re.compile(r'Host \S+ '
                   r'in vlan (\d+) '
                   r'is flapping between port '
                   r'(\S+) and port (\S+)')

And finditer is now called as a method of the regex object:

for m in regex.finditer(f.read()):
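The script depends on a log.txt file. As a self-contained sketch of the same idea, the variant below reads from a string instead. The log_text contents are hypothetical sample data, and collecting VLANs in a set is a small variation, not part of the original script.

```python
import re

regex = re.compile(r'Host \S+ '
                   r'in vlan (\d+) '
                   r'is flapping between port '
                   r'(\S+) and port (\S+)')

# Hypothetical log text standing in for the contents of log.txt
log_text = (
    '%SW_MATM-4-MACFLAP_NOTIF: Host 01e2.4c18.0156 in vlan 10 '
    'is flapping between port Gi0/16 and port Gi0/24\n'
    '%SW_MATM-4-MACFLAP_NOTIF: Host 01e2.4c18.0156 in vlan 10 '
    'is flapping between port Gi0/16 and port Gi0/19\n'
)

ports = set()
vlans = set()

for m in regex.finditer(log_text):
    vlans.add(m.group(1))
    # group(2, 3) returns both ports as a tuple
    ports.update(m.group(2, 3))

# sorted() gives a stable order, since sets are unordered
print('Loop between ports {} in VLAN {}'.format(
    ', '.join(sorted(ports)), ', '.join(sorted(vlans))))
```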

Options that are available only when using re.compile

With a compiled regex, the search, match, findall, finditer and fullmatch methods accept additional parameters:

  • pos - the index in the string at which to start looking for a match

  • endpos - the index at which the search stops; characters from endpos onward are not examined

They work much like a string slice, except that the positions reported by the match still refer to the original string.

For example, this is the result when pos and endpos are not specified:

In [75]: regex = re.compile(r'\d+ +\S+ +\w+ +\S+')

In [76]: line = ' 100    a1b2.ac10.7000    DYNAMIC     Gi0/1'

In [77]: match = regex.search(line)

In [78]: match.group()
Out[78]: '100    a1b2.ac10.7000    DYNAMIC     Gi0/1'

Here the starting position of the search is specified:

In [79]: match = regex.search(line, 2)

In [80]: match.group()
Out[80]: '00    a1b2.ac10.7000    DYNAMIC     Gi0/1'

Specifying the starting position returns the same text as searching a string slice:

In [81]: match = regex.search(line[2:])

In [82]: match.group()
Out[82]: '00    a1b2.ac10.7000    DYNAMIC     Gi0/1'

The final example uses both indexes:

In [90]: line = ' 100    a1b2.ac10.7000    DYNAMIC     Gi0/1'

In [91]: regex = re.compile(r'\d+ +\S+ +\w+ +\S+')

In [92]: match = regex.search(line, 2, 40)

In [93]: match.group()
Out[93]: '00    a1b2.ac10.7000    DYNAMIC     Gi'

And the equivalent string slice:

In [94]: match = regex.search(line[2:40])

In [95]: match.group()
Out[95]: '00    a1b2.ac10.7000    DYNAMIC     Gi'
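Using pos and endpos is not fully equivalent to slicing. The sketch below shows two differences, using the same line as above. The names with_pos, with_slice and anchored are chosen here for illustration.

```python
import re

line = ' 100    a1b2.ac10.7000    DYNAMIC     Gi0/1'
regex = re.compile(r'\d+ +\S+ +\w+ +\S+')

with_pos = regex.search(line, 2, 40)
with_slice = regex.search(line[2:40])

# The matched text is the same
print(with_pos.group() == with_slice.group())

# The reported positions differ
print(with_pos.span())    # indexes in the original line
print(with_slice.span())  # indexes in the slice

# '^' anchors at the real start of the string, not at pos
anchored = re.compile(r'^\d+')
print(anchored.search(line, 1))             # no match
print(anchored.search(line[1:]).group())    # matches in the slice
```

With pos, the span is counted from the start of the original string, which is convenient when the position is needed later. With a slice, a new string is created and all indexes are relative to it.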

The pos and endpos parameters work the same way in the match, findall, finditer and fullmatch methods.
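A short sketch of pos and endpos with these other methods, using a simpler pattern (\d+) chosen here to keep the results easy to read:

```python
import re

line = ' 100    a1b2.ac10.7000    DYNAMIC     Gi0/1'
digits = re.compile(r'\d+')

# match looks for a match starting exactly at pos
no_match = digits.match(line)          # index 0 is a space
at_pos = digits.match(line, 1)         # index 1 starts '100'

# fullmatch must cover the whole window from pos to endpos
full = digits.fullmatch(line, 1, 4)

# findall and finditer only look inside the window
all_numbers = digits.findall(line)
mac_numbers = digits.findall(line, 8, 22)
mac_spans = [m.span() for m in digits.finditer(line, 8, 22)]

print(no_match, at_pos.group(), full.group())
print(all_numbers)
print(mac_numbers)
print(mac_spans)
```

The window from 8 to 22 covers only the MAC address a1b2.ac10.7000, so digits outside it are ignored.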