Greedy quantifiers
By default, the *, +, and ? quantifiers are all greedy: they match
as much text as possible.
An example of greedy behavior:
In [1]: import re
In [2]: line = '<text line> some text>'
In [3]: match = re.search('<.*>', line)
In [4]: match.group()
Out[4]: '<text line> some text>'
In this case, the expression captured the longest possible string enclosed
in <>. If greedy behavior needs to be disabled, just add a question mark
after the repetition symbol:
In [5]: line = '<text line> some text>'
In [6]: match = re.search('<.*?>', line)
In [7]: match.group()
Out[7]: '<text line>'
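The two searches above can be put side by side in a short script. This is only a sketch: the negated character class [^>]* and the digits string are illustrative additions that do not appear in the examples above. The negated class is a common alternative to a lazy quantifier, and digits shows that the same ? suffix applies to + and ? as well as to *.

```python
import re

line = '<text line> some text>'

# Greedy: .* runs to the end of the string, then backs off to the last '>'.
greedy = re.search(r'<.*>', line).group()

# Lazy: .*? stops at the first '>' that lets the whole pattern match.
lazy = re.search(r'<.*?>', line).group()

# Alternative without a lazy quantifier: match any character except '>'.
negated = re.search(r'<[^>]*>', line).group()

print(greedy)   # <text line> some text>
print(lazy)     # <text line>
print(negated)  # <text line>

# The ? suffix also makes + and ? lazy (digits is an illustrative string).
digits = '12345'
plus_greedy = re.search(r'\d+', digits).group()       # '12345'
plus_lazy = re.search(r'\d+?', digits).group()        # '1'
optional_greedy = re.search(r'\d?', digits).group()   # '1'
optional_lazy = re.search(r'\d??', digits).group()    # '' (empty match)
```

On this input the lazy and negated-class patterns return the same string. They are not equivalent in general, so the choice between them depends on the data being parsed.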
But greediness is often useful. For example, with the last plus left greedy,
the expression \d+\s+\S+ matches this part of the line:
In [8]: line = '1500 aab1.a1a1.a5d3 FastEthernet0/1'
In [9]: re.search(r'\d+\s+\S+', line).group()
Out[9]: '1500 aab1.a1a1.a5d3'
The \S symbol denotes any character except whitespace. Therefore, the
expression \S+ with a greedy repetition symbol describes the longest possible
string up to the first whitespace character, in this case up to the first space.
If greediness is disabled for the last plus, the result is:
In [10]: re.search(r'\d+\s+\S+?', line).group()
Out[10]: '1500 a'
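A lazy quantifier matches as little as possible only while the rest of the pattern can still match. The sketch below reuses the same line and adds a trailing \s and a capturing group, neither of which is in the examples above, to show that \S+? then has to expand up to the first space.

```python
import re

line = '1500 aab1.a1a1.a5d3 FastEthernet0/1'

# On its own, the lazy \S+? is satisfied by a single character.
alone = re.search(r'\d+\s+\S+?', line).group()

# Followed by \s, the lazy \S+? must keep expanding until a space is found.
with_space = re.search(r'\d+\s+(\S+?)\s', line)

print(repr(alone))                # '1500 a'
print(repr(with_space.group(1)))  # 'aab1.a1a1.a5d3'
```

In other words, laziness changes where the match stops only when nothing after the quantifier forces it to continue.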