regex - Python 2 and 3 're.sub' inconsistency


I am writing a function to split numbers and other things from text in Python. The code looks like this:

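    import re

    en_extract_regex = '([a-zA-Z]+)'   # runs of ASCII letters (group 1)
    num_extract_regex = '([0-9]+)'     # runs of digits (group 2)
    aggr_regex = en_extract_regex + '|' + num_extract_regex

    # entry is the input string; put a space in front of every run
    entry = re.sub(aggr_regex, r' \1\2', entry)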

Now, this code works fine in Python 3, but it does not work under Python 2, where it fails with an "unmatched group" error.

The problem is that I need to support both versions, and I could not get it to work in Python 2 although I tried various other ways.

I am curious what the root of this problem is, and is there a workaround for it?

I think the problem might be that the regex pattern matches one or the other of the subpatterns en_extract_regex and num_extract_regex, but not both.

When re.sub() matches the alpha characters in the first pattern, it attempts to substitute the second group reference \2, which fails because only the first group matched; the second group did not take part in the match.

Similarly, when the digit pattern is matched, there is no \1 group to substitute, and so it fails.
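To make the "one group or the other" point concrete, here is a small sketch that inspects the groups of each match directly. It uses the pattern from the question with the character class written as [a-zA-Z]; the variable names letters_groups and digits_groups are only for illustration. The group that did not participate comes back as None, which is what the replacement template stumbles over.

```python
import re

# Same alternation as in the question: group 1 = letters, group 2 = digits
aggr_regex = '([a-zA-Z]+)|([0-9]+)'

# Only one side of the alternation can match at a time,
# so the other group is reported as None.
letters_groups = re.match(aggr_regex, 'abcd').groups()
digits_groups = re.match(aggr_regex, '1234').groups()

print(letters_groups)  # ('abcd', None)
print(digits_groups)   # (None, '1234')
```

This part behaves the same in Python 2 and Python 3; only the handling of the None group inside re.sub() differs.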

You can see that this is the case with this test in Python 2:

    >>> re.sub(aggr_regex, r' \1', 'abcd')    # reference first pattern
    ' abcd'
    >>> re.sub(aggr_regex, r' \2', 'abcd')    # reference second pattern
    Traceback (most recent call last):
    ....
    sre_constants.error: unmatched group

The difference must lie within the different versions of the regex engine in Python 2 and Python 3. Unfortunately I cannot provide a definitive reason for the difference; however, there is a documented change in version 3.5 for re.sub() regarding unmatched groups:

Changed in version 3.5: Unmatched groups are replaced with an empty string.

This explains why it works in Python >= 3.5 but not in earlier versions: unmatched groups are now substituted as empty strings, so they are effectively ignored.
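If you want to check which behaviour a given interpreter has, a rough sketch like the following might help. The helper name try_sub is made up for this example. It runs the original substitution and reports the error instead of crashing; on Python >= 3.5 it should return the substituted string, while on older versions I would expect it to return the "unmatched group" message.

```python
import re
import sys

aggr_regex = '([a-zA-Z]+)|([0-9]+)'

def try_sub(text):
    """Run the substitution from the question; return the error text on failure."""
    try:
        return re.sub(aggr_regex, r' \1\2', text)
    except re.error as exc:
        # Raised on interpreters that reject references to unmatched groups
        return 'error: {}'.format(exc)

print(sys.version_info[:2], repr(try_sub('abc123')))
```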


As a workaround, you can change your pattern to handle both matches as a single group:

    import re

    en_extract_regex = '[a-zA-Z]+'
    num_extract_regex = '[0-9]+'
    # One capturing group around the whole alternation: ([a-zA-Z]+|[0-9]+)
    aggr_regex = '(' + en_extract_regex + '|' + num_extract_regex + ')'

    for s in '', '1234', 'abcd', 'a1b2c3', 'aa__bb__1122cdef', '_**_':
        print(re.sub(aggr_regex, r' \1', s))

Output (the first line is empty because the first input is the empty string):

    
     1234
     abcd
     a 1 b 2 c 3
     aa__ bb__ 1122 cdef
    _**_
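If you would rather keep the two separate groups from the question, another option that should work on both Python 2 and Python 3 is to avoid referencing \1 and \2 in the template at all. The sketch below shows two ways: a replacement function, and the \g<0> reference to the whole match. Group 0 always participates in a match, so it can never be "unmatched". The names add_space and split_tokens are hypothetical helpers, not part of the re module.

```python
import re

# Original two-group pattern from the question
aggr_regex = '([a-zA-Z]+)|([0-9]+)'

def add_space(match):
    # group(0) is the entire matched text, whichever alternative matched
    return ' ' + match.group(0)

def split_tokens(entry):
    """Put a space in front of every run of letters or digits."""
    return re.sub(aggr_regex, add_space, entry)

print(repr(split_tokens('a1b2c3')))                   # ' a 1 b 2 c 3'
print(repr(re.sub(aggr_regex, r' \g<0>', 'a1b2c3')))  # ' a 1 b 2 c 3'
```

The function form is a little more verbose, but it leaves room for per-group logic later, for example checking match.group(2) to treat numbers differently.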
