Python Regex - Making an exception involving two text files
I have two text files: text1 and text2.
text1:
    (test1)
    (test2)
    (g)
    (test3)
    (test4)
    (test5)

text2:
    (test5)
    (testa)
    (testb)
    (testc)
    (testd)
    (teste)

And I have the following code:
    import re

    pattern = re.compile(r"(\((?!test2:|g}|test4)[\w+ :]+\))")

    with open("text2.txt", "r") as f:
        words = pattern.findall(f.read())

    with open("text1.txt", "r+") as f:
        content = pattern.sub(lambda x: words.pop(0) if words else x.group(), f.read())
        f.seek(0)
        f.write(content)
        f.truncate()

What the code does is use a regular expression to replace the words inside parentheses in text1.txt, one by one and in order, with the words inside parentheses in text2.txt, except for "test2", "test4" and the letter "g". However, I want to add one more exception: if, for example, (test5) appears in both files, even if it is on different lines, it should not be selected by the regex and should not be replaced, leaving text1.txt like this:
    (testa)
    (test2)
    (g)
    (testb)
    (test4)
    (test5)

My question is: how should I do this? Should I change the logic of the program, or should I change the regex?
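To make the problem concrete, here is a minimal in-memory sketch of what the approach in the question does. It assumes the file contents are held in strings instead of being read from text1.txt and text2.txt, and it writes the lookahead as (?!test2|g|test4), i.e. the intended exceptions, rather than exactly as posted. Because a callable is passed to sub(), it is called once per match, left to right, so each match in text1 consumes the next word taken from text2.

```python
import re

# Assumption: the intended exceptions, not the lookahead exactly as posted.
pattern = re.compile(r"(\((?!test2|g|test4)[\w+ :]+\))")

# Assumption: file contents inlined as strings, one entry per line.
text1 = "(test1)\n(test2)\n(g)\n(test3)\n(test4)\n(test5)"
text2 = "(test5)\n(testa)\n(testb)\n(testc)\n(testd)\n(teste)"

words = pattern.findall(text2)  # replacement candidates, in file order

# re.sub calls the lambda once per match, left to right; each call
# consumes the next candidate, or keeps the match once candidates run out.
result = pattern.sub(lambda m: words.pop(0) if words else m.group(), text1)
print(result)
```

With these inputs the last line of text1, (test5), is overwritten with (testb), which is exactly the replacement the question wants to prevent.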
Leaving aside the errors in the pattern you posted (I can't figure out how it works with the given example), as for the strategy: you can keep the exceptions in a list and, once you have iterated over both files, append the new exceptions (entries that occur in both files) and then compile the regular expression by inserting all of the exceptions joined with the '|' character.
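The strategy of keeping the exceptions in a list and rebuilding the pattern by joining them with '|' could look roughly like the sketch below. build_exception_pattern is a hypothetical helper name, and wrapping each exception in re.escape is an addition not found in the answer, so that an exception containing regex metacharacters is matched literally.

```python
import re


def build_exception_pattern(exceptions):
    """Hypothetical helper: compile a regex matching '(...)' entries whose
    contents do not start with any of the given exceptions."""
    # re.escape makes each exception match literally (an addition to the answer).
    # Assumes a non-empty list: an empty one yields (?!), which never matches.
    alternatives = "|".join(re.escape(ex) for ex in exceptions)
    return re.compile(r"(\((?!" + alternatives + r")[\w+ :]+\))")


pattern = build_exception_pattern(["test2", "test4", "g"])
print(pattern.findall("(test1)\n(test2)\n(g)\n(test3)\n(test4)\n(test5)"))
# ['(test1)', '(test3)', '(test5)']
```

Note that the lookahead only checks how the text inside the parentheses starts, so the exception g would also skip a hypothetical (gamma), and test5 would also skip (test50).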
This code works for me:
    import re

    exceptions = ['test2', 'test4', 'g']
    # Match '(...)' entries that do not start with any exception (raw strings avoid invalid-escape warnings)
    pattern1 = re.compile(r"(\((?!" + '|'.join(ex for ex in exceptions) + r")[\w+ :]+\))")

    with open("text2.txt", "r") as f:
        words = pattern1.findall(f.read())
        print(words)

    with open("text1.txt", "r+") as f:
        text = f.read()
        for line in text.splitlines():
            # An entry present in both files becomes a new exception and is not used as a replacement
            if line in words:
                new_exception = re.search(r'\(([\w+ :]+)\)', line)
                exceptions.append(new_exception.group(1))
                words.remove(line)
        all_exceptions_compiled = re.compile(r"(\((?!" + '|'.join(ex for ex in exceptions) + r")[\w+ :]+\))")
        content = all_exceptions_compiled.sub(lambda x: words.pop(0) if words else x.group(), text)
        f.seek(0)
        f.write(content)
        f.truncate()

Keep in mind that I have modified the regex pattern to: (\((?!test2|g|test4)[\w+ :]+\))
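For experimenting without touching the files, the answer's steps can be wrapped in a function that works on strings. This is a sketch, not the answer's exact code: replace_keeping_shared and build_exception_pattern are hypothetical names, the exception list is copied so the caller's list is not modified, and exceptions are passed through re.escape (an addition).

```python
import re


def build_exception_pattern(exceptions):
    # Hypothetical helper: '(...)' entries whose contents do not start with an exception.
    alternatives = "|".join(re.escape(ex) for ex in exceptions)
    return re.compile(r"(\((?!" + alternatives + r")[\w+ :]+\))")


def replace_keeping_shared(text1, text2, exceptions):
    """String-based version of the answer's steps; returns the new text1."""
    exceptions = list(exceptions)  # work on a copy; the caller's list is left alone
    words = build_exception_pattern(exceptions).findall(text2)
    for line in text1.splitlines():
        if line in words:
            # Entry present in both files: turn its contents into a new
            # exception and stop offering it as a replacement.
            exceptions.append(re.search(r"\(([\w+ :]+)\)", line).group(1))
            words.remove(line)
    pattern = build_exception_pattern(exceptions)
    return pattern.sub(lambda m: words.pop(0) if words else m.group(), text1)


text1 = "(test1)\n(test2)\n(g)\n(test3)\n(test4)\n(test5)"
text2 = "(test5)\n(testa)\n(testb)\n(testc)\n(testd)\n(teste)"
print(replace_keeping_shared(text1, text2, ["test2", "test4", "g"]))
```

With the sample contents from the question this returns (testa), (test2), (g), (testb), (test4), (test5) on separate lines, which is the desired result.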
This code performs O(n) iteration and O(n) deletion operations on a list, which, depending on the size of n, may not be an efficient solution. You should improve that part if performance is a consideration.
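One way to address that is to swap the list operations for cheaper ones: a set for membership tests, a collections.Counter to drop the right number of shared entries, and an iterator consumed with next() instead of list.pop(0), which shifts the whole list on every call. The sketch below assumes the same one-entry-per-line layout as the question; replace_keeping_shared_fast and build_exception_pattern are hypothetical names. It is meant to give the same output as the list-based version, including when an entry is repeated.

```python
import re
from collections import Counter


def build_exception_pattern(exceptions):
    # Hypothetical helper: '(...)' entries whose contents do not start with an exception.
    alternatives = "|".join(re.escape(ex) for ex in exceptions)
    return re.compile(r"(\((?!" + alternatives + r")[\w+ :]+\))")


def replace_keeping_shared_fast(text1, text2, exceptions):
    words = build_exception_pattern(exceptions).findall(text2)
    word_set = set(words)  # O(1) average membership test instead of scanning a list

    # Lines of text1 that also occur among text2's words, with multiplicities.
    to_drop = Counter(line for line in text1.splitlines() if line in word_set)
    # Strip the parentheses to get the new exceptions, e.g. '(test5)' -> 'test5'.
    new_exceptions = [line[1:-1] for line in to_drop]

    # Drop the first occurrences of each shared entry, as repeated list.remove() would.
    remaining = []
    for word in words:
        if to_drop[word] > 0:
            to_drop[word] -= 1
        else:
            remaining.append(word)

    pattern = build_exception_pattern(list(exceptions) + new_exceptions)
    replacements = iter(remaining)
    # next(it, default) is O(1); list.pop(0) is O(n) because it shifts the list.
    return pattern.sub(lambda m: next(replacements, m.group()), text1)


text1 = "(test1)\n(test2)\n(g)\n(test3)\n(test4)\n(test5)"
text2 = "(test5)\n(testa)\n(testb)\n(testc)\n(testd)\n(teste)"
print(replace_keeping_shared_fast(text1, text2, ["test2", "test4", "g"]))
```

The negative lookahead itself still tries every exception at each '(' it visits, so this only removes the list overhead; it does not make the regex matching cost independent of the number of exceptions.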