
For example, this is a simplified sample text file:

word1 word2 word3 word1 word1 word2

word1 word2 word3 word4

I want a regex that removes everything after the second word on each line (in this instance, everything after word2). If possible, it should also remove the empty lines. Is something like that possible?

2 Answers


Use Python and the re module:

import re

text = """
word1 word2 word3 word1 word1 word2

word1 word2 word3 word4
"""

# Capture the first two words of each line; everything after them is dropped.
regex = re.compile(r"^(\w+[ \t]+\w+)", re.MULTILINE)
res = re.findall(regex, text)
print(res)

Returns:

['word1 word2', 'word1 word2']

Save to a new file:

with open("processed.txt", "w") as wf:
    for r in res:
        wf.write(r + "\n")  # one result per line; without the newline all results run together
Gustav Rasmussen
  • Looks good, I think it might be a bit sluggish for me because there'd be quite a lot of lines so I'll play around with making it import and export the file without printing the entire results. Thanks for the help! – Joseph Newell Jul 13 '20 at 14:40
  • Yes, the print step is not necessary. Compiling the regex, calling findall on your text/file import, and writing the result to a new file is enough – Gustav Rasmussen Jul 13 '20 at 15:24
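For a large file, as discussed in the comments, one option is to process it line by line instead of loading everything into one string. The following is only a sketch under some assumptions: the helper names `keep_first_two_words` and `process_file` and the input filename are hypothetical, it keeps the first two words of each line (what the question asks for), and it passes lines with fewer than two words through unchanged.

```python
import re

# Matches the first two words of a line (optional leading whitespace allowed).
FIRST_TWO = re.compile(r"^\s*(\w+[ \t]+\w+)")


def keep_first_two_words(lines):
    """Yield the first two words of each line, skipping blank lines."""
    for line in lines:
        if not line.strip():
            continue  # drop empty or whitespace-only lines
        match = FIRST_TWO.match(line)
        # Assumption: lines with fewer than two words are kept as they are.
        yield match.group(1) if match else line.strip()


def process_file(src_path, dst_path):
    """Stream src_path to dst_path without holding the whole file in memory."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for trimmed in keep_first_two_words(src):
            dst.write(trimmed + "\n")
```

A call such as `process_file("input.txt", "processed.txt")` (with `input.txt` standing in for the real file) would then write one trimmed line per input line and print nothing.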

Note: make a copy of the file before doing any replacements.

regex: (^\b\w+\b\s\w+)(.*)|^\n

substitution: \1
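If the replacement is done in Python rather than in an editor, the same pattern and substitution could be applied roughly as follows. This is a sketch, not part of the original answer: it assumes Python 3.5 or later (where an unmatched group in the replacement becomes an empty string), it needs `re.MULTILINE` so that `^` matches at the start of every line, and the function name `trim_text` is made up for illustration.

```python
import re

# Pattern and substitution copied from the answer above.
# re.MULTILINE makes ^ match at the start of each line, not only of the text.
PATTERN = re.compile(r"(^\b\w+\b\s\w+)(.*)|^\n", re.MULTILINE)


def trim_text(text):
    """Keep the first two words of each line and delete empty lines."""
    # When the "^\n" branch matches, group 1 is unmatched and is replaced
    # with an empty string, which removes the empty line.
    return PATTERN.sub(r"\1", text)


sample = "word1 word2 word3 word1 word1 word2\n\nword1 word2 word3 word4\n"
print(trim_text(sample))
```

On the sample text from the question this leaves two lines, each reading `word1 word2`.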

Vishal Singh