119

Currently I'm using this:

import re

f = open(filename, 'r+')
text = f.read()
text = re.sub('foobar', 'bar', text)
f.seek(0)
f.write(text)
f.close()

But the problem is that the old contents are longer than the new contents, so I end up with a file that still has part of the old text left over at the end.

compie

6 Answers

194

If you don't want to close and reopen the file, to avoid race conditions, you could truncate it:

f = open(filename, 'r+')
text = f.read()
text = re.sub('foobar', 'bar', text)
f.seek(0)        # rewind to the start before writing the new contents
f.write(text)
f.truncate()     # cut the file off here, removing any leftover bytes from the old contents
f.close()

The code will likely also be cleaner and safer if you use open as a context manager, which will close the file for you even if an error occurs!

with open(filename, 'r+') as f:
    text = f.read()
    text = re.sub('foobar', 'bar', text)
    f.seek(0)
    f.write(text)
    f.truncate()
ti7
nosklo
  • Just to be clear in my mind - should your second clip have `f.write(text)` after `f.truncate()`? – volvox Apr 09 '19 at 10:04
  • @volvox `f.write(text)` is before `f.truncate()` in this code; it writes the `text` first, so after `.write()` the file cursor is positioned at the end of `text`. Proceeding to truncate the file will remove whatever remaining bytes the file might have after this point. In this case, the end result would be the same as if you truncated before writing. – nosklo Apr 10 '19 at 13:02
  • For very large files, reading the entire file contents into memory can become unwieldy. Therefore, the [`fileinput` module](https://stackoverflow.com/a/32669061/645491) can become the preferred method. When passed `inplace=1`, it will move the file to a temporary location first, then write a new file to the old filename path. This move operation is fast on unix filesystems, because it just moves the filesystem `inode`, not the full contents. Then you can read & process each line individually to avoid the memory bloat. :-) – TrinitronX Dec 27 '19 at 23:48
19

The fileinput module has an inplace mode for writing changes to the file you are processing without you having to manage temporary files yourself. The module nicely encapsulates the common operation of looping over the lines in a list of files, via an object which transparently keeps track of the file name, line number, etc., should you want to inspect them inside the loop.

from fileinput import FileInput

for line in FileInput("file", inplace=1):
    line = line.replace("foobar", "bar")
    print(line, end='')  # with inplace=1, stdout is redirected to the file; end='' avoids doubling newlines
codeMonkey
ghostdog74
16

Probably it would be easier and neater to close the file after text = re.sub('foobar', 'bar', text), re-open it for writing (thus clearing old contents), and write your updated text to it.
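
For example, a minimal sketch of that approach, reusing filename and the substitution from the question:

import re

with open(filename) as f:
    text = f.read()

text = re.sub('foobar', 'bar', text)

with open(filename, 'w') as f:   # 'w' truncates the file, so nothing from the old contents is left over
    f.write(text)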

Il-Bhima
1

I find it easier to remember to just read it and then write it.

For example:

with open('file') as f:
    data = f.read()
with open('file', 'w') as f:
    f.write('hello')
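
Applied to the substitution from the question, the same read-then-write shape might look like this (a sketch; reopening with 'w' discards the old contents, so no truncate is needed):

import re

with open('file') as f:
    data = f.read()

with open('file', 'w') as f:
    f.write(re.sub('foobar', 'bar', data))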
0

Honestly, you can take a look at this class I built that does basic file operations. The write method overwrites the file, and append keeps the old data.

class IO:
    def read(self, filename):
        # Return the file's entire contents as bytes.
        with open(filename, "rb") as f:
            return f.read()

    def write(self, filename, data):
        # Overwrite the file with data (any old contents are discarded).
        with open(filename, "wb") as f:
            f.write(data)

    def append(self, filename, data):
        # Keep the old data and add the new data after it.
        self.write(filename, self.read(filename) + data)
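
A possible usage sketch for the question's substitution (the file name is just a placeholder, and the arguments are bytes because the class opens files in binary mode):

import re

io = IO()
data = io.read('somefile.txt')                              # placeholder path
io.write('somefile.txt', re.sub(b'foobar', b'bar', data))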
        
CodinGuy
-2

Try writing it to a new file:

f = open(filename, 'r')
f2 = open(filename2, 'w')   # open the new file for writing
text = f.read()
text = re.sub('foobar', 'bar', text)
f.close()
f2.write(text)
f2.close()
sk7979