
I have code like this:

f1 = open('file1', 'a')
f2 = open('file1', 'a')

f1.write('Test line 1\n')
f2.write('Test line 2\n')
f1.write('Test line 3\n')
f2.write('Test line 4\n')

When this code is run with the standard Python 2.7 interpreter (CPython), the file contains four lines, as expected. However, when I run it under PyPy, the file contains only two lines.

Could someone explain the differences between Python and PyPy in working with files in append mode?

UPDATE: The problem no longer occurs in PyPy 2.3.

  • Why would you ever open the same file with two different handles? – wnnmaw Apr 18 '14 at 14:27
  • I have code like this in old code that is hard to change. With standard Python this code works, but not with PyPy. – Alexey Stankevich Apr 18 '14 at 14:30
  • It is a matter of buffering and flushing the content of the file. On PyPy, the committing of the file is delayed, and therefore the last handle to commit to the file replaces its entire content. – njzk2 Apr 18 '14 at 14:41
  • [pypy bug](https://bugs.pypy.org/issue1739) – jfs Apr 21 '14 at 15:01

2 Answers


The behavior differs because the two interpreters implement file I/O differently.

CPython implements its file I/O on top of the fopen, fread and fwrite functions from stdio.h and works with FILE * streams.

PyPy, on the other hand, implements its file I/O on top of the POSIX open, write and read functions and works with int file descriptors.

Compare these two programs in C:

#include <stdio.h>

int main() {
    FILE *a = fopen("file1", "a");
    FILE *b = fopen("file1", "a");

    fwrite("Test line 1\n", 12, 1, a);
    fflush(a);
    fwrite("Test line 2\n", 12, 1, b);
    fflush(b);
    fwrite("Test line 3\n", 12, 1, a);
    fflush(a);
    fwrite("Test line 4\n", 12, 1, b);

    fclose(a);
    fclose(b);

    return 0;
}

and

#include <fcntl.h>
#include <unistd.h>

int main() {
    /* O_CREAT requires a third argument with the permission bits */
    int a = open("file1", O_CREAT | O_WRONLY | O_APPEND, 0644);
    int b = open("file1", O_CREAT | O_WRONLY | O_APPEND, 0644);

    write(a, "Test line 1\n", 12);
    write(b, "Test line 2\n", 12);
    write(a, "Test line 3\n", 12);
    write(b, "Test line 4\n", 12);

    close(a);
    close(b);

    return 0;
}

More information on the difference between open and fopen can be found in the answers to this question.

UPDATE:

After inspecting the PyPy codebase some more, it seems that, for some reason, it does not use the O_APPEND flag for "a" mode, only O_WRONLY | O_CREAT. That is the real reason why, in PyPy, you need to seek to the end of the file before each write call, as J.F. Sebastian mentioned in another answer. I guess a bug should be filed in the PyPy bug tracker, as the O_APPEND flag is available on both Windows and Unix. So what PyPy does now looks like this:

#include <fcntl.h>
#include <unistd.h>

int main() {
    /* No O_APPEND here; O_CREAT still needs the permission bits */
    int a = open("file1", O_CREAT | O_WRONLY, 0644);
    int b = open("file1", O_CREAT | O_WRONLY, 0644);

    write(a, "Test line 1\n", 12);
    write(b, "Test line 2\n", 12);
    write(a, "Test line 3\n", 12);
    write(b, "Test line 4\n", 12);

    close(a);
    close(b);

    return 0;
}

Without the O_APPEND flag, it should reproduce PyPy's behavior.
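The same comparison can be sketched in Python with the low-level `os.open` and `os.write` calls, which map onto the POSIX functions used in the C programs above. This is only an illustration under the assumption of a POSIX system and Python 3; the helper names `interleaved_writes` and `demo` are made up for this example and are not taken from PyPy or CPython.

```python
import os
import tempfile


def interleaved_writes(path, extra_flags=0):
    """Write four lines through two descriptors opened on the same path.

    Hypothetical helper for illustration only.
    """
    flags = os.O_CREAT | os.O_WRONLY | extra_flags
    a = os.open(path, flags, 0o644)
    b = os.open(path, flags, 0o644)
    try:
        # Each descriptor has its own file offset, starting at 0.
        os.write(a, b"Test line 1\n")
        os.write(b, b"Test line 2\n")
        os.write(a, b"Test line 3\n")
        os.write(b, b"Test line 4\n")
    finally:
        os.close(a)
        os.close(b)
    with open(path, "rb") as f:
        return f.read()


def demo():
    """Run the experiment with and without O_APPEND in a temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        plain = interleaved_writes(os.path.join(tmp, "plain"))
        appended = interleaved_writes(os.path.join(tmp, "appended"), os.O_APPEND)
    return plain, appended


if __name__ == "__main__":
    plain, appended = demo()
    print(plain)     # writes through b overwrite the writes through a
    print(appended)  # every write lands at the current end of the file
```

Without `O_APPEND`, each descriptor writes at its own offset, so the second descriptor overwrites what the first one wrote and only two lines survive, which matches the result described in the question.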

maxbublis
  • You have to wonder if PyPy is adhering to the spec or if the spec is not specific enough in this case. – wheaties Apr 18 '14 at 16:03
  • @wheaties I don't think there is such a specification; it is implementation-specific. CPython documents that its [File Objects](https://docs.python.org/2/library/stdtypes.html#file-objects) are implemented on top of `stdio.h`. At the same time, it has the [io](https://docs.python.org/2/library/io.html) module, which is [implemented](http://hg.python.org/cpython/file/c776ed8a8e75/Modules/_io/fileio.c) on top of POSIX functions. – maxbublis Apr 18 '14 at 16:24
  • Note: CPython 3 also implements I/O on top of POSIX `open`, `write`, `read`. It is accessible as the `io` module in Python 2.7. – jfs Apr 18 '14 at 20:33
  • On my machine, both of your C programs, Python 2, Python 3 and Jython produce the same output despite their different I/O implementations. Only PyPy differs. – jfs Apr 19 '14 at 01:06
  • @J.F.Sebastian There was an issue on my system that led me to misinterpret the results of running my C code. It seems I have now found a real bug in the implementation, and I have added another C code example to reproduce it. – maxbublis Apr 19 '14 at 17:19
  • Your C code is missing `lseek(fd, 0, 2)` after the `open()` if you want to make it behave like PyPy. As [I said in my answer](http://stackoverflow.com/a/23163051/4279), POSIX `O_APPEND` behaviour is not mandated by the Python docs, i.e., it is not clear whether the PyPy behaviour is a bug. – jfs Apr 19 '14 at 17:35

On POSIX systems:

O_APPEND
    If set, the file offset shall be set to the end of the file prior to each write.

This means that if a file is opened in "append" mode, then whenever its buffer is flushed, the content goes to the end of the file.

Python 2, Python 3 and Jython respect that on my machine. In your case, the content is smaller than the file buffer, so the file on disk contains all the writes from one file object followed by all the writes from the other.
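A small sketch of that effect, assuming CPython 3 with default (full) buffering on a regular file; the function name `buffered_append_order` is made up for this example. Nothing reaches the disk until each file object is flushed on close, so the final order follows the order of the `close()` calls rather than the order of the `write()` calls.

```python
import os
import tempfile


def buffered_append_order():
    """Return the file content after interleaved buffered appends.

    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file1")
        f1 = open(path, "a")
        f2 = open(path, "a")
        # These small writes stay in each object's in-memory buffer.
        f1.write("Test line 1\n")
        f2.write("Test line 2\n")
        f1.write("Test line 3\n")
        f2.write("Test line 4\n")
        f1.close()  # flushes lines 1 and 3 to the end of the file
        f2.close()  # flushes lines 2 and 4 to the end of the file
        with open(path) as f:
            return f.read()


if __name__ == "__main__":
    print(buffered_append_order())
```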

It is easier to understand if the files are line-buffered:

from __future__ import with_statement

filename = 'file1'
with open(filename, 'wb', 0) as file:
    pass # truncate the file

f1 = open(filename, 'a', 1)
f2 = open(filename, 'a', 1)

f1.write('f1 1\n')
f2.write('f2 aa\n')
f1.write('f1 222\n')
f2.write('f2 bbbb\n')
f1.write('f1 333\n')
f2.write('f2 cc\n')

Output

f1 1
f2 aa
f1 222
f2 bbbb
f1 333
f2 cc

The Python documentation does not mandate such behaviour. It just mentions:

...'a' for appending (which on some Unix systems means that all writes append to the end of the file regardless of the current seek position) (emphasis mine)

PyPy produces the following output in unbuffered and line-buffered mode:

f2 aaff2 bbbf1f2 cc

Manually moving the file position to the end before each write fixes the PyPy output:

from __future__ import with_statement
import os

filename = 'file1'
with open(filename, 'wb', 0) as file:
    pass # truncate the file

f1 = open(filename, 'a', 1)
f2 = open(filename, 'a', 1)

f1.write('f1 1\n')
f2.seek(0, os.SEEK_END)
f2.write('f2 aa\n')
f1.seek(0, os.SEEK_END)
f1.write('f1 222\n')
f2.seek(0, os.SEEK_END)
f2.write('f2 bbbb\n')
f1.seek(0, os.SEEK_END)
f1.write('f1 333\n')
f2.seek(0, os.SEEK_END)
f2.write('f2 cc\n')

If the file is fully buffered, then also add .flush() after each .write().
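The seek, write and flush steps can be wrapped in one small helper so that legacy call sites change as little as possible. This is a minimal sketch written for Python 3; `append_line` and `demo_append_line` are hypothetical names, not part of any library. Note that the seek and the write are two separate steps, so this is not atomic if another process writes to the file at the same time.

```python
import os
import tempfile


def append_line(f, text):
    """Seek to the end, write, and flush (hypothetical helper)."""
    f.seek(0, os.SEEK_END)  # move to the current end of the file
    f.write(text)
    f.flush()               # push the data out before the other handle writes


def demo_append_line():
    """Interleave writes through two file objects on the same file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file1")
        f1 = open(path, "a")
        f2 = open(path, "a")
        try:
            append_line(f1, "f1 1\n")
            append_line(f2, "f2 aa\n")
            append_line(f1, "f1 222\n")
            append_line(f2, "f2 bbbb\n")
        finally:
            f1.close()
            f2.close()
        with open(path) as f:
            return f.read()


if __name__ == "__main__":
    print(demo_append_line())
```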

It is probably not a good idea to write to the same file using more than one file object at once.

jfs