I want to read a file, 4 lines by 4 (it's a fastq file, with DNA sequences).
When I read the file one line by one or two by two, there's no issues, but when I read 3 or 4 lines at once, my code crashes (kernel appeared to have died on jupyter notebook). (Uncommenting the last part, or any 3 out of the 4 getline()
.
I tried with a double array of char (char**) to store the lines, with the same issue.
Any idea what can be the cause ?
Using Python 3.7.3, Cython 0.29, all other libraries updated. File being read is about 1.3GB, machine has 8GB, ubuntu 16.04. Code adapted from https://gist.github.com/pydemo/0b85bd5d1c017f6873422e02aeb9618a
%%cython
from libc.stdio cimport FILE, fopen, fclose, getline
def fastq_reader(early_stop=10):
cdef const char* fname = b'/path/to/file'
cdef FILE* cfile
cfile = fopen(fname, "rb")
cdef:
char * line_0 = NULL
char * line_1 = NULL
char * line_2 = NULL
char * line_3 = NULL
size_t seed = 0
ssize_t length_line
unsigned long long line_nb = 0
while True:
length_line = getline(&line_0, &seed, cfile)
if length_line < 0: break
length_line = getline(&line_1, &seed, cfile)
if length_line < 0: break
# length_line = getline(&line_2, &seed, cfile)
# if length_line < 0: break
# length_line = getline(&line_3, &seed, cfile)
# if length_line < 0: break
line_nb += 4
if line_nb > early_stop:
break
fclose(cfile)
return line_nb
fastq_reader(early_stop=20000)