I have two CSV files (file1 and file2).
To simplify my question, let's say that I want to read file1 three lines at a time and file2 four lines at a time, alternating between the two files.
file1.csv (9 lines)
1,2,3
3,5,8
7,2,9
10,111,12
13,14,155
31,2,3
3,15,82
8,4,91
12,111,13
file2.csv (12 lines)
55,12,17
3,6,13
72,1,91
10,0,12
1,1,73
31,2,3
3,15,61
18,6,91
13,33,13
7,1,15
9,17,42
41,8,18
The output I want is:
1,2,3 (from 1. row of file1.csv)
3,5,8 (from 2. row of file1.csv)
7,2,9 (from 3. row of file1.csv)
55,12,17 (from 1. row of file2.csv)
3,6,13 (from 2. row of file2.csv)
72,1,91 (from 3. row of file2.csv)
10,0,12 (from 4. row of file2.csv)
10,111,12 (from 4. row of file1.csv)
13,14,155 (from 5. row of file1.csv)
31,2,3 (from 6. row of file1.csv)
1,1,73 (from 5. row of file2.csv)
31,2,3 (from 6. row of file2.csv)
3,15,61 (from 7. row of file2.csv)
18,6,91 (from 8. row of file2.csv)
3,15,82 (from 7. row of file1.csv)
8,4,91 (from 8. row of file1.csv)
12,111,13 (from 9. row of file1.csv)
13,33,13 (from 9. row of file2.csv)
7,1,15 (from 10. row of file2.csv)
9,17,42 (from 11. row of file2.csv)
41,8,18 (from 12. row of file2.csv)
My real data files are very large (about 1.6 GB each), and I want to use as little memory as possible. To do this, I wrote the following script:
f1, f2, = open(pathInput1, 'r'), open(pathInput2, 'r')
position1, position2 = 0, 0
for i in range(6):
    if i%2 == 0:
        #print("file1.csv")
        sizeOfWindow = 3
        sizeOfWindowInactive = 4
        f1.seek(position1)
        data = []
        for l in range(sizeOfWindow):
            line = f1.readline()
            line = list(map(int, line[:-1].split(",")))
            data.append(line)
        data = np.array(data)
        print(data)
        [next(f2) for i in range(sizeOfWindowInactive)]
        position1 = f1.tell()
    else:
        #print("file2.csv")
        sizeOfWindow = 4
        sizeOfWindowInactive = 3
        f2.seek(position2)
        data = []
        for l in range(sizeOfWindow):
            line = f2.readline()
            line = list(map(int, line[:-1].split(",")))
            data.append(line)
        data = np.array(data)
        print(data)
        [next(f1) for i in range(sizeOfWindowInactive)]
        position2 = f2.tell()
After writing this script, I noticed that I can't use both readline() and next(). My question is: how can I rearrange my script to get the same output without using much memory?
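For reference, here is a minimal sketch of the behaviour I am after, not a definitive implementation. It relies only on iterating over the file objects with itertools.islice, so it never needs seek() or tell(), and only one window is held in memory at a time. The helper names read_window and alternate are made up for illustration, the io.StringIO objects are small stand-ins for the real files, and the sketch assumes the files contain no blank lines.

```python
import io
from itertools import islice

import numpy as np


def read_window(handle, size):
    """Read up to `size` CSV lines from an open file and parse them as ints."""
    rows = [list(map(int, line.split(","))) for line in islice(handle, size)]
    return np.array(rows)


def alternate(handles, sizes):
    """Yield one window per file in turn until every file is exhausted."""
    active = True
    while active:
        active = False
        for handle, size in zip(handles, sizes):
            window = read_window(handle, size)
            if len(window):
                active = True
                yield window


# In-memory stand-ins (assumption) for the first lines of file1.csv and file2.csv
f1 = io.StringIO("1,2,3\n3,5,8\n7,2,9\n10,111,12\n13,14,155\n31,2,3\n")
f2 = io.StringIO(
    "55,12,17\n3,6,13\n72,1,91\n10,0,12\n1,1,73\n31,2,3\n3,15,61\n18,6,91\n"
)

windows = list(alternate([f1, f2], [3, 4]))
for w in windows:
    print(w)
```

With real files, the two StringIO objects would be replaced by open(pathInput1) and open(pathInput2).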
Edit: In my real case, I have 5 files, and each file has its own fixed sizeOfWindow. I don't read the files in a regular order: I decide which file to jump to with an if statement, based on the last chunk of data that I read. When I read one file, I need to move the cursor of the other files forward without reading their data.
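To make the real case more concrete, here is a rough sketch under several assumptions. The names read_window, skip_lines, walk and rule are hypothetical, and rule is only a placeholder for my actual if statement. I also assume that each inactive file should advance by its own window size, as in my script. Note that skip_lines still reads the skipped lines from disk; it just discards them immediately instead of storing them.

```python
import io
from collections import deque
from itertools import islice


def read_window(handle, size):
    """Parse the next `size` lines of `handle` into lists of ints."""
    return [list(map(int, line.split(","))) for line in islice(handle, size)]


def skip_lines(handle, count):
    """Advance `handle` by `count` lines without keeping them in memory."""
    deque(islice(handle, count), maxlen=0)


def walk(handles, sizes, choose_next, start=0):
    """Yield (file_index, window) pairs; stop when the chosen file is exhausted."""
    current = start
    while True:
        window = read_window(handles[current], sizes[current])
        if not window:
            return
        yield current, window
        # Assumption: every inactive file moves forward by its own window size.
        for i, handle in enumerate(handles):
            if i != current:
                skip_lines(handle, sizes[i])
        current = choose_next(current, window)


def rule(current, window):
    """Placeholder decision rule: the last value read is the next file index."""
    return window[-1][-1]


# Tiny in-memory stand-ins (assumption) for two of the real files
a = io.StringIO("1,1\n2,1\n3,1\n4,0\n")
b = io.StringIO("10,0\n11,0\n12,0\n")

steps = list(walk([a, b], [2, 1], rule))
for index, window in steps:
    print(index, window)
```

The same walk call would take five open file handles and five window sizes in my real setup.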