I'm using Python 3.8 64-bit with PyCharm on Windows 8. I got a memory error when trying to make a DataFrame from a list.
What I'm trying to do is read a huge .csv (25 GB) into a list using the csv package, make a DataFrame from it with pd.DataFrame, and then export a .dta file with the pd.to_stata function. My RAM is 64 GB, way larger than the data.
Here is the error message:
MemoryError: Unable to allocate 25.8 GiB for an array with shape (77058858, 45) and data type object
I found three similar questions but none of them work for me.
- question 1: the solution does not work for me because I am already using 64-bit Python.
- the answer to this question suggests that the memory error is raised because the PC does not have enough memory, but I'm pretty sure I have enough RAM for this data.
- In this one, the author gets a memory error trying to read a huge csv, and the solution is to read the data piece by piece. I understand that I can do the same, but I wonder if there is a cleaner way to solve this problem.
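If I understand that answer correctly, the piece-by-piece fallback would look roughly like this (just a sketch with a toy in-memory CSV and made-up column names, not my real file):

```python
import io
import pandas as pd

# Toy stand-in for the real 25 GB test.csv
csv_text = "id,attachmentPath,eventid\n1,a.txt,100\n2,b.txt,200\n3,c.txt,300\n"

# chunksize makes read_csv return an iterator of small DataFrames
# instead of materializing everything at once.
chunks = pd.read_csv(io.StringIO(csv_text), chunksize=2)

# Concatenate the pieces back together (or process each piece and
# discard it, which is the point of chunking for huge files).
df = pd.concat(chunks, ignore_index=True)
print(df.shape)  # (3, 3)
```

This works, but it still feels like a workaround rather than a clean fix, which is why I'm asking.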
Here is my code:
import csv
import itertools
import pandas as pd
colname= ["id","attachmentPath",...(20 other column names),"eventid"]
reader = csv.reader(open(r'test.csv', encoding = "ISO-8859-1"), quotechar='"',delimiter=',', skipinitialspace=False, escapechar='\\')
# read full sample
records = []
for record in itertools.islice(reader, 1, 77058860):  # 77058860 is the length of the csv
    records.append(record)
df = pd.DataFrame(records, columns=colname)
statapath = r'stata_output.dta'
df.to_stata(statapath, version=117, write_index=False)