
I am trying to convert a very large list to a DataFrame. The length of the list, `len(rows_list)`, is 15347782, which is pretty big. This worked well with smaller lists:

df = pd.DataFrame(rows_list)

But it fails with a memory error when I try to convert a list of this size into a DataFrame.

Is there any way to use a chunk size while building the DataFrame, like we do when writing a large file to CSV or reading a large file from CSV?
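What I have in mind is something like the manual chunking below, slicing the list and appending each slice to the CSV, but I'm not sure whether pandas offers something built in for this (the chunk size and file name here are just placeholders):

    import pandas as pd

    chunk_size = 500_000  # placeholder; tune to available memory

    for start in range(0, len(rows_list), chunk_size):
        # build a small DataFrame from one slice of the big list
        chunk_df = pd.DataFrame(rows_list[start:start + chunk_size])
        # write the header only for the first chunk, then append the rest
        chunk_df.to_csv(
            "out.csv",                       # placeholder file name
            mode="w" if start == 0 else "a",
            header=(start == 0),
            index=False,
        )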

Or is there any other smooth way to do this?

Thanks in advance!

  • 2 questions: (1) Are you tied to `pandas`? [In general, `pandas` is best used for in-memory manipulation.] (2) What are the contents of the `list`? [If strings, are there many *repeated* strings?] – jpp Apr 11 '18 at 08:32
  • Hi @jpp, thanks for your reply. Will it not be possible in pandas because of the size? My `list` consists of numbers, strings, everything. –  Apr 11 '18 at 08:46
  • `pandas` has a significant overhead relative to native Python types. So, yes, it is likely size-linked. In addition, you are holding 2 copies of data when you build the dataframe (the list itself, and the `numpy` representation in the dataframe). That's double the memory requirement. If you need to do this efficiently, and this is your only task, don't use `pandas`. – jpp Apr 11 '18 at 08:54
  • Okay, are you aware of any alternatives? –  Apr 11 '18 at 08:56
  • I need to put my list into a DataFrame and write the DataFrame to a CSV file. –  Apr 11 '18 at 08:57
  • Yup, check this out: [Writing a Python list of lists to a csv file](https://stackoverflow.com/questions/14037540/writing-a-python-list-of-lists-to-a-csv-file) – jpp Apr 11 '18 at 08:57
  • Explain *why* you need to use a dataframe. – jpp Apr 11 '18 at 08:58
  • @jpp - your link worked beautifully for me. thanks :) –  Apr 11 '18 at 09:10
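A minimal sketch of the approach from the question linked in the comments above: write the list of rows straight to CSV with the standard `csv` module and skip pandas entirely (the output file name is just a placeholder):

    import csv

    with open("out.csv", "w", newline="") as f:  # placeholder file name
        writer = csv.writer(f)
        # writerows writes each row of the existing list directly,
        # so no second in-memory copy (the DataFrame) is ever built
        writer.writerows(rows_list)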
