
When appending to a csv, my first line starts on the existing last line rather than on a new line.

I keep searching SO, but I am only finding the basics of opening a csv in append mode or using append mode when writing to a csv. I could not make sense of the accepted answer here (to_csv append mode is not appending to next new line), since it appears to require the existing file to be open before writing the "\n" with f.write("\n"). This answer (How to add pandas data to an existing csv file?) is the most relevant, but I am hoping to write multiple dataframes in a function, so I do not want to keep opening the file. My plan is to use a function like:

import os

def mysave(df, dfpath):
    # if the file does not exist, write it with a header
    if not os.path.isfile(dfpath):
        df.to_csv(dfpath, index=False)
    else:  # it exists, so append without writing the header
        df.to_csv(dfpath, mode='a', index=False, header=False)

mysave(mydf, 'foo.csv')

I've created a very simple example, foo.csv, with the structure:

a   b   c   d           
5   1   ah  doo         
6   2   bah poo         
7   2   dah coo

When I use my function or this simple code:

import pandas as pd
df = pd.read_csv('foo.csv', index_col=False)
mydf = df
mydf.to_csv('foo.csv', mode='a', index=False, header=False)

This is what foo.csv ends up as:

a   b   c   d           
5   1   ah  doo         
6   2   bah poo         
7   2   dah coo5    1   ah  doo
6   2   bah poo         
7   2   dah coo     

When I attempt to add a newline character as the header, like mydf.to_csv('foo.csv', mode='a', index=False, header=("\n")), pandas (rightly) ignores my erroneous header argument and goes with the default of header=True:

a   b   c   d           
5   1   ah  doo         
6   2   bah poo         
7   2   dah cooa    b   c   d
6   2   bah poo         
7   2   dah coo 
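
For reference, the behavior above appears to happen because foo.csv does not end with a trailing newline, so the appended rows start on the old last line. A quick way to confirm that (a sketch, assuming the file is non-empty):

with open('foo.csv', 'rb') as f:
    f.seek(-1, 2)  # jump to the last byte of the file
    ends_clean = (f.read(1) == b'\n')
print(ends_clean)  # False here, so the next append starts mid-line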
– jessi

3 Answers


I had a similar problem and after a good bit of searching, I didn't find any simple/elegant solution. The minimal fix that worked for me is:

import pandas as pd

# open in append mode so the newline terminates the existing last line
with open('foo.csv', 'a') as f:
    f.write('\n')
mydf.to_csv('foo.csv', index=False, header=False, mode='a')
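
Folded into the mysave function from the question, it could look like this (a sketch; the trailing-newline check is my addition so a file that already ends cleanly does not pick up blank rows, and it assumes the existing file is non-empty):

import os
import pandas as pd

def mysave(df, dfpath):
    if not os.path.isfile(dfpath):
        # new file: write with the header
        df.to_csv(dfpath, index=False)
        return
    # existing file: make sure the last line is terminated before appending
    with open(dfpath, 'rb+') as f:
        f.seek(-1, 2)           # look at the last byte
        if f.read(1) != b'\n':
            f.write(b'\n')      # terminate the dangling last line
    df.to_csv(dfpath, mode='a', index=False, header=False)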
– mathamateur

I am assuming that you want to append one dataframe below the other into a single dataframe.

Use the command below to do that in a single step:

ans = pd.concat([df, df])

Then you can write the result out to a .csv file.
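
For the loop case described in the comments below, that could look like the following (a sketch; make_frame and groups are placeholders for however the individual dataframes are produced):

import pandas as pd

# build the per-group frames in a list, concatenate once, write once
frames = [make_frame(g) for g in groups]
ans = pd.concat(frames, ignore_index=True)
ans.to_csv('foo.csv', index=False)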

– sriramkumar
  • This recommendation doesn't answer my question, but it does address my underlying real problem. I currently do what you suggest. I build a dataframe (many dataframes by group), iteratively and then I send it all to a .csv (by group). `for mygroup in mygroups: d = pd.DataFrame() for k, v in mygroup: if k == x: #do something temp = makeMyDF(v) d = pd.concat([d, temp]) d.to_csv(filepath)` However, my current process takes 11 hours. I was hoping to try to not hold as much in memory. – jessi Nov 11 '16 at 09:33
  • If your dataframe is large, then writing to disk will have a larger window of time. During that "disk writing window" you can do other processing by using multiprocessing (threading), but it depends on the hardware and processor. – sriramkumar Nov 11 '16 at 11:08

If your dataframe gets huge and you want to avoid concatenation, you could go with

import csv

# write an empty row so the unterminated last line gets closed
with open('foo.csv', 'a', newline='') as out:
    writer = csv.writer(out)
    writer.writerow(())

in a function or just as a snippet in your code. On Python 2 you would open the file in binary append mode ('ab', at least on Windows); on Python 3 the csv module wants text mode with newline='' as above.
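
Used with the append pattern from the question, that could be wrapped up as follows (a sketch; end_last_line is a hypothetical helper name):

import csv

def end_last_line(path):
    # write a bare row terminator; this closes an unterminated last line
    # (if the file already ends cleanly it adds a blank line, which
    # pandas skips on read by default)
    with open(path, 'a', newline='') as out:
        csv.writer(out).writerow(())

end_last_line('foo.csv')
mydf.to_csv('foo.csv', mode='a', index=False, header=False)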

– themistoklik
  • So, this recommendation doesn't answer my question but is an alternative for my actual problem. I think I could re-conceptualize my approach to do what you suggest if I open the .csv after I first make it (in a loop) and then send the temp data frame to the open csv. However, will it be faster than just building the dataframe in python and sending the "whole thing" to csv once? – jessi Nov 11 '16 at 09:29
  • If you're going to be dealing with small dataframes this IMO isn't worth the time to refactor or even test, however if you find yourself in a situation where the concatenation process is expensive then I suppose that would be faster. If you find the time to test it let me know. – themistoklik Nov 11 '16 at 09:49
  • I don't really have small dataframes. I plan to test after I figure out how to do this append without sliding the first line of the dataframe over. Currently, my process takes 11 hours for 25 groups. Granted, most of this time is in xlwings as I iteratively change inputs to an Excel Tool (not my work), but I think some of it is in holding the dataframes in memory. – jessi Nov 11 '16 at 09:55
  • For a manageable number of dataframes another way you could go about this without refactoring much is write each dataframe to a csv like you do now and when you're done concat them all, so your problem becomes a much simpler one. Hacky, but if all else fails.. – themistoklik Nov 11 '16 at 10:16