So I have a large CSV file with about 280 columns and roughly 1 billion rows, with a file size of about 20 GB. A sample of this file (with 7 columns and 4 rows) is provided below:
SL No.,Roll No.,J_Date,F_Date,S1,S2,S3
1,00123456789,2004/09/11,2009/08/20,43,67,56
2,987654321,2010/04/01,2015/02/20,82,98,76
3,0123459876,2000/06/25,2005/10/02,72,84,02
4,000543216789,1990/08/29,1998/05/31,15,64,82
Now, given that the file is so large, I would have to read it in smaller chunks, with me being able to specify the chunk size. But as you might have seen from the sample, "Roll No." has to be read as a "character" and not as a "numeric", so that the leading zeros are preserved. Also, I need to add the columns "S1", "S2" and "S3" and write the sum to a new column "MM".
The output of the above sample has to be something like this:
SL No.,Roll No.,J_Date,F_Date,S1,S2,S3,MM
1,00123456789,2004/09/11,2009/08/20,43,67,56,166
2,987654321,2010/04/01,2015/02/20,82,98,76,256
3,0123459876,2000/06/25,2005/10/02,72,84,02,158
4,000543216789,1990/08/29,1998/05/31,15,64,82,161
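To show the kind of chunked loop I'm imagining, here is a sketch using base R's `read.csv` on an open connection. The file names, the tiny chunk size of 2, and writing the 4-row sample inline are purely for illustration; I haven't been able to verify this on the real 280-column file, where the `classes` vector would need to cover all the columns:

```r
# For illustration only: write the 4-row sample to "input.csv".
writeLines(c(
  "SL No.,Roll No.,J_Date,F_Date,S1,S2,S3",
  "1,00123456789,2004/09/11,2009/08/20,43,67,56",
  "2,987654321,2010/04/01,2015/02/20,82,98,76",
  "3,0123459876,2000/06/25,2005/10/02,72,84,02",
  "4,000543216789,1990/08/29,1998/05/31,15,64,82"
), "input.csv")

infile  <- "input.csv"
outfile <- "output.csv"
chunk_size <- 2L              # tiny for the sample; something like 100000L for real data

con <- file(infile, open = "r")
col_names <- strsplit(readLines(con, n = 1), ",")[[1]]

# Everything numeric except the columns that must stay character
# (so "Roll No." keeps its leading zeros). Note the numeric parse
# turns the score "02" into 2 in the output.
classes <- rep("numeric", length(col_names))
classes[col_names %in% c("Roll No.", "J_Date", "F_Date")] <- "character"

first <- TRUE
repeat {
  chunk <- tryCatch(
    read.csv(con, nrows = chunk_size, header = FALSE,
             col.names = col_names, colClasses = classes,
             check.names = FALSE),
    error = function(e) NULL)   # read.csv errors once the connection is drained
  if (is.null(chunk) || nrow(chunk) == 0) break

  chunk$MM <- chunk$S1 + chunk$S2 + chunk$S3

  # First chunk writes the header; later chunks append without it.
  write.table(chunk, outfile, sep = ",", quote = FALSE,
              row.names = FALSE, col.names = first, append = !first)
  first <- FALSE
}
close(con)
```

Because the connection stays open between calls, each `read.csv` picks up where the previous one stopped, so only `chunk_size` rows are ever in memory at once.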
I know similar questions have been asked before, but I couldn't find an answer that worked for me. I referred to the following questions:
R:Loops to process large dataset(GBs) in chunks?
Trimming a huge (3.5 GB) csv file to read into R
How do i read only lines that fulfil a condition from a csv into R?
Read numeric input as string R, and many more.
This might be a good time to say that I'm a total beginner when it comes to R, so all kinds of help would be very much appreciated. I've been stuck on this for a long while now.