I have a tool (an .exe provided to me) that outputs poorly formatted CSV files. The problem is that the last value can contain commas without being quoted, e.g.:
184500,OBJECT_CALENDAR,,,UNITS_NO_UNITS,NULL,,,,Sched N&S B1,1st,3rd,4S,5th&6th
Here the last field actually begins at 'Sched', so I would expect to see something like this:
184500,OBJECT_CALENDAR,,,UNITS_NO_UNITS,NULL,,,,"Sched N&S B1,1st,3rd,4S,5th&6th"
This is messing up everything I am trying to do downstream, and I'm not sure how to handle it. Is there a way to specify the number of columns in read.csv, so that everything after the ninth comma ends up in the last column?
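As far as I know, read.csv() has no option that merges surplus fields into the last column. One base-R workaround is to pre-process the raw lines so they look like the expected output above: wrap everything after the ninth comma in double quotes, then pass the result to read.csv(text = ...). This is a sketch that assumes exactly nine fixed fields come before the free-text one and that those nine never contain commas or quotes themselves; `quote_tail` and the column names are hypothetical.

```r
# Hypothetical helper (not part of base R): wrap everything after the first
# `n_fixed` commas in double quotes so read.csv() sees it as a single field.
quote_tail <- function(lines, n_fixed = 9) {
  # Group 1: the first n_fixed comma-terminated fields; group 2: the rest
  pattern <- sprintf("^((?:[^,]*,){%d})(.*)$", n_fixed)
  ok <- grepl(pattern, lines, perl = TRUE)
  head_part <- sub(pattern, "\\1", lines, perl = TRUE)
  # Escape any embedded double quotes CSV-style by doubling them
  tail_part <- gsub('"', '""', sub(pattern, "\\2", lines, perl = TRUE), fixed = TRUE)
  # Lines with fewer than n_fixed commas are returned unchanged
  ifelse(ok, paste0(head_part, '"', tail_part, '"'), lines)
}

# Toy input; the header names are made up for illustration
raw <- c(
  "id,type,f3,f4,units,value,f7,f8,f9,name",
  "184500,OBJECT_CALENDAR,,,UNITS_NO_UNITS,NULL,,,,Sched N&S B1,1st,3rd,4S,5th&6th"
)
df <- read.csv(text = quote_tail(raw), stringsAsFactors = FALSE)
df$name  # "Sched N&S B1,1st,3rd,4S,5th&6th"
```

For a real file the call would be something like read.csv(text = quote_tail(readLines(filepath))). The regex `[^,]*` cannot cross a comma, so the first group always stops at exactly the ninth comma.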
I have tried reading it line by line, but it is slow and less than elegant:
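processFile <- function(filepath) {
  i <- 1
  lines <- character(0)
  theFile <- file(filepath, "r")
  # Read one line per call until readLines() returns nothing (end of file)
  while (TRUE) {
    line <- readLines(theFile, n = 1)
    if (length(line) == 0) {
      break
    }
    lines[i] <- line
    i <- i + 1
  }
  close(theFile)
  # Drop the header, split on commas, keep fields 1-9 and paste
  # fields 10+ back together into a single value
  formatted <- lapply(strsplit(lines[-1], ",", fixed = TRUE), function(x) {
    # strsplit() drops the final empty field when a line ends in a comma,
    # so pad short rows; otherwise x[10:length(x)] would count backwards
    if (length(x) < 10) x <- c(x, rep("", 10 - length(x)))
    c(x[1:9], paste(x[10:length(x)], collapse = ","))
  })
  finalFrame <- as.data.frame(matrix(unlist(formatted), ncol = 10, byrow = TRUE))
  return(finalFrame)
}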
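Much of the slowness in a loop like the one above likely comes from calling readLines() once per line and growing the result one element at a time. readLines() with its default n = -1 reads the whole file in one call, after which the same split-and-paste idea can run over every line at once. A sketch, assuming a header line and nine fixed fields as in the question; `split_fixed` and `processFile2` are hypothetical names.

```r
# Hypothetical helper: split each line on commas, keep the first n_fixed
# fields, and paste the remaining fields back into one final column.
split_fixed <- function(lines, n_fixed = 9) {
  pieces <- strsplit(lines, ",", fixed = TRUE)
  rows <- lapply(pieces, function(x) {
    # Pad short rows with "" (strsplit() drops a trailing empty field)
    length(x) <- max(length(x), n_fixed + 1)
    x[is.na(x)] <- ""
    c(x[seq_len(n_fixed)], paste(x[-seq_len(n_fixed)], collapse = ","))
  })
  # rbind the character vectors into a matrix, then convert (columns V1..V10)
  as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
}

processFile2 <- function(filepath) {
  lines <- readLines(filepath)  # default n = -1 reads every line at once
  split_fixed(lines[-1])        # drop the header line
}
```

All columns come back as character, so numeric columns would need converting afterwards.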
Is there a better way to do this? Are there base functions that can handle it, or, if not, any packages?
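On the package side, one option is tidyr's separate(), whose extra = "merge" argument splits at most length(into) times, so everything after the ninth comma stays together in the last column. This sketch assumes tidyr is installed and that each raw line is first placed in a one-column data frame; the column names V1..V10 are made up.

```r
# Sketch using the tidyr package (assumed to be installed)
library(tidyr)

raw <- "184500,OBJECT_CALENDAR,,,UNITS_NO_UNITS,NULL,,,,Sched N&S B1,1st,3rd,4S,5th&6th"
df <- data.frame(line = raw, stringsAsFactors = FALSE)

# extra = "merge": keep surplus pieces in the last column instead of dropping them
# fill = "right": pad rows with too few pieces on the right instead of warning
df <- separate(df, line, into = paste0("V", 1:10), sep = ",",
               extra = "merge", fill = "right")
df$V10
```

For a file, the input would be something like data.frame(line = readLines(filepath)[-1]). Recent tidyr versions mark separate() as superseded by separate_wider_delim(), whose too_many = "merge" option serves the same purpose.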