1
df <- data.frame(category = c("A","B","A","D","E"),
                 date = c("5/10/2005","6/10/2005","7/10/2005","8/10/2005","9/10/2005"),
                 col1 = c(1,NA,2,NA,3),
                 col2 = c(1,2,NA,4,5),
                 col3 = c(2,3,NA,NA,4))

I have to insert a data frame that is created in R to mysql server.

I have tried these methods(Efficient way to insert data frame from R to SQL). However, my data also has NA which are fails the whole process of exporting.

Is there a way around to faster upload to data.

dbWriteTable(cn,name ="table_name",value = df,overwrite=TRUE, row.names = FALSE)

The above works but is very slow to upload

The method that I have to use is this :

before = Sys.time()
chunksize = 1000000 # arbitrary chunk size
for (i in 1:ceiling(nrow(df)/chunksize)) {
  query = paste0('INSERT INTO dashboard_file_new_rohan_testing (',paste0(colnames(df),collapse = ','),') VALUES ')
  vals = NULL
  for (j in 1:chunksize) {
    k = (i-1)*chunksize+j
    if (k <= nrow(df)) {
      vals[j] = paste0('(', paste0(df[k,],collapse = ','), ')')
    }
  }
  query = paste0(query, paste0(vals,collapse=','))
  dbExecute(cn, query)
}
time_chunked = Sys.time() - before

Error Encountered:

Error in .local(conn, statement, ...) : 
  could not run statement: Unknown column 'NA' in 'field list'
Let's Code
  • 99
  • 7

1 Answers1

1

One of the fastest ways to load data into MySQL is to use its LOAD DATA command line tool. You may try first writing your R data frame to a CSV file, then using MySQL's LOAD DATA to load it:

write.csv(df, "output.csv", row.names=FALSE)

Then from your command line, use:

LOAD DATA INFILE 'output.csv' INTO TABLE table_name
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;

Note that this assumes the CSV file is already on the same machine as MySQL. If not, and you have it still locally, then use LOAD DATA LOCAL INFILE instead.

You may read MYSQL import data from csv using LOAD DATA INFILE for more help using LOAD DATA.

Edit:

To deal with the issue of NA values, which should represent NULL in MySQL, you may take the approach of first casting the entire data frame to text, and then replacing the NA values with empty string. LOAD DATA will interpret a missing value in a CSV column as being NULL. Consider this:

df <- data.frame(lapply(df, as.character), stringsAsFactors=FALSE)
df[is.na(df)] <- ""

Then, use write.csv along with LOAD DATA as described above.

Tim Biegeleisen
  • 387,723
  • 20
  • 200
  • 263
  • The above method is efficient. However, this is not able to solve the NULL handling issue. "https://stackoverflow.com/questions/2675323/mysql-load-null-values-from-csv-data". I need to explicitly mention the columns which have may have NULL values. – Let's Code Jun 01 '20 at 07:43
  • @Let'sCode Check the updated answer for one way to workaround your problem with `NA` values in R. – Tim Biegeleisen Jun 01 '20 at 07:49