Questions tagged [sqldf]

SQLDF is an R package for running SQL statements on R data frames.

SQLDF is an R package for running SQL statements on R data frames.

The user simply specifies an SQL statement in R using data frame names in place of table names and a database with appropriate table layouts/schema is automatically created, the data frames are automatically loaded into the database, the specified SQL statement is executed, the result is read back into R and the database is deleted all automatically behind the scenes.

sqldf supports:

  • the SQLite backend database (by default),
  • the H2 java database,
  • the PostgreSQL database and
  • sqldf 0.4-0 onwards also supports MySQL.

Resources:

608 questions
88
votes
7 answers

Select the first row by group

From a dataframe like this test <- data.frame('id'= rep(1:5,2), 'string'= LETTERS[1:10]) test <- test[order(test$id), ] rownames(test) <- 1:10 > test id string 1 1 A 2 1 F 3 2 B 4 2 G 5 3 C 6 3 H …
dmvianna
  • 12,370
  • 17
  • 69
  • 101
54
votes
4 answers

Summarize all group values and a conditional subset in the same call

I'll illustrate my question with an example. Sample data: df <- data.frame(ID = c(1, 1, 2, 2, 3, 5), A = c("foo", "bar", "foo", "foo", "bar", "bar"), B = c(1, 5, 7, 23, 54, 202)) df ID A B 1 1 foo 1 2 1 bar 5 3 2 foo 7 4 2 foo …
kevinykuo
  • 4,050
  • 4
  • 20
  • 29
22
votes
3 answers

Best use of R and SQL if restricted to a local machine

I’m trying to improve my workflow, and am hoping the community can provide insights since I am relatively new to “big data”. I typically download several dataframes from public sources which can be related to one another. After pulling several…
Jayden.Cameron
  • 431
  • 3
  • 12
20
votes
4 answers

R: how to rbind two huge data-frames without running out of memory

I have two data-frames df1 and df2 that each have around 10 million rows and 4 columns. I read them into R using RODBC/sqlQuery with no problems, but when I try to rbind them, I get that most dreaded of R error messages: cannot allocate memory.…
Prasad Chalasani
  • 18,647
  • 5
  • 43
  • 71
15
votes
2 answers

Using sqldf and RPostgreSQL together

When using RPostgreSQL I find that I cannot use sqldf in the same way. For example if I load the library and read in data into a data frame using the following code: library(RPostgreSQL) drv <- dbDriver("PostgreSQL") con <- dbConnect(drv,…
djq
  • 13,348
  • 42
  • 111
  • 148
13
votes
2 answers

SQL-like functionality in R

I am used to writing data manipulation logic in SQL and now that I am learning R I find myself sometimes just wanting to do something that would be simple in SQL but I have to learn a bunch of stuff with R to do the same manipulation on an R data…
medriscoll
  • 24,637
  • 16
  • 36
  • 36
11
votes
3 answers

How can I pass R variable into sqldf?

I have some query like this: sqldf("select TenScore from data where State_P = 'AndhraPradesh'") But I have "AndhraPradesh" in a variable stateValue. How can I use this variable in a select query in R to get the same result as above. Please show me…
Sandeep
  • 565
  • 2
  • 6
  • 16
11
votes
4 answers

Error: Cannot pass NA to dbQuoteIdentifier() in sqldf package in R

Error: Cannot pass NA to dbQuoteIdentifier() In addition: Warning message: In field_types[] <- field_types[names(data)] : number of items to replace is not a multiple of replacement length This is the error message i am getting upon trying to…
9
votes
1 answer

sqldf: Changes timestamp from localtime to GMT/UTC

On my machine, sqldf changes a POSIXct from localtime(I'm at CST, GMT -0600) to GMT/UTC. Is this expected behavior? Can I stop R or sqldf from doing this? Here is my code: > library('sqldf') > > before <- data.frame(ct_sys = Sys.time()) >…
mpettis
  • 2,468
  • 4
  • 19
  • 29
9
votes
2 answers

Failed to connect the database when using sqldf in r

I loaded a csv file to my R, and when I Tried to use sqldf to select some column, it always went to Error in .local(drv, ...) : Failed to connect to database: Error: Access denied for user 'User'@'localhost' (using password: NO) Error in…
victordongy
  • 185
  • 1
  • 2
  • 11
8
votes
1 answer

Any way to automatically correct all variable classes in a dataframe

I have a dataframe with about ~250 variables. Unfortunately, all of these variables were imported as character classes from a sql database using sqldf. The problem: all of them should not be character classes. There are numeric variables, integers,…
jgozal
  • 1,241
  • 3
  • 14
  • 35
8
votes
1 answer

How to append several large data.table objects into a single data.table and export to csv quickly without running out of memory?

The easy answer to this is "buy more RAM" but I am hoping to get a more constructive answer and learn something in the process. I am running Windows 7 64-bit with 8GB of RAM. I have several very large .csv.gz files (~450MB uncompressed) with the…
Brian D
  • 2,188
  • 21
  • 37
7
votes
4 answers

Find nearest matches for each row and sum based on a condition

Consider the following data.table of events: library(data.table) breaks <- data.table(id = 1:8, Channel = c("NP1", "NP1", "NP2", "NP2", "NP3", "NP3", "AT4", "AT4"), Time = c(1000, 1100, 975, 1075, 1010,…
Dahbid
  • 73
  • 1
  • 5
7
votes
5 answers

Skip metadata when Importing dataset in R

My question involves how to skip metadata in the beginning of a file when importing data into R. My data is in .txt format where the first lines are metadata describing the data and these need to be filtered out. Below is a minimal example of the…
amo
  • 2,412
  • 3
  • 20
  • 33
7
votes
1 answer

How can I keep a date formatted in R using sqldf?

How do I rename a date field in SQLDF without changing the format? See my example below where my renamed date field "dt" converts the date to a number. How do I avoid this, or convert it back to a date? #Question for Stack Exchange df <- data.frame…
Chris L
  • 288
  • 1
  • 4
  • 15
1
2 3
40 41