This is a follow-up to the answer provided here on using sqldf():
https://stackoverflow.com/a/1820610
In my particular case, I have a tab-delimited file with over 110 million rows. I'd like to select the rows that match 4.6 million tag IDs.
In the following code, the tag IDs are in tag.query.
The example works with a smaller query, but it does not handle the full set of 4.6 million tag IDs:
# f is a file() connection to the tab-delimited file, as in the linked answer
sql.query <- paste0("select * from f where v2 in (", tag.query, ")")
selected.df <- sqldf(sql.query, dbname = tempfile(),
                     file.format = list(header = FALSE, row.names = FALSE,
                                        sep = "\t", skip = line.where.header.is))
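To make the setup concrete, here is a minimal sketch of how a query string like this can be assembled. The vector name tag.ids and its three values are made up for illustration; the real set has about 4.6 million IDs, so the resulting SQL statement becomes one very long string.

```r
# Hypothetical tag IDs for illustration; the real vector holds ~4.6 million.
tag.ids <- c(101, 205, 309)

# Collapse the IDs into one comma-separated string for the IN (...) clause.
tag.query <- paste(tag.ids, collapse = ",")

# Build the full statement; its length grows with the number of IDs.
sql.query <- paste0("select * from f where v2 in (", tag.query, ")")

# Number of characters in the statement that would be sent to SQLite.
nchar(sql.query)
```

With millions of IDs, every ID adds its digits plus a comma to the statement, which is why the size of the query itself becomes a concern.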
Any suggestions on alternative approaches?