
I would like to use fread from data.table, but I get a warning related to the decimal point (here a ',' instead of a '.'). Normally I use '.', but in some cases I have to import files that use ',' as the decimal point.

In read.csv I can set the decimal point separator:

df <- read.csv("mydata.csv", sep=";", dec=",")

How can I do this with the fread function in data.table? With

df=fread('mydata.csv',sep=';')

I get a warning message:

Warning message:
In fread("mydata.csv",  :
Bumped column 7 to type character on data row 86, field contains '4,5'. 

Here 4,5 is the value that would have been read in correctly as 4.5 with dec=',' in read.csv.

sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C  
Simon O'Hanlon
Henk
  • What OS are you on? [**See here**](http://stackoverflow.com/a/14476078/1478381) for a work-around. – Simon O'Hanlon Nov 13 '13 at 16:04
  • @SimonO101 - I am on Linux Debian 64 bit. I have to import a mix of ',' and '.' decimal point files, so changing the locale wouldn't help. – Henk Nov 13 '13 at 16:26
  • There is no reason you can't change it between reads. Once the data is in, R treats it the same (i.e. with a `.` unless you specify something different in `options(OutDec)`). – Simon O'Hanlon Nov 13 '13 at 16:29
  • Merci. Since the issue is with a single file, I will just do a search/replace then. – Henk Nov 13 '13 at 16:33

1 Answer


Update Oct 2014 : Now in v1.9.5

fread now accepts dec=',' (and other non-'.' decimal separators), #917. A new paragraph has been added to ?fread. If you are located in a country that uses dec=',' then it should just work. If not, you will need to read the paragraph for an extra step. In case it somehow breaks dec='.', this new feature can be turned off with options(datatable.fread.dec.experiment=FALSE).
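As a minimal sketch of the dec argument described above: the file contents and the column names id and value are made up for illustration, and the example assumes a data.table version recent enough to support dec=','. On the early versions that first shipped the feature, the locale step mentioned in ?fread may also be needed.

```r
library(data.table)

# Hypothetical sample file: ';' separates fields, ',' is the decimal mark
tmp <- tempfile(fileext = ".csv")
writeLines(c("id;value", "1;4,5", "2;3,25"), tmp)

# Tell fread that ',' is the decimal mark, as with read.csv(dec = ",")
dt <- fread(tmp, sep = ";", dec = ",")

# The value column should now be numeric instead of being bumped to character
str(dt$value)
```

If the column still comes back as character, checking the installed data.table version and the relevant paragraph in ?fread is probably the first thing to try.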



Previous answer ...

Since you're on Linux, using data.table 1.8.11 you can do the following:

fread("sed 's/,/./g' yourfile", sep = ";")

(actually I don't think you even need to specify sep here)
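Note that the sed call replaces every comma in the file, including any inside text fields. Where sed is not available, or only some columns use a decimal comma, one possible alternative is to read those columns as text and convert them afterwards. The sketch below assumes a made-up file with a single affected column named value; it is an illustration, not part of the original answer.

```r
library(data.table)

# Hypothetical sample file: ';' separates fields, ',' is the decimal mark
tmp <- tempfile(fileext = ".csv")
writeLines(c("id;value", "1;4,5", "2;3,25"), tmp)

# Read the affected column as character so fread does not have to guess its type
dt <- fread(tmp, sep = ";", colClasses = list(character = "value"))

# Swap the decimal comma for a point, then convert to numeric by reference
dt[, value := as.numeric(sub(",", ".", value, fixed = TRUE))]

str(dt$value)
```

This keeps the replacement confined to the named column, at the cost of one extra pass over that column after reading.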

Matt Dowle
eddi
  • +1 I was just about to paste this (it took me a while to figure out!). Don't you need to `echo` the result though, e.g. `fread( "echo | sed 's/,/\\./g' C:/Data/mydata.csv" )` and since it's a system call it won't work out of the current working directory in R so you need to specify the path? – Simon O'Hanlon Nov 13 '13 at 17:14
  • @SimonO101 just checked and it works as advertised :) (at least on plain Linux - it sounds like you're on a Mac and I've never tested this on Macs) and path rules are the same as for a simple `fread` without the `sed`; that `echo` would also do nothing on Linux (again not sure about Macs) – eddi Nov 13 '13 at 17:30
  • Sadly I'm on Windows at work and using `GoW` (GNU on Windows). However, I just checked and I don't need `echo`; in addition, the path rules were being screwed up by a network user share which is mapped with a UNC path name. It works just fine on a local drive! I should test more carefully! :-) – Simon O'Hanlon Nov 13 '13 at 17:35
  • @eddi- yes! linux rules, as usual. works with fread("sed 's/,/./g' 'yourfile'", sep = ";") [quotes around yourfile]. thanks! – Henk Nov 13 '13 at 17:47