
I have a dataset in CSV format in which negative values are written with parentheses around the number, i.e. (10) == -10. How can I process it so that R interprets (10) as -10? Thank you.

UPDATE: I know I can work around this by replacing ( with -, removing ), and then calling as.numeric, but is there a more elegant way to handle this?

Ben Bolker
lokheart

2 Answers


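If you create an `as` method (with `setAs`) that coerces character data to an "acntngFmt" class for the accounting format, you can read the file (or re-read it, e.g. through a text connection) with `colClasses = "acntngFmt"` for the affected columns.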

setClass("acntngFmt")
setAs("character", "acntngFmt",
      function(from) as.numeric(gsub("\\)", "", gsub("\\(", "-", from))))

Input <- "A, B, C
(1.76), 1%, 3.50€
2.00, 2%, 4.77€
3.000, 3% , €5.68"

# strip.white = TRUE drops the blanks after each comma in the character columns
DF <- read.csv(textConnection(Input), header = TRUE, strip.white = TRUE,
               colClasses = c("acntngFmt", "character", "character"))
str(DF)
# 'data.frame':   3 obs. of  3 variables:
#  $ A: num  -1.76 2 3
#  $ B: chr  "1%" "2%" "3%"
#  $ C: chr  "3.50€" "4.77€" "€5.68"
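A possible variant of the same coercion, sketched under the assumption that a negative value always has the parentheses wrapping the entire field: one anchored regular expression turns `(x)` into `-x` in a single pass and leaves other parentheses alone. The class name `acntngFmt2` and the sample data are hypothetical.

```r
library(methods)

# Hypothetical second class so it does not clash with "acntngFmt" above
setClass("acntngFmt2")
setAs("character", "acntngFmt2", function(from) {
  # "^\\((.*)\\)$" matches only a value fully wrapped in parentheses;
  # "-\\1" keeps the captured number and prefixes a minus sign.
  # trimws() drops surrounding blanks so the ^ and $ anchors still match.
  as.numeric(sub("^\\((.*)\\)$", "-\\1", trimws(from)))
})

Input <- "A,B
(1.76),x
2.00,y
(3),z"

DF <- read.csv(textConnection(Input), colClasses = c("acntngFmt2", "character"))
str(DF)
```

One consequence of anchoring: a malformed value such as `10)` is left unchanged, so `as.numeric()` returns `NA` with a warning instead of silently reading it as 10, which may help surface bad input.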
IRTFM
  • You could gain a speed-up by using the `fixed=TRUE` mode of the regular-expression functions: `as.numeric(sub(")", "", sub("(", "-", from, fixed=TRUE), fixed=TRUE))`. – Marek May 10 '11 at 12:59
  • You are right and I think sub might be more appropriate than gsub as well since this would be applied element-wise. – IRTFM May 10 '11 at 13:19
  • Building on 42-'s answer, code to handle embedded commas in the accounting format: `setClass("acntngFmt") clean.acc – adts Jul 23 '19 at 23:15
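Combining Marek's `fixed = TRUE` suggestion and the switch to `sub` with removal of thousands separators, as in `(1,234.50)`, might look like the sketch below. The comma handling is an assumption about what "embedded commas" means (the code in the comment above is cut off), and the class name `acntngComma` is hypothetical. Because the comma is also the CSV delimiter, such values need to be quoted in the file.

```r
library(methods)

setClass("acntngComma")  # hypothetical class name
setAs("character", "acntngComma", function(from) {
  # fixed = TRUE treats "(", ")" and "," literally, so no regex escaping is needed
  x <- sub("(", "-", from, fixed = TRUE)  # at most one "(" per value, so sub() suffices
  x <- sub(")", "", x, fixed = TRUE)
  x <- gsub(",", "", x, fixed = TRUE)     # separators can repeat, so gsub() here
  as.numeric(x)
})

# Values containing commas are quoted so read.csv keeps each in a single field
Input <- 'Amount,Label
"(1,234.50)",a
"2,000",b
(7),c'

DF <- read.csv(textConnection(Input), colClasses = c("acntngComma", "character"))
DF$Amount
```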

If you know the surrounding parentheses will be the only ones in each value, you can write a function to deal with them:

test <- c(10, "(10)", 5)
negative_paren <- function(vec){
  # the double backslash escapes "(", which is special in a regular expression
  vec <- gsub("\\(", "-", vec)
  vec <- gsub("\\)", "", vec)
  vec <- as.numeric(vec)
  return(vec)
}
negative_paren(test)
# [1]  10 -10   5
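To apply `negative_paren()` to a whole CSV file, one option is to read the columns as character and convert them afterwards. This sketch assumes every column uses the parenthesis convention; the column names and values are made up for illustration, and the function is repeated so the block runs on its own.

```r
negative_paren <- function(vec) {
  vec <- gsub("\\(", "-", vec)  # "(" becomes a minus sign
  vec <- gsub("\\)", "", vec)   # ")" is dropped
  as.numeric(vec)
}

Input <- "Revenue,Cost
100,(40)
(25),10"

# colClasses = "character" stops read.csv from guessing a type for each column
DF <- read.csv(textConnection(Input), colClasses = "character")
DF[] <- lapply(DF, negative_paren)  # DF[] keeps the data.frame structure
str(DF)
```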
Kyouma