1

I have a dataframe as follows with 4 character columns

df <- data.frame(2016=c("light", "", "", "", ""), 2017=c("radio", "", "", "", ""), after2017=c("", "Utility grid connection for lighting", "", "", "light"), dkcs=c("", "", "TV", "TV", ""))

I want to create a 5th column, "db" such that it has a value 0 if either all 4 columns are empty for that row or the value of the column "contains" the string "Utility grid", otherwise value of "db" is 1.

I wrote the following code which runs but it is giving all values of db as 1 irrespective of whether it should be 0. The code works correctly if I remove the 'or' condition inside the 'if' condition. What do you think is wrong? Also is the way I am using "contains" correct? I appreciate your help!

for(i in 1:nrow(df)) {

  if(df$2016[i]!= "" | df$2016H2[i]!= "Utility grid.") {
    df$db[i] <- 1
  } else if (df$2017[i]!="" | df$2017[i]!="Utility grid.") {
    df$db[i] <- 1
  } else if (df$after2017[i]!="" | df$after2017[i]!="Utility grid.") {
    df$db[i] <- 1
  } else if (df$dkcs[i]!="" | df$dkcs[i]!="Utility grid.") {
    df$db[i] <- 1
  }
  else df$db[i] <- 0
}
user3816784
  • 51
  • 1
  • 10
  • Your data does not have a variable `2016H2` - perhaps that is where it is going wrong. – Andrew Gustar May 28 '17 at 16:50
  • Side note: I dunno what goes wrong in your loop. However, you should make sure that your example `df` can be copy-pasted (the way you specified it, it yields an error) and that it includes your specific cases (e.g. "all fields are empty" is missing if I see that right) – lukeA May 28 '17 at 16:59
  • Can you please add the expected result ? – agstudy May 28 '17 at 17:02
  • Thank you for your comments. Agstudy, the desired result is db – user3816784 May 29 '17 at 12:42
  • I should have clarified before - My dataframe actually has 41 columns and 600 rows. I only pasted a sub-set of columns relevant for my question here for simplicity. The 4 columns in question are 28:31 in terms of order of columns in my dataframe. – user3816784 May 29 '17 at 12:43

1 Answers1

1

Here's an alternative approach:

df <- data.frame(
  `2016`=c("light", "", "", "", "", ""), 
  `2017`=c("radio", "", "", "", "", ""), 
  after2017=c("", "Utility grid connection for lighting", "", "", "light", ""), 
  dkcs=c("", "", "TV", "TV", "", ""), 
  check.names=F)
df$db <- (!grepl("Utility grid|^$", apply(df, 1, paste, collapse="")))+0L
df
#    2016  2017                            after2017 dkcs db
# 1 light radio                                            1
# 2             Utility grid connection for lighting       0
# 3                                                    TV  1
# 4                                                    TV  1
# 5                                            light       1
# 6                                                        0
lukeA
  • 48,497
  • 5
  • 73
  • 84
  • Thank you! This is an elegant solution and achieves the desired result. I was unfamiliar with some of these functions but I have been slowly decoding it and understand the code better now. I still have the following questions – user3816784 May 29 '17 at 12:32
  • 1. I understood the use of grepl, how are you using apply to achieve the value 0 or 1. I understood that through argument '1' inside 'apply' you are going through each of the columns, won't paste just string all values of each column together? How are you still getting 0 and 1? – user3816784 May 29 '17 at 12:39
  • 2. My dataframe actually has 41 columns and 600 rows. I only pasted a sub-set of columns relevant for my question here for simplicity. The 4 columns in question are 28:31 in terms of order of columns in my dataframe. Can the apply be run on these selected columns? I should have clarified actual structure of my dataframe in my original question - apologies! – user3816784 May 29 '17 at 12:39
  • @user3816784 The `apply` call is just one way of [concatenating rows of a data frame](https://stackoverflow.com/questions/13944078/concatenating-rows-of-a-data-frame). `1` means apply row by row (2 would e.g. be column by column). `grepl` gives you a logical vector (thats what the l stands for). I added `+0L` to convert logical to integer, because you did not want true/false but 1/0. – lukeA May 29 '17 at 12:44
  • @user3816784 `(!grepl("Utility grid|^$", apply(df[,28:31], 1, paste, collapse="")))+0L` would run it on the subset of columns 28 to 31. – lukeA May 29 '17 at 12:45
  • Thanks! I have still not understood (still a beginner!) grepl is only checking for presence of utility grid? How is is also checking if all columns in a particular row are also empty? I understand ^ matches string at beginning of string, and so Utility grid is being matches in the beginning? but what about the second 'or' condition? Can you please break down the grepl for me? – user3816784 May 29 '17 at 13:01
  • @user3816784 yep, `^` is beginning of string and `$` is end of string. So it reads like _return TRUE if string matches Utility grid or end of string is directly followed by beginning of string. That was the first regex that came to my mind - maybe there's a better option. :) – lukeA May 29 '17 at 13:03