192

I keep getting multiple "unknown column" warnings for all kinds of commands (from str(x) to installing package updates), and I'm not sure how to debug or fix this.

The warning "unknown column" is clearly related to a variable in a tbl_df that I renamed, but the warning comes up in all kinds of commands seemingly unrelated to the tbl_df (e.g., installing updates on a package, str(x) where x is simply a character vector).

ssp3nc3r
  • 11
    I realize the question is vague, but so seems the problem. I can even type nonsense (e.g., typo) and receive the warnings. I'm guessing it is persistent in the IDE environment itself, somehow? – ssp3nc3r Aug 19 '16 at 14:23
  • 1
    Can you post the exact command and the output you get please? – konvas Aug 19 '16 at 14:24
  • Have you started in a clean session and the problem still persists? What makes it start happening? – aosmith Aug 19 '16 at 14:31
  • The "adfad" object below was never created; it shows that I can type anything and still get the same warning messages: > adfad Error: object 'adfad' not found In addition: Warning messages: 1: Unknown column 'FACEBOOK.1' 2: Unknown column 'FACEBOOK.1' 3: Unknown column 'FACEBOOK.1' 4: Unknown column 'FACEBOOK.1' 5: Unknown column 'FACEBOOK.1' 6: Unknown column 'FACEBOOK.1' – ssp3nc3r Aug 19 '16 at 14:32
  • Strange.. I assume you are using RStudio? Can you try cleaning the environment and see if this fixes it? Go to the Environment tab (can't remember which is the default pane for that, probably the top right one) and click on the broom icon. Make sure you have saved your work first because this will delete all objects in your environment. – konvas Aug 19 '16 at 14:51
  • 3
    Yes, latest version of RStudio. I cleaned the environment, restarted, and it begins occurring when I load in a TBL_DF object. I seem to have fixed the issue by converting it to as.data.frame, closing everything and then reloading the data frame. Going forward, I'd like to understand two things: how to avoid the problem using tbl_df and why the warnings seem to persist in the environment. – ssp3nc3r Aug 19 '16 at 15:08
  • If this is all based on a single object, maybe add that object to your question so others can test? – aosmith Aug 19 '16 at 15:23
  • 1
    I am getting the same error. Is `FACEBOOK.1` a column in one of your data.frames and do you call it with `df$FACEBOOK.1` somewhere in your R script? My humble guess is that this is an error in the `tibble` package introduced in v1.1: https://blog.rstudio.org/2016/07/05/tibble-1-1/. Do you have `tibble` explicitly loaded? – dpprdan Aug 22 '16 at 12:16
  • Yes, `FACEBOOK.1` was a column of a `tbl_df` that I had renamed to something else. I'm guessing that the renaming only worked on the data frame part of the `tbl_df` but not the other classes within the `tbl_df`. As I wasn't sure how to fix it, I coerced it to a data frame and have avoided using `tbl_df` until I can understand what went wrong. – ssp3nc3r Aug 22 '16 at 14:25
  • 5
    This is happening to me as well. I can reproduce the pattern on multiple computers but the warning appears seemingly randomly after some commands, for example library(Hmisc) or making a dataframe with dplyr. The warnings refer to columns that I haven't made yet - I make them later on in my code. I've restarted R and Rstudio multiple times and running the code clean doesn't help. What IS this??? – Nova Sep 27 '16 at 16:06
  • Is it worth creating an issue for this on GH? https://github.com/hadley/dplyr/issues – Joshua Rosenberg Oct 17 '16 at 02:06
  • This is happening to me as well! If someone can generate a reproducible example, opening a git issue would probably be the best approach. – Nan Oct 20 '16 at 12:59
  • I've also been able to reproduce that if the warning shows up, and I rerun the offending line, the warning diminishes (for example, first I see the warning 4 times, then 1 time), and eventually stops showing up. – Nan Oct 20 '16 at 13:15
  • I am getting this error now too: ```warning message: Unknown column "fixed"``` I notice it the most with dplyr but it happens with base R commands as well, such as ```write.csv()```. I can run and re-run the command and sometimes it will work and sometimes it will fail with this message. I thought it might have been related to the latest R release (3.3.2) and having to reinstall the packages but that doesn't seem to be the issue. Any help would be appreciated. – yake84 Nov 02 '16 at 19:10
  • 1
    Today I updated R (to 3.3.2) and RStudio (to 1.0.136) at the same time, and since then I get these warnings as well. Previously I used R 2.2.5 and a version of RStudio that was up to date around the time R 2.2.5 was released (sorry for being unspecific here) – yoland Jan 11 '17 at 15:32
  • 1
    This is also happening to me, in the same way as @Nova, in that my warnings relate to columns I make later in my code. It's very frustrating as my analysis is quite long and now peppered with warnings about "unknown columns". My data object seems to have multiple classes: `grouped_df`, `tbl_df`, `tbl`, and `data.frame`. The warnings appear when I run `psych::describe(mydf)`. I am running R 3.3.2 and RStudio 1.0.44 – meenaparam Jan 26 '17 at 11:03

9 Answers

67

This is an issue with the Diagnostics tool in RStudio (the tool that shows warnings and possible mistakes in your code). It was partially fixed by @kevin-ushey in a commit included in RStudio v1.1.103 and later. The fix was only partial because the warnings still appeared, albeit less frequently. The issue was later reported with a reproducible example at https://github.com/rstudio/rstudio/issues/7372 and fixed by a pull request in RStudio v1.4.

Update to the latest RStudio release to fix this issue. Alternatively, choose one of the following workarounds:

  • Disable the code diagnostics for all files in Preferences/Code/Diagnostics

  • Disable all diagnostics for a specific file:

    Add at the beginning of the opened file(s):

     # !diagnostics off
    

    Then save the files and the warnings should stop appearing.

  • Disable the diagnostics for the variables that cause the warning

    Add at the beginning of the opened file(s):

     # !diagnostics suppress=<comma-separated list of variables>
    

    Then save the files and the warnings should stop appearing.
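
As a rough illustration of the third workaround, a script might start like the sketch below. The directive itself is the one quoted above; the name `tibble_df` is only a hypothetical stand-in for whichever data frames trigger the false positives in your own file, and the comma-separated form `suppress=df1,df2` should work the same way for several of them.

```r
# !diagnostics suppress=tibble_df
# To R the line above is an ordinary comment and is ignored;
# RStudio reads it as an instruction to skip `tibble_df` in its diagnostics.
library(tibble)

# Hypothetical example data; replace with your own objects.
tibble_df <- tibble(id = 1:3, name = c("mary", "jill", "steve"))
print(tibble_df)
```

Because the directive is just a comment, the script runs identically outside RStudio.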

The warnings appear because the RStudio diagnostics tool parses your source code to detect errors, and while running those checks it accesses columns in your tibble that are not initialized, which triggers the warning. The warnings are not caused by the unrelated commands you run; they appear whenever the RStudio diagnostics are executed (when a file is saved or modified, when you run something, and so on).

zeehio
  • 15
    good call. This worked for me in RStudio 0.99, uncheck 'show diagnostics for r' under Tools>Global Options>Code>Diagnostics – Chris Holbrook Mar 31 '17 at 16:14
  • 1
    Unchecking "show diagnostics for R" worked for me as well in RStudio v.1.0.xxx . Will try to re-enable that after 1.1 is released and I've upgraded. – Magnus Aug 02 '17 at 11:59
  • 1
    FWIW I still get the error (RStudio v 1.1.x) but the proposed solution does work – Phil Sep 04 '17 at 16:15
  • 1
    v. 1.0.153 here with R v. 3.4.1, and I still have the error. – reima Nov 01 '17 at 10:32
  • 8
    RStudio 1.1.383 with R 3.4.3, problem still exists. – MS Berends Dec 08 '17 at 11:16
  • I'll join the gang: RStudio 1.1.383 with R 3.4.3, problem still exists. – dca Dec 16 '17 at 15:16
  • 1
    Me too; problem still exists with RStudio 1.1.383 with R 3.4.3 – bdforbes Feb 04 '18 at 23:15
  • 4
    Problem still exists with version 1.1.423. unchecking 'show diagnostics' works great – Adrian Apr 25 '18 at 20:05
  • 1
    I have also seen the warning but with less frequency than before. I hope to be able to reproduce it so I can report it. Feel free to post a link to a reproducible example that triggers the warning and I'll try to submit it upstream. – zeehio Apr 25 '18 at 21:31
  • 2
    Still exists v1.1.456. Sigh. – geotheory Aug 20 '18 at 19:53
  • 3
    Still exists in RStudio v1.1.643 with R v3.5.1 on RStudio Server on Ubuntu. – RFelber Dec 25 '18 at 09:19
  • Still exists in RStudio v1.2.1335 with R v3.6.2. The solutions posted by @sabre and stok work, but not always. I guess it's a feature, and not a bug. – Dunois Jan 22 '20 at 20:21
  • 1
    Will unchecking 'diagnostics for R' remove all warnings? It worked for me, but did I now remove all the warnings? Because that is of course not what I want... And if yes, is there another way around? – Rosanne Jan 31 '20 at 22:42
  • 1
    Hi @Rosanne, I added a third workaround to my answer so the diagnostics ignore only the variables you specify in a list. – zeehio Apr 09 '20 at 11:11
  • Hi @zeehio, this is super helpful. Could you clarify that third workaround with an example? Should it look like list(var1, var2, var3), or simply var1, var2, var3, or df$var1, df$var2, df$var3, or are quotes needed? Thanks! – conflictcoder Apr 24 '20 at 19:34
  • I'm not at the computer right now, I guess it is something like `#!diagnostics suppress=df1,df2` where df1 and df2 are the data frames that create the false positives – zeehio Apr 25 '20 at 20:20
  • Is there a global option I can use inside a function to turn these off? I tried `options(dplyr.summarise.inform=F)` at the start of the function and I still get these warning messages. Why is the command apparently preceded by # when # creates a comment? Can someone give an example of how to implement within a function. At one point I'm using a couple of loops to create sub-dataframes and these can vary dramatically from 2 to n depending on the input data. Having to pre-construct the data frame names is going to be a problem in my situation. – Michelle Aug 10 '20 at 22:17
  • I can turn off the warnings in my RStudio, but that's not going to turn off the warnings for users of my package (when I finally finish my package!, I'm getting there). – Michelle Aug 10 '20 at 22:19
  • 1
    You and your users can choose to disable RStudio diagnostics globally (in the RStudio settings) or per file (using `# !diagnostics off`). This looks like a comment because to R it is a comment, and R ignores it. RStudio is an environment for developing R code: you write instructions (code) for R, not for RStudio. To give instructions to RStudio (something unusual to do), the RStudio developers found it convenient to use the `#!` syntax, since R ignores it (to R it is just a comment) and you are unlikely to write `#!` in your R comments by chance. – zeehio Aug 11 '20 at 05:34
56

I have been encountering the same problem, and although I don't know why it occurs, I have been able to pin down when it occurs, and thus prevent it from happening.

The issue seems to be with adding in a new column, derived from indexing, in a base R data frame vs. in a tibble data frame. Take this example, where you add a new column (age) to a base R data frame:

base_df <- data.frame(id = c(1:3), name = c("mary", "jill","steve"))

base_df$age[base_df$name == "mary"] <- 47

That works without returning a warning. But when the same is done with a tibble, it throws a warning (which, I think, is what then causes the weird, seemingly unprovoked, repeated warnings):

library(tibble)

tibble_df <- tibble(id = c(1:3), name = c("mary", "jill","steve"))

tibble_df$age[tibble_df$name == "mary"] <- 47

Warning message:
Unknown column 'age' 

There are surely better ways of avoiding this, but I have found that first creating a vector of NAs does the job:

tibble_df$age <- NA

tibble_df$age[tibble_df$name == "mary"] <- 47
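
A variation on the same idea, sketched here under the assumption that you know the column type in advance: initialize the column with a typed `NA` so that later partial assignments do not change its type. The `joined` column and its date are hypothetical and only illustrate the `Date` case.

```r
library(tibble)

tibble_df <- tibble(id = 1:3, name = c("mary", "jill", "steve"))

# Numeric column: start from a typed NA, then fill in selected rows.
tibble_df$age <- NA_real_
tibble_df$age[tibble_df$name == "mary"] <- 47

# Date column (hypothetical): as.Date(NA_character_) is an NA of class Date,
# so values assigned later remain dates instead of becoming numbers.
tibble_df$joined <- as.Date(NA_character_)
tibble_df$joined[tibble_df$name == "jill"] <- as.Date("2016-08-19")

str(tibble_df)
```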
sabre
  • 14
    My answer is clearly not the entire story: I'm still getting the (multiple) warnings, and as other commenters alluded to, the frustrating part is the *apparent* arbitrariness of it. A `tbl_df` seems to be necessary to produce the warnings, but I am not sure that it is sufficient. That is, I think this warning might emerge when `tbl_df`s are used in conjunction with functions from other tidyverse packages (e.g., tidyr, dplyr). Small price to pay for such a critical suite of packages, but strange/annoying nonetheless. – sabre Oct 18 '16 at 19:56
  • Creating a vector of `NA`s worked for me! (RStudio Version 1.1.456, R version 3.5.1) – petzi Nov 01 '18 at 10:09
  • Sometimes I want to specify the type of the column, e.g. R Dates, and if I fill in `NA`, dates that are filled later will be converted to the numeric type. – Jiāgěng May 11 '19 at 21:29
  • 1
    @Jiāgěng `as.Date(NA_character_)` gives `NA` with class `Date`. – Stibu Sep 03 '19 at 06:10
  • Tibbles are by design more restrictive than data.frames. It might be by design that you are not supposed to initiate a column by assigning only part of it. However, if this is a protective feature and not a design error, then an early one-time error in tibble assignment would be much preferable. – vinnief Apr 29 '20 at 20:41
19

I have faced this issue when using the "dplyr" package.
For those facing this problem after using the "group_by" function in the "dplyr" library:

I have found that ungrouping the variables solves the unknown column warning problem. Sometimes I have had to iterate through the ungrouping several times until the problem is resolved.
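
A minimal sketch of what ungrouping looks like, using a small made-up data frame (the names `df`, `grouped` and `ungrouped` are only for illustration). It shows the `grouped_df` class being dropped; whether that is enough to silence the warning in your session is not guaranteed.

```r
library(dplyr)

df <- data.frame(id = c(1, 1:3), name = c("mary", "jo", "jill", "steve"))

grouped <- df %>% group_by(id)
class(grouped)    # includes "grouped_df"

ungrouped <- grouped %>% ungroup()
class(ungrouped)  # "grouped_df" is no longer among the classes
```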

Varun
6

Converting the class into data.frame solved the problem for me:

library(dplyr)
df <- data.frame(id = c(1, 1:3), name = c("mary", "jo", "jill", "steve"))
dfTbl <- df %>%
  group_by(id) %>%
  summarize(n = n())
class(dfTbl) # [1] "tbl_df"     "tbl"        "data.frame"
dfTbl <- as.data.frame(dfTbl)
class(dfTbl) # [1] "data.frame"

Borrowed the partial script from @adts

Tom Aranda
stok
  • it works like a charm. I was wondering is there any downside to converting it to a data frame and then converting it back to tibble. Is it only the warnings that it loses? – p130ter Dec 19 '17 at 15:08
  • 2
    Did not work for me RStudio 1.1.442 still getting `Warning message: Unknown or uninitialised column: 'bad_column' ` – andemexoax Apr 05 '19 at 13:30
3

I had this problem when dealing with tibble and lapply functions together. The tibble seemed to save things as a list inside the dataframe.

I solved it by using unlist before adding the results of an lapply function to the tibble.
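
Something along these lines, as a sketch with made-up data (the tibble `tb` and its column names are hypothetical). It contrasts assigning the raw `lapply` result, which creates a list-column, with flattening it first; `vapply` is shown as a base R alternative that returns an atomic vector directly.

```r
library(tibble)

tb <- tibble(x = 1:3)

# lapply() always returns a list, so assigning it directly creates a list-column.
squares <- lapply(tb$x, function(v) v^2)
tb$sq_list <- squares

# unlist() flattens the list into an atomic vector before the assignment.
tb$sq <- unlist(squares)

# vapply() returns an atomic vector directly and checks each result's type.
tb$sq_checked <- vapply(tb$x, function(v) v^2, numeric(1))

str(tb)
```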

1

I ran into this problem too, except through a tibble created using a dplyr block. Here's a slight modification of sabre's code to show how I came to the same error.

library(dplyr)

df <- data.frame(id = c(1,1:3), name = c("mary", "jo", "jill","steve"))

t <- df %>%
  group_by(id) %>%
  summarize(n = n())

t
str(t)


t$newvar[t$id==1] <- 0
adts
0

Let's say I wanted to select the following column(s)

best.columns = 'id'

For me the following gave the warning:

df%>% select_(one_of(best.columns))

The following worked as expected, although as far as I know dplyr, the two should be identical:

df%>% select_(.dots = best.columns)
JelenaČuklina
0

I get these warnings when I rename a column using dplyr::rename after reading it using the readr package.

The old column name is not updated in the spec attribute, so removing the spec attribute makes the warnings go away. Removing the "spec_tbl_df" class also seems like a good idea.

attr(dat, "spec") <- NULL
class(dat) <- setdiff(class(dat), "spec_tbl_df")
alko989
0

I know this is an old thread, but I just encountered the same problem when loading a spatial vector in geopackage format with the sf package. Using as_tibble=FALSE worked for me. The file was loaded as an sp object but everything still worked fine. As mentioned by @sabre, forcing an object into a tibble seems to cause the problem when indexing a column that is no longer there.

Jens