
The point of this question is that I want to know how to update a data frame inside either a for loop or a function. I know there are other ways to do the specific task I am looking at, but I want to know how to do it the way I am trying to do it.

I have a data frame with 15 columns and 2k observations, some of which are 98s and 99s. For each row where there is a 98 or 99 in any variable/column, I want to remove the whole row. I created a function that filters on a variable not being 98/99 and applied it with lapply. However, instead of continually updating the data frame, it just spits out a series of data frames, each overwriting the previous one, meaning that at the end I only get a data frame with the last column cleaned. How do I get it to update the data frame for each column sequentially?

`nafunction = function(variable){
  kuwait5=kuwait5%>%
    filter(variable<90)
}
lapply(kuwait5, nafunction)`

The expected result is a new data frame with all rows that contain a 98 or 99 removed. What I get is a sequence of data frames, each one having ONE column in which the rows with NAs are removed.
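For reference, here is a minimal sketch of the sequential update being asked about. The assignment inside `nafunction` only changes a local copy of `kuwait5`, so every `lapply` call starts again from the untouched original. One way around that is to pass the data frame in, return the filtered result, and reassign it on each pass. The toy data and the helper name `drop_codes` are hypothetical, and the sketch assumes all columns are numeric with no existing NAs.

```r
# Toy stand-in for kuwait5 (hypothetical data; the real one has 15 columns).
kuwait5 <- data.frame(
  q1 = c(1, 98, 3, 4),
  q2 = c(2, 5, 99, 1),
  q3 = c(7, 8, 9, 3)
)

# Hypothetical helper: takes a data frame and a column name, and RETURNS
# the filtered data frame instead of trying to modify kuwait5 in place.
drop_codes <- function(df, column) {
  df[df[[column]] < 90, , drop = FALSE]
}

# for loop: each pass filters the result of the previous pass.
cleaned <- kuwait5
for (column in names(cleaned)) {
  cleaned <- drop_codes(cleaned, column)
}

# Functional equivalent: Reduce threads the data frame through each column.
cleaned2 <- Reduce(drop_codes, names(kuwait5), kuwait5)
```

Both `cleaned` and `cleaned2` keep only the rows in which every column is below 90. The key point is the reassignment `cleaned <- ...`; without it, each iteration's result is thrown away.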

Liam385
  • This will probably work, but can I get an answer to my question of how I would continually update a data frame inside a for loop iterating over all the columns? I'm more interested in that at the moment. – Liam385 Jul 08 '19 at 22:02
  • Yes, I am confused about modifying objects inside of functions. @joran, can you explain the proper way to do that? That's sort of the point of the question: I want to know how to properly modify things inside of functions. I'd be willing to do it in a for loop as well. – Liam385 Jul 08 '19 at 22:16
  • Another option would be to first recode 98 and 99 to NA and then use complete.cases to keep only the rows which have no NAs. See also https://stackoverflow.com/questions/4862178/remove-rows-with-all-or-some-nas-missing-values-in-data-frame – TimTeaFan Jul 08 '19 at 22:18
  • Not sure I understand why a loop would be necessary. `dplyr::filter_all(kuwait5, all_vars(. < 90))` should output what you want if all the columns are numeric. https://dplyr.tidyverse.org/reference/filter_all.html – Jon Spring Jul 09 '19 at 03:49
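
The two non-loop suggestions in the comments above can be sketched in base R as follows. This is only an illustration under assumptions: the toy data is hypothetical, all columns are taken to be numeric, and the second variant is a base-R analogue of the all-columns filter rather than the dplyr call itself.

```r
# Toy stand-in for kuwait5 (hypothetical data, all columns numeric).
kuwait5 <- data.frame(
  q1 = c(1, 98, 3, 4),
  q2 = c(2, 5, 99, 1),
  q3 = c(7, 8, 9, 3)
)

# Recode the 98/99 codes to NA, then keep only the complete rows.
recoded <- kuwait5
recoded[recoded == 98 | recoded == 99] <- NA
cleaned_na <- recoded[complete.cases(recoded), , drop = FALSE]

# Base-R analogue of the all-columns filter: keep rows with no value >= 90.
cleaned_all <- kuwait5[rowSums(kuwait5 >= 90) == 0, , drop = FALSE]
```

Note that the first variant also drops rows that already contained NAs, which may or may not be wanted. In dplyr 1.0.4 and later, `filter_all()` is superseded, and the same filter can be written as `filter(kuwait5, if_all(everything(), ~ .x < 90))`.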
