I would like to use dplyr to apply an operation to only part of my data, while keeping the rows that the operation is not applied to. My data frame looks as follows:

   country country-year year     a     b
1  France  France2000   2000       NA    NA 
2  France  France2001   2001     1000  1000  
3  France  France2002   2002       NA    NA
4  France  France2003   2003     1600  2200
5  France  France2004   2004       NA    NA
6  UK          UK2000   2000     1000  1000  
7  UK          UK2001   2001       NA    NA
8  UK          UK2002   2002       NA    NA  
9  UK          UK2003   2003       NA    NA
10 UK          UK2004   2004       NA    NA
11 Germany     UK2000   2000       NA    NA 
12 Germany     UK2001   2001       NA    NA
13 Germany     UK2002   2002       NA    NA  
14 Germany     UK2003   2003       NA    NA
15 Germany     UK2004   2004       NA    NA

As an example:

# I first group by country
df <- df %>% 
  group_by(country)

Within each group, I want to interpolate (only interpolate!) when the group has more than one observation, but I do not want to drop the groups that have one observation or none.

I was wondering whether I can select the countries where n > 1 and carry out the operation only for those groups:

mutate_at(vars(a:b),~na.fill(.x,c(NA, "extend", NA))) 

I also thought about the following, but I cannot get the syntax right:

mutate_if(is.numeric,~if(n()>1  NA else na.fill(.x,c(NA, "extend", NA)))

The desired result would be:

   country country-year year     a     b
1  France  France2000   2000       NA    NA 
2  France  France2001   2001     1000  1000  
3  France  France2002   2002     1300  1600
4  France  France2003   2003     1600  2200
5  France  France2004   2004       NA    NA
6  UK          UK2000   2000     1000  1000  
7  UK          UK2001   2001       NA    NA
8  UK          UK2002   2002       NA    NA  
9  UK          UK2003   2003       NA    NA
10 UK          UK2004   2004       NA    NA
11 Germany     UK2000   2000       NA    NA 
12 Germany     UK2001   2001       NA    NA
13 Germany     UK2002   2002       NA    NA  
14 Germany     UK2003   2003       NA    NA
15 Germany     UK2004   2004       NA    NA

Any suggestions?

Tom

1 Answer

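This should work (it needs dplyr for the pipeline and zoo for na.fill):

library(dplyr)
library(zoo)

df %>% 
  group_by(country) %>% 
  mutate_at(vars(a:b),
            # interpolate only when the group has at least two non-NA values
            ~as.numeric(if (sum(!is.na(.x)) > 1)
                          na.fill(.x, c(NA, "extend", NA))
                        else .x))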
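
For anyone who wants to try this on the data from the question, here is a minimal self-contained sketch. It rebuilds the `country`, `year`, `a` and `b` columns from the table in the question (the `country-year` column is left out), and it assumes dplyr 1.0 or later, where `across()` can be used in place of `mutate_at()`. The helper name `fill_interior` is hypothetical, chosen only for illustration.

```r
library(dplyr)
library(zoo)

# Example data copied from the table in the question
df <- data.frame(
  country = rep(c("France", "UK", "Germany"), each = 5),
  year    = rep(2000:2004, times = 3),
  a       = c(NA, 1000, NA, 1600, NA, 1000, rep(NA, 9)),
  b       = c(NA, 1000, NA, 2200, NA, 1000, rep(NA, 9))
)

# Hypothetical helper: interpolate interior gaps only when the vector
# has at least two non-NA values; otherwise return it unchanged
fill_interior <- function(x) {
  if (sum(!is.na(x)) > 1) {
    # leading and trailing NAs stay NA, interior NAs are interpolated
    as.numeric(na.fill(x, c(NA, "extend", NA)))
  } else {
    as.numeric(x)
  }
}

result <- df %>%
  group_by(country) %>%
  mutate(across(a:b, fill_interior)) %>%
  ungroup()

print(result)
```

The guard counts non-NA values per group and per column, so groups with zero or one observation pass through untouched instead of raising the "need at least two non-NA values to interpolate" error. On the example data this is expected to fill only the 2002 row for France, in line with the desired result shown in the question.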
Nicolas2
  • Thank you for your answer! It regrettably still gives the error `Error in mutate_impl(.data, dots) : Evaluation error: need at least two non-NA values to interpolate.` Any idea what could be the problem? – Tom Sep 17 '18 at 15:25
  • Is it possible that it should not be `length(.x)>1` but something like `!is.na(.x)>1`? Would that syntax be allowed? – Tom Sep 17 '18 at 15:35
  • Right. I had removed rows 7 to 10 to match your specification (one country with only one row), but `na.fill` requires two non-NA values and I had overlooked that case. See my edit; it works now. – Nicolas2 Sep 18 '18 at 11:22
  • It worked, thank you so much! I really appreciate it! – Tom Sep 18 '18 at 11:49