How do I write a good function using pipe operators that will not keep resetting the column I put within the function?

Question

I have a data frame of dates and times. I've included the pw_backup column as an example of 13 other columns that I have. I found the differences in time between the two columns to create two more columns in the same data frame called dur_days and dur_hour.

   first_pmt_date        pw_backup
   <dttm>                <dttm>
 1 2016-04-12 18:57:00   2016-04-12 18:44:00
 2 2016-05-02 17:06:00   2016-05-02 16:41:00
 3 2016-04-06 08:35:00   2016-04-06 08:33:00
 4 2016-03-15 22:38:00   2016-03-15 22:12:00
 5 2016-04-15 14:36:00   2016-04-15 14:30:00
 6 2016-03-22 16:51:00   2016-03-22 16:43:00
 7 2016-03-25 07:52:00   2016-05-31 07:40:00
 8 2016-04-11 12:39:00   2016-04-11 12:22:00
 9 2016-03-08 13:13:00   2016-03-08 09:50:00
10 2016-02-28 13:43:00   2016-05-08 15:44:00

My code gives me the output that I want. I am having trouble changing it into a function, and eventually a for loop looping through all of the columns, so I can add any column to (x) and get the same output.

My Current Code:

paywall_full %>%
  filter(paid == 1 & !is.na(pw_backup)) %>%
  # digits = 0 belongs inside round(); units = 'mins' makes the 0-60 check below explicit
  mutate(dur_days = round(difftime(first_pmt_date, pw_backup, units = 'days'), 0),
         dur_hour = difftime(first_pmt_date, pw_backup, units = 'mins')) %>%
  select(first_pmt_date, pw_backup, dur_days, dur_hour) %>%
  # count_it is defined elsewhere in my script
  summarise(same_day_conv = sum(dur_days == 0) / count_it$pw_backup,
            same_hour_conv = sum(dur_hour <= 60 & dur_hour >= 0) / count_it$pw_backup)

This is the code that I imagined would work. It replaces the current column with x, so that I could pass any other column into my function and get the same output.

conv_rate <- function(x) {
  paywall_full %>%
    filter(paid == 1 & !is.na(x)) %>%
    mutate(dur_days = round(difftime(first_pmt_date, x, units = 'days'), 0),
           dur_hour = difftime(first_pmt_date, x, units = 'mins')) %>%
    select(first_pmt_date, x, dur_days, dur_hour) %>%
    summarise(same_day_conv = sum(dur_days == 0) / count_it$pw_backup,
              same_hour_conv = sum(dur_hour <= 60 & dur_hour >= 0) / count_it$pw_backup)
}

I understand why it doesn't work. If I define a variable beforehand,

 x <- paywall_full$pw_backup

it overwrites the pipeline every time it passes to another function. I hope my question is clear.

BONUS: Turning this into a loop through my columns and assigning the results to a data frame.

Thanks in advance!

I think you should read the [Programming with dplyr vignette](https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html). It will explain how to use dynamic column names and such, so you could call your function with, e.g., `conv_rate(paywall_full, pw_backup)`. - Gregor Thomas, Jun 21 '18 at 16:20
On that note, it's good practice for functions in functional programming languages (like R) to take as inputs everything that's used in the function, including your data frame. You might end up with two versions of the data and want to use the function on both, or you might just rename your data frame. The way your function is written, it can only work on a data frame named exactly `paywall_full`, but if you pass the data in as an argument it can work on any data frame with the right columns/structure. - Gregor Thomas, Jun 21 '18 at 16:22
I fully agree with Gregor. You'll want to check this out for more on why, and on how R finds variables within functions: http://adv-r.had.co.nz/Functions.html#lexical-scoping - A Duv, Jun 21 '18 at 16:23
Also a possible duplicate: [dplyr - use dynamic variable names](https://stackoverflow.com/q/26003574/903061). That question is focused on `mutate` but all the dplyr verbs are similar. - Gregor Thomas, Jun 21 '18 at 16:23
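
For illustration, here is a minimal sketch of what the comments above describe: the data frame is passed in as an argument and the column is forwarded with the `{{ }}` operator, which needs dplyr 1.0.0 or later (newer than this thread). Several things here are assumptions and not from the question: the toy data `paywall_toy`, the second column `pw_other`, and the use of `n()` as the denominator in place of `count_it$pw_backup`, whose definition is not shown. The last part is one possible take on the bonus loop, using `.data[[name]]` to pass column names as strings.

```r
library(dplyr)

# Sketch only: `data` and `col` are arguments, so nothing is hard-coded.
conv_rate <- function(data, col) {
  data %>%
    filter(paid == 1, !is.na({{ col }})) %>%
    mutate(
      dur_days = round(as.numeric(difftime(first_pmt_date, {{ col }}, units = "days"))),
      dur_mins = as.numeric(difftime(first_pmt_date, {{ col }}, units = "mins"))
    ) %>%
    summarise(
      # Assumption: n() (rows left after the filter) stands in for count_it$pw_backup
      same_day_conv  = sum(dur_days == 0) / n(),
      same_hour_conv = sum(dur_mins >= 0 & dur_mins <= 60) / n()
    )
}

# Hypothetical toy data; pw_other is an invented stand-in for the other 13 columns
paywall_toy <- tibble(
  paid = c(1, 1, 1, 0),
  first_pmt_date = as.POSIXct(c("2016-04-12 18:57:00", "2016-05-02 17:06:00",
                                "2016-03-25 07:52:00", "2016-04-06 08:35:00"), tz = "UTC"),
  pw_backup = as.POSIXct(c("2016-04-12 18:44:00", "2016-05-02 16:41:00",
                           "2016-05-31 07:40:00", NA), tz = "UTC"),
  pw_other = as.POSIXct(c("2016-04-12 10:00:00", NA,
                          "2016-03-25 07:50:00", NA), tz = "UTC")
)

# Single column, passed unquoted
conv_rate(paywall_toy, pw_backup)

# Bonus: loop over column names given as strings and stack the results
cols <- c("pw_backup", "pw_other")
per_col <- lapply(cols, function(nm) conv_rate(paywall_toy, .data[[nm]]))
results <- bind_rows(setNames(per_col, cols), .id = "column")
results
```

Because the column is an argument, a global `x` is never looked up, and the same function can be reused on any data frame that has `paid` and `first_pmt_date` columns.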
