
I am having trouble applying a function I wrote to a dataframe in order to add a new column with mutate.

I want to add a column to a dataframe that holds the sunrise or sunset time for each row, based on the existing columns for Latitude, Longitude and Date. The calculation relies on the "sunriset" function from the maptools package.

Below is my function:

library(maptools)
library(tidyverse)

sunrise.set2 <- function (lat, long, date, timezone = "UTC", direction = c("sunrise", "sunset"), num.days = 1) 
{
        lat.long <- matrix(c(long, lat), nrow = 1)
        day <- as.POSIXct(date, tz = timezone)
        sequence <- seq(from = day, length.out = num.days, by = "days")
        sunrise <- sunriset(lat.long, sequence, direction = "sunrise", 
                            POSIXct = TRUE)
        sunset <- sunriset(lat.long, sequence, direction = "sunset", 
                           POSIXct = TRUE)
        ss <- data.frame(sunrise, sunset)
        ss <- ss[, -c(1, 3)]
        colnames(ss) <- c("sunrise", "sunset")

        if (direction == "sunrise") {
                return(ss[1,1])     
        } else {
                return(ss[1,2])
        }       
}

When I run the function for a single input I get the expected output:

sunrise.set2(41.2, -73.2, "2018-12-09 07:34:0", timezone="EST", 
    direction = "sunset", num.days = 1)
[1] "2018-12-09 16:23:46 EST"

However, when I try to do this on a dataframe object to mutate in a new column like so:

df <- df %>% 
    mutate(set = sunrise.set2(Latitude, Longitude, LocalDateTime, timezone="UTC", num.days = 1, direction = "sunset"))

I get the following error:

Error in mutate_impl(.data, dots) : 
  Evaluation error: 'from' must be of length 1.

The dput of my df is below. I suspect I'm not doing something right in order to properly vectorize my function but I'm not sure what.

Thanks

dput(df):

structure(list(Latitude = c(20.666, 20.676, 20.686, 20.696, 20.706, 
20.716, 20.726, 20.736, 20.746, 20.756, 20.766, 20.776), Longitude = c(-156.449, 
-156.459, -156.469, -156.479, -156.489, -156.499, -156.509, -156.519, 
-156.529, -156.539, -156.549, -156.559), LocalDateTime = structure(c(1534318440, 
1534404840, 1534491240, 1534577640, 1534664040, 1534750440, 1534836840, 
1534923240, 1535009640, 1535096040, 1535182440, 1535268840), class = c("POSIXct", 
"POSIXt"), tzone = "UTC")), .Names = c("Latitude", "Longitude", 
"LocalDateTime"), row.names = c(NA, -12L), class = c("tbl_df", 
"tbl", "data.frame"), spec = structure(list(cols = structure(list(
    Latitude = structure(list(), class = c("collector_double", 
    "collector")), Longitude = structure(list(), class = c("collector_double", 
    "collector")), LocalDateTime = structure(list(format = "%m/%d/%Y %H:%M"), .Names = "format", class = c("collector_datetime", 
    "collector"))), .Names = c("Latitude", "Longitude", "LocalDateTime"
)), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
DarwinsBeard
  • Try `df %>% rowwise() %>% mutate(...)` – A. Suliman Dec 10 '18 at 04:02
  • Thanks Suliman, this technically worked, but I noticed it is REALLY slow with a large dataframe. Is there a more efficient way? – DarwinsBeard Dec 10 '18 at 06:14
  • You can try `apply(df,1,sunrise.set2...)`. Also you may find a solution [here](https://deanattali.com/blog/mutate-non-vectorized/), [here](https://stackoverflow.com/questions/43278743/custom-function-with-mutate-do-not-work) or [here](https://stackoverflow.com/questions/44730774/how-to-use-custom-functions-in-mutate-dplyr) – A. Suliman Dec 10 '18 at 06:22

1 Answer


The problem is indeed that your function, as written, is not vectorized: it breaks if you give it more than one value per argument. A workaround (as Suliman suggested) is to use rowwise() or a variant of apply, but that calls your function once per row and so does a lot of unnecessary work.
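To see where the original function trips over vector input, the two offending steps can be run on their own in base R, without maptools. This is only a diagnostic sketch: the two-row values are illustrative stand-ins for the Latitude, Longitude and LocalDateTime columns, and the names `seq_error`, `one_row` and `per_row` are made up for the example.

```r
# Illustrative two-row input (stand-in for the dataframe columns)
lat  <- c(20.666, 20.676)
long <- c(-156.449, -156.459)
days <- as.POSIXct(c("2018-08-15 07:34:00", "2018-08-16 07:34:00"), tz = "UTC")

# seq() on date-times accepts a single start value only, so a whole column fails
seq_error <- tryCatch(
  seq(from = days, length.out = 1, by = "days"),
  error = function(e) conditionMessage(e)
)
print(seq_error)

# matrix(..., nrow = 1) squeezes all coordinates into one row
one_row <- matrix(c(long, lat), nrow = 1)
print(dim(one_row))  # 1 4

# cbind() keeps one (longitude, latitude) pair per row
per_row <- cbind(long, lat)
print(dim(per_row))  # 2 2
```

The captured message should correspond to the 'from' must be of length 1 error from the question, and the one-row matrix shows why the coordinates would be misread even if the seq() call were removed.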

So it is better to make it vectorized, since maptools::sunriset is vectorized too. First suggestion: debug or rewrite it with vectors as input, and you will easily spot the lines where something unexpected happens. Let's go through it line by line. I've commented out your lines wherever I replaced them with something else:

library(maptools)
library(tidyverse)

# sunrise.set2 <- function (lat, long, date, timezone = "UTC", direction = c("sunrise", "sunset"), num.days = 1) 
sunrise.set2 <- function (lat, long, date, timezone = "UTC", direction = c("sunrise", "sunset"))
# Why an argument saying how many days? You have the length of your dates
{
        #lat.long <- matrix(c(long, lat), nrow = 1)
        # One row per location: longitude in column 1, latitude in column 2
        lat.long <- cbind(long, lat)
        day <- as.POSIXct(date, tz = timezone)
        # sequence <- seq(from = day, length.out = num.days, by = "days") # Your days object is fine
        sunrise <- sunriset(lat.long, day, direction = "sunrise", 
                            POSIXct = TRUE)
        sunset <- sunriset(lat.long, day, direction = "sunset", 
                           POSIXct = TRUE)
        # I've replaced sequence with day here
        ss <- data.frame(sunrise, sunset)
        ss <- ss[, -c(1, 3)]
        colnames(ss) <- c("sunrise", "sunset")

        if (direction == "sunrise") {
                #return(ss[1,1])
                return(ss[,1])
        } else {
                #return(ss[1,2])
                return(ss[,2])
        }       
}

But looking at your function, I think there is still a lot of extra work done that doesn't serve any purpose.

  • You're calculating both sunrise and sunset, only to use one of them. You can simply pass your direction argument on to sunriset, without inspecting it yourself.
  • Is it useful to ask for a separate date and timezone? When your users give you a POSIXt object, the timezone is already included. It's nice to be able to input a string as a date, but that only works if it's in the right format. To keep it simple, I'd just ask for a POSIXct as input (which is what your example data.frame contains).
  • Why are you making a data.frame and assigning names before returning? As soon as you're subsetting, it all gets dropped again.
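On the first point, a default such as `direction = c("sunrise", "sunset")` is a length-two vector until it is resolved, and `if (direction == "sunrise")` on a length-two condition is an error from R 4.2.0 onwards. Base R's `match.arg()` is the usual way to resolve and validate such an argument. A minimal sketch, where `describe_direction` is a hypothetical helper used only for illustration:

```r
# Hypothetical helper: resolves the choice the same way a wrapper could
describe_direction <- function(direction = c("sunrise", "sunset")) {
  # Picks the first choice when the argument is left at its default,
  # and raises an error for values outside the allowed set
  match.arg(direction)
}

default_choice  <- describe_direction()          # "sunrise"
explicit_choice <- describe_direction("sunset")  # "sunset"
invalid_choice  <- tryCatch(
  describe_direction("noon"),
  error = function(e) "rejected"
)

print(c(default_choice, explicit_choice, invalid_choice))
```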

Which means your function can be a lot shorter:

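sunrise.set2 <- function(lat, lon, date, direction = c("sunrise", "sunset")) {
  # Resolve the default to a single value ("sunrise" unless told otherwise)
  direction <- match.arg(direction)
  # One row per location: longitude first, then latitude
  lat.long <- cbind(lon, lat)
  # With POSIXct.out=TRUE the second column holds the date-time
  sunriset(lat.long, date, direction=direction, POSIXct.out=TRUE)[,2]
}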

If you have no control over your input you might need to add some checks, but usually I find it most useful to keep focused on just the thing you want to accomplish.
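For completeness, this is roughly how the shortened function could be used with mutate, without rowwise(). It is a sketch that assumes maptools and dplyr are installed; `df_small` is a made-up three-row subset built from the first rows of the dput in the question, and the function is repeated so the snippet runs on its own.

```r
library(maptools)
library(dplyr)

# Shortened, vectorized version (repeated here so the snippet is self-contained)
sunrise.set2 <- function(lat, lon, date, direction = c("sunrise", "sunset")) {
  direction <- match.arg(direction)
  lat.long <- cbind(lon, lat)  # one row per location: longitude, latitude
  sunriset(lat.long, date, direction = direction, POSIXct.out = TRUE)[, 2]
}

# Hypothetical subset: first three rows of the question's data
df_small <- data.frame(
  Latitude  = c(20.666, 20.676, 20.686),
  Longitude = c(-156.449, -156.459, -156.469),
  LocalDateTime = as.POSIXct(c(1534318440, 1534404840, 1534491240),
                             origin = "1970-01-01", tz = "UTC")
)

# The whole columns are passed in a single call, one result per row
out <- df_small %>%
  mutate(set = sunrise.set2(Latitude, Longitude, LocalDateTime,
                            direction = "sunset"))
print(out)
```

Because the columns go to sunriset in one call, this should avoid the per-row overhead that made the rowwise() version slow.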

Emil Bode
  • Awesome! Thanks Emil, not only did this answer the question, but thank you so much for keeping my original lines commented out in the function, as it definitely helps me understand it better! – DarwinsBeard Dec 10 '18 at 16:54