0

I have a sample data frame like given below

Dataframe1.

  general_id                date
    6              2000-01-02 16:57:13
    2              2000-01-02 19:26:13
    3              2000-01-04 13:30:13
    2              2000-01-04 19:03:13
    7              2000-01-06 16:32:13

Dataframe2.

  general_id                date
    1              2000-01-02 16:57:12
    1              2000-01-06 16:57:12
    1              2000-01-02 19:26:12
    1              2000-01-02 19:26:12
    1              2000-01-04 13:30:12
    1              2000-01-04 13:30:12
    1              2000-01-04 19:03:12
    1              2000-01-04 19:03:12
    1              2000-01-06 16:32:12

there is just a second difference in both date columns of the data frames. i want to compare the date columns of both the data frames and assign the values of general_id column in Dataframe1 to general_id column in Dataframe2

     date1 <- Dataframe1$date-dsecond(1)
     date2 <- Dataframe1$date

     if(date1==date2){
       dataframe2$general_id=dataframe1$general_id
     }

but i'm getting this error,

In if (date1 == date2) the condition has length > 1 and only the first element will be used

Desired Output is:

Dataframe1

          general_id                date
            6              2000-01-02 16:57:13
            2              2000-01-02 19:26:13
            3              2000-01-04 13:30:13
            2              2000-01-04 19:03:13
            7              2000-01-06 16:32:13

Dataframe2

             general_id                date
               6              2000-01-02 16:57:12
               6              2000-01-06 16:57:12
               2              2000-01-02 19:26:12
               2              2000-01-02 19:26:12
               3              2000-01-04 13:30:12
               3              2000-01-04 13:30:12
               2              2000-01-04 19:03:12
               2              2000-01-04 19:03:12
               7              2000-01-06 16:32:12
               7              2000-01-06 16:32:12
IloveML
  • 17
  • 3
  • Please show the code you're using. Also, your desired output is, in both cases, just Dataframe1 again. Can you modify your example to show why we can't just use `Dataframe2 – divibisan Apr 10 '19 at 15:55
  • Possible duplicate of [How to join (merge) data frames (inner, outer, left, right)](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) – divibisan Apr 10 '19 at 16:09
  • It looks like you want to do a `join_*` on your data: `dplyr::left_join(Dataframe2, Dataframe1, by = 'date')`, then select the right `general_id` column. – divibisan Apr 10 '19 at 16:11
  • but there is just a 1 second difference in dataframe2, how do i need to compare that. – IloveML Apr 10 '19 at 16:25

2 Answers2

0

The operation you want to carry out here is known as a join, specifically you want to left_join df2 with df1 so you keep all rows in df2 and then add matching columns from df1.

To learn more about Joins and how to use them in R, please read this related question: How to join (merge) data frames (inner, outer, left, right)

The complication here is the fact that the date columns are off by one second. To work with that, we just need to modify date using lubridate::dseconds before joining.

First, we take your data and ensure that date is formatted as POSIXct using lubridate::as_datetime so we can work with it as a date.

This will give us the following dataframes based on your data:

df1 <- structure(list(general_id = c(6L, 2L, 3L, 2L, 7L), date = structure(c(946832233, 
946841173, 946992613, 947012593, 947176333), class = c("POSIXct", 
"POSIXt"), tzone = "UTC")), row.names = c(NA, -5L), class = "data.frame")

df2 <- structure(list(general_id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L), date = structure(c(946832232, 947177832, 946841172, 946841172, 
946992612, 946992612, 947012592, 947012592, 947176332), class = c("POSIXct", 
"POSIXt"), tzone = "UTC")), row.names = c(NA, -9L), class = "data.frame")

Now we can do a join, but notice that we modify df1$date within the join by using dplyr::mutate:

library(dplyr)
left_join(df2, mutate(df1, date = date - lubridate::dseconds(1)), by = 'date')

  general_id.x                date general_id.y
1            1 2000-01-02 16:57:12            6
2            1 2000-01-06 16:57:12           NA
3            1 2000-01-02 19:26:12            2
4            1 2000-01-02 19:26:12            2
5            1 2000-01-04 13:30:12            3
6            1 2000-01-04 13:30:12            3
7            1 2000-01-04 19:03:12            2
8            1 2000-01-04 19:03:12            2
9            1 2000-01-06 16:32:12            7

As you can see, we've added the appropriate general_id column from df1. We can then drop general_id.x and rename general_id.y if desired. Note that row 2 returns NA since it doesn't have a match in df1 (the time matches, but the day is different)

divibisan
  • 8,631
  • 11
  • 31
  • 46
0

The following code checks for a time difference between dates in date column of less than 2 seconds. To make it match exactly 1 second of a time difference on only one direction change the which statement.

for (i in 1:nrow(Dataframe2)) {
  corresponding_row <- which(abs(as.POSIXct(Dataframe1$date)-as.POSIXct(Dataframe2$date[i]))<2)
  message('row ', i, ' of Dataframe2 corresponds to row ', corresponding_row, ' of Dataframe1') 
  Dataframe2$id[i] <- ifelse(length(corresponnding_row), Dataframe1$id[corresponding_row], NA)
}
Simon
  • 537
  • 3
  • 8