1

I have the following data frame:

sourceid dataelementid value timestamp
11726    10922         34    2016-04-03 02:05:02
11726    10923         9     2016-05-29 10:47:59
11726    10923         9     2016-05-29 03:47:59
11726    10924         19    2016-03-20 02:05:02
11726    10922         18    2016-05-29 10:47:59
12389    10922         23    2016-07-17 02:05:02
12389    10923         12    2016-04-09 02:05:02
12389    10923         3     2016-09-04 02:05:02
12389    10923         30    2016-04-03 02:05:02
12389    10924         23    2016-04-03 02:05:02
12389    10924         17    2016-05-30 02:05:02
12389    10922         15    2016-04-03 02:05:02
45012    10922         33    2016-03-03 02:05:02
45012    10924         11    2016-05-29 10:47:59

As you can see, I have two columns with unique ID's (Sourceid and dataelementid). What I would like to do is to get the most recent data from the data frame and have a new data frame that looks like this:

sourceid dataelementid value timestamp
11726    10922         18    2016-05-29 10:47:59
11726    10923         9     2016-05-29 10:47:59
11726    10924         19    2016-03-20 02:05:02
12389    10922         23    2016-07-17 02:05:02
12389    10923         3     2016-09-04 02:05:02
12389    10924         17    2016-05-30 02:05:02
45012    10922         33    2016-03-03 02:05:02
45012    NA
45012    10924         11    2016-05-29 10:47:59

I have searched for a solution to this but those I have found are only looking at data frames with one column containing the ID's. If a solution to a similar problem to this already exists, it would be great if I could be pointed to it.

Thanks in advance.

zx8754
  • 42,109
  • 10
  • 93
  • 154
Ali Nguz
  • 13
  • 3
  • Please provide reproducible example: `dput(head(mydata, 20))` – zx8754 Nov 08 '18 at 11:48
  • It is a 2 step, 1st group by and subset max per group. Then 2nd: add missing combos of 2 ids: "sourceid" and "dataelementid". See linked posts. – zx8754 Nov 08 '18 at 12:06

0 Answers0