
I want to subset my data frame to keep only the rows that hold each day's maximum value.

Site    Year   Day     Time      Cover       Size TempChange
 ST1    2011    97      0.0     Closed      small       0.97
 ST1    2011    97      0.5     Closed      small       1.02
 ST1    2011    97      1.0     Closed      small       1.10

A section of the data frame is shown above. I would like to select only the rows that hold the maximum value of TempChange for each value of Day. I want to do this because I am interested in other variables (not shown) at those particular times.

AMENDED EXAMPLE AND REQUIRED OUTPUT

Site  Day   Temp     Row
a     10    0.2     1
a     10    0.3     2
a     11    0.5     3
a     11    0.4     4
b     10    0.1     5
b     10    0.8     6
b     11    0.7     7
b     11    0.6     8
c     10    0.2     9
c     10    0.3     10
c     11    0.5     11
c     11    0.8     12

REQUIRED OUTPUT

Site  Day   Temp     Row
a     10    0.3     2
a     11    0.5     3
b     10    0.8     6
b     11    0.7     7
c     10    0.3     10
c     11    0.8     12

Hope that makes it clearer.

Spacedman
Diarmuid Ryan

1 Answer


After faffing with raw data frame code, I realised plyr could do this in one:

> library(plyr)
> df
  Day          V Z
1  97 0.26575207 1
2  97 0.09443351 2
3  97 0.88097858 3
4  98 0.62241515 4
5  98 0.61985937 5
6  99 0.06956219 6
7 100 0.86638108 7
8 100 0.08382254 8

> ddply(df,~Day,function(x){x[which.max(x$V),]})
  Day          V Z
1  97 0.88097858 3
2  98 0.62241515 4
3  99 0.06956219 6
4 100 0.86638108 7
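
One thing worth knowing about the call above: `which.max` returns only the first index of the maximum, so if two rows in a group tie for the maximum, only the first is kept. The sketch below uses a small made-up data frame (`tied`, not part of the original data) to show the difference between that and keeping every tied row.

```r
# Hypothetical data: rows 2 and 3 tie for the maximum of V.
tied <- data.frame(Day = c(97, 97, 97), V = c(0.5, 0.9, 0.9), Z = 1:3)

# which.max() gives the index of the first maximum only.
first_only <- tied[which.max(tied$V), ]

# Comparing against max() keeps every row that reaches the maximum.
all_ties <- tied[tied$V == max(tied$V), ]

print(first_only)
print(all_ties)
```

Which behaviour is wanted depends on the data; for continuous measurements such as temperatures, exact ties are probably rare.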

To get the rows with the maximum value for each unique combination of more than one column, just add the extra variable to the formula. For your modified example, it's then:

> df
   Site Day Temp Row
1     a  10  0.2   1
2     a  10  0.3   2
3     a  11  0.5   3
4     a  11  0.4   4
5     b  10  0.1   5
6     b  10  0.8   6
7     b  11  0.7   7
8     b  11  0.6   8
9     c  10  0.2   9
10    c  10  0.3  10
11    c  11  0.5  11
12    c  11  0.8  12
> ddply(df,~Day+Site,function(x){x[which.max(x$Temp),]})
  Site Day Temp Row
1    a  10  0.3   2
2    b  10  0.8   6
3    c  10  0.3  10
4    a  11  0.5   3
5    b  11  0.7   7
6    c  11  0.8  12

Note this isn't in the same order as your original dataframe, but you can fix that.

> dmax = ddply(df,~Day+Site,function(x){x[which.max(x$Temp),]})
> dmax[order(dmax$Row),]
  Site Day Temp Row
1    a  10  0.3   2
4    a  11  0.5   3
2    b  10  0.8   6
5    b  11  0.7   7
3    c  10  0.3  10
6    c  11  0.8  12
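
If plyr is not available, something similar can probably be done in base R. The following is only a sketch, not the method used above: it rebuilds the amended example from the question (the construction of `df` here is my own), uses `ave` to compute each Site/Day group's maximum, and keeps the rows that match it. Unlike `which.max`, this keeps all tied rows, and it preserves the original row order.

```r
# Rebuild the amended example from the question.
df <- data.frame(
  Site = rep(c("a", "b", "c"), each = 4),
  Day  = rep(c(10, 10, 11, 11), times = 3),
  Temp = c(0.2, 0.3, 0.5, 0.4, 0.1, 0.8, 0.7, 0.6, 0.2, 0.3, 0.5, 0.8),
  Row  = 1:12
)

# ave() returns, for every row, the maximum Temp within its Site/Day group.
group_max <- ave(df$Temp, df$Site, df$Day, FUN = max)

# Keep the rows whose Temp equals their group's maximum.
dmax <- df[df$Temp == group_max, ]
print(dmax)
```

Because the subset is taken directly from `df`, no reordering step is needed afterwards.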
Spacedman
  • another possibility is the use of tapply `tapply(df$V, df$Day, max)` – smu Mar 15 '12 at 12:47
  • That doesn't return the rows, so you can't tell which row had the max, so you can't, for example, find the value of 'Z' in my df that had the max. – Spacedman Mar 15 '12 at 13:09
  • Hi, thanks. That is pretty close. I should have made it clearer, however: as I have a number of sites as replicates, the variable Day repeats itself. I have 18 sites, so within the data frame there are 18 day 97s in total, for example. The code you gave returned the absolute max for each Day value regardless of site. What I need is a data frame giving the daily max for each site. I could run the code for each site separately and then append the results later. Is there a shortcut? Thanks again, Diarm – Diarmuid Ryan Mar 20 '12 at 12:39
  • Can you edit your question to give a complete example with a few more lines and the output you would like to get... – Spacedman Mar 20 '12 at 15:56
  • I have put an example in the original question pane – Diarmuid Ryan Mar 20 '12 at 17:28
  • Much clearer! Another trick is to use 'dput' to give short data examples we can read straight into R to run tests on. Anyway, getting what you want is simple... see modified answer... – Spacedman Mar 20 '12 at 22:17
  • Beautiful. Thanks. I was getting ready for a long night in Excel! – Diarmuid Ryan Mar 21 '12 at 13:36
  • Can you tick the little tick next to my answer so I get some points? :) You'll also get some points. We're all about the points here. – Spacedman Mar 21 '12 at 16:47
  • You can also use `dplyr`: http://stackoverflow.com/questions/24237399/how-to-select-the-rows-with-maximum-values-in-each-group-with-dplyr – Enrique Pérez Herrero Jun 10 '16 at 09:57