
I've got a data.frame of monthly values of a variable for many locations (so many rows), and I want to count the number of consecutive months (i.e. consecutive cells) in each row that have a value of zero. This would be easy if each row were simply read left to right, but the added complication is that the end of the year is consecutive to the start of the year.

For example, in the shortened example dataset below (with seasons instead of months), location 1 has 3 consecutive '0' seasons, location 2 has 2 (Autumn wraps around to Winter), and location 3 has none.

df <- cbind(location = c(1, 2, 3),
            Winter = c(0, 0, 3),
            Spring = c(0, 2, 4),
            Summer = c(0, 2, 7),
            Autumn = c(3, 0, 4))

How can I count these consecutive zero values? I've looked at rle but I'm still none the wiser currently!

Many thanks for any help :)

kim1801

2 Answers


You've identified the two forms the longest run can take: (1) somewhere in the middle of a row, or (2) split between the end and the beginning of a row. Hence you want to calculate each case and take the max, like so:

df<-cbind(
Winter=c(0,0,3),
Spring=c(0,2,4),
Summer=c(0,2,7),
Autumn=c(3,0,4))

#>      Winter Spring Summer Autumn
#> [1,]      0      0      0      3
#> [2,]      0      2      2      0
#> [3,]      3      4      7      4


# calculate the number of consecutive zeros at the start and end
startZeros  <-  apply(df,1,function(x)which.min(x==0)-1)
#> [1] 3 1 0
endZeros  <-  apply(df,1,function(x)which.min(rev(x==0))-1)
#> [1] 0 1 0

# calculate the longest run of zeros
longestRun  <-  apply(df,1,function(x){
                y = rle(x);
                max(y$lengths[y$values==0],0)})
#> [1] 3 1 0

# take the max of the two values
pmax(longestRun,startZeros +endZeros  )
#> [1] 3 2 0
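
The steps above can also be folded into a single per-row function. The following is only a sketch: the name `circular_zero_run` and the matrix name `seasons` are made up for illustration, and it assumes the rows contain no missing values.

```r
# Hypothetical helper: longest run of zeros in one row,
# treating the last cell as adjacent to the first.
circular_zero_run <- function(x) {
  is_zero <- x == 0
  # A row made only of zeros has nothing to wrap around; return its length.
  if (all(is_zero)) return(length(x))
  runs <- rle(is_zero)
  n_runs <- length(runs$lengths)
  # Case 1: longest run anywhere inside the row (0 if there are no zeros).
  inner <- max(runs$lengths[runs$values], 0)
  # Case 2: zeros at the start and at the end join across the year boundary.
  wrapped <- 0
  if (runs$values[1] && runs$values[n_runs]) {
    wrapped <- runs$lengths[1] + runs$lengths[n_runs]
  }
  max(inner, wrapped)
}

seasons <- cbind(Winter = c(0, 0, 3),
                 Spring = c(0, 2, 4),
                 Summer = c(0, 2, 7),
                 Autumn = c(3, 0, 4))

result <- apply(seasons, 1, circular_zero_run)
result
#> [1] 3 2 0
```

Handling the all-zero row explicitly keeps the start and end counts from being added together for the same cells.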

Of course, an even easier solution is to bind the matrix to itself, so that a run wrapping from the end of a row to its start becomes one contiguous run:

longestRun  <-  apply(cbind(df,df),# wrap the zeros at the end of a row around to the zeros at the start
                      1,# the margin over which to apply the summary function (rows)
                      function(x){# the summary function
                          y = rle(x);
                          max(y$lengths[y$values==0],
                              0)# include zero in case there are no zeros in y$values
                      })

Note that both approaches assume df holds only the value columns; the above works because my df does not include the location column.
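
One caveat with the doubling trick: a row made up entirely of zeros is counted twice, so it reports twice the number of columns. A possible guard, sketched here with an extra all-zero row added purely for illustration (the names `seasons`, `doubled_run` and `capped_run` are not from the original code), is to cap the result at the number of columns:

```r
# Same example data plus a fourth, made-up row of all zeros.
seasons <- cbind(Winter = c(0, 0, 3, 0),
                 Spring = c(0, 2, 4, 0),
                 Summer = c(0, 2, 7, 0),
                 Autumn = c(3, 0, 4, 0))

# Doubling each row lets a run wrap from Autumn into Winter.
doubled_run <- apply(cbind(seasons, seasons), 1, function(x) {
  y <- rle(x)
  max(y$lengths[y$values == 0], 0)
})
doubled_run
#> [1] 3 2 0 8

# A run can never be longer than one full cycle, so cap at ncol().
capped_run <- pmin(doubled_run, ncol(seasons))
capped_run
#> [1] 3 2 0 4
```

The cap is safe because any non-zero value appears in both copies of the row, so only an all-zero row can produce a run longer than one cycle.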

Jthorpe
  • Brilliant! The only thing I can fault is that when a row contained only zeros, it looped through that row twice to give a count of 24 (rather than 12; I was using months instead of seasons!). But that was easy to sort. Thanks very much! – kim1801 Jun 04 '15 at 08:36

Try this:

df <- data.frame(location = c(1, 2, 3),
                 Winter = c(0, 0, 3),
                 Spring = c(0, 2, 4),
                 Summer = c(0, 2, 7),
                 Autumn = c(3, 0, 4))

maxcumzero <- function(x) {
    l <- x == 0
    max(cumsum(l) - cummax(cumsum(l) * !l))
}

df$N.Consec <- apply(cbind(df[, -1], df[, -1]), 1, maxcumzero)

df
#   location Winter Spring Summer Autumn N.Consec
# 1        1      0      0      0      3        3
# 2        2      0      2      2      0        2
# 3        3      3      4      7      4        0

This adds a column to the data frame giving the maximum number of consecutive zeros in each row. The value columns are column-bound to themselves (with cbind) so that consecutive zeros spanning autumn and winter are detected.

The method used here is based on that of Martin Morgan in his answer to this similar question.
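
To see how the cumsum/cummax expression inside maxcumzero works, it may help to trace it step by step on a single vector. The values and the intermediate variable names below are made up for illustration:

```r
x <- c(0, 0, 3, 0, 0, 0, 1, 0)   # made-up values
l <- x == 0                      # TRUE wherever the value is zero

# Running count of all zeros seen so far.
running_total <- cumsum(l)
running_total
#> [1] 1 2 2 3 4 5 5 6

# At each non-zero cell, remember the count reached so far;
# cummax carries that value forward until the next non-zero cell.
last_reset <- cummax(running_total * (!l))
last_reset
#> [1] 0 0 2 2 2 2 5 5

# Zeros seen since the most recent non-zero cell = length of the current run.
current_run <- running_total - last_reset
current_run
#> [1] 1 2 0 1 2 3 0 1

max(current_run)
#> [1] 3
```

The subtraction resets the counter after every non-zero value, so the maximum of `current_run` is the longest run of zeros.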

Alex A.
  • @DavidArenburg: Can you provide an example of when it wouldn't? It worked as expected in all of the testing I did. – Alex A. Jun 03 '15 at 18:50
  • It doesn't take into account the end and the beginning of the year. See the result in the other answer. It should be `3 2 0` instead of `3 1 0` – David Arenburg Jun 03 '15 at 18:50
  • You could potentially fix this by doing `apply(cbind(df[, ncol(df), drop = FALSE], df[, -c(1, ncol(df))]), 1, maxcumzero)` but not sure how pretty it is. – David Arenburg Jun 03 '15 at 18:57
  • Hmm, I don't really like the fix (though it seems to work). Did you see the result of `cbind(df[, -1], df[, -1])`? – David Arenburg Jun 03 '15 at 19:00
  • @DavidArenburg: I agree it's not an ideal fix, but it ensures that all seasons are consecutive. I tried your suggested fix and it does work in this situation, but the problem I see with it is that summer and autumn are no longer consecutive. – Alex A. Jun 03 '15 at 20:07
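
The point in the last comment can be checked with a small example. This is a sketch using a single made-up row in which only Summer and Autumn are zero; the variable names `rotated`, `rotated_run` and `doubled_run` are introduced here for illustration:

```r
maxcumzero <- function(x) {
  l <- x == 0
  max(cumsum(l) - cummax(cumsum(l) * !l))
}

# Made-up row: only Summer and Autumn are zero.
df <- data.frame(location = 1, Winter = 5, Spring = 2, Summer = 0, Autumn = 0)

# Reordering suggested in the comments: Autumn, Winter, Spring, Summer.
# Summer and Autumn end up at opposite ends, so their run is split.
rotated <- cbind(df[, ncol(df), drop = FALSE], df[, -c(1, ncol(df))])
rotated_run <- unname(apply(rotated, 1, maxcumzero))
rotated_run
#> [1] 1

# Binding the value columns to themselves keeps every pair of
# neighbouring seasons adjacent, including Autumn and Winter.
doubled_run <- unname(apply(cbind(df[, -1], df[, -1]), 1, maxcumzero))
doubled_run
#> [1] 2
```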