7

Let me start with an example. Suppose I have 3 arrays a, b, c such as

a = c(3,5)
b = c(6,1,8,7)
c = c(4,2,9)

I need to extract the consecutive triplets among them, i.e.,

c(1,2,3),c(4,5,6)

This is just an example. My real data set is larger, with 10 or more arrays, so I need to be able to find consecutive series of length ten.

Could anyone provide a general algorithm to find the consecutive series of length 'n' among 'n' arrays?

I am doing this in R, so code in R is preferable, but an algorithm in any language is more than welcome.

Psidom

4 Answers

7

First reorganize the data into a list of pairs containing each value and the number of the array it came from. Sort the list by value; you'd have something like:

1-2
2-3
3-1 (i.e. "there's a three in array 1")
4-3
5-1
6-2
7-2
8-2
9-3

Then loop over the list, check whether there are n consecutive numbers, and then check whether they all have different array numbers.
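The steps above can be sketched in R as follows. This is only an illustration under stated assumptions: `find_runs` is a hypothetical helper name (not from the answer), the arrays are passed as a list, and no value occurs in more than one array.

```r
# Hypothetical helper: find runs of n consecutive integers that take
# exactly one value from each of the n input vectors.
# Assumes no value is shared between vectors.
find_runs <- function(arrays) {
  n <- length(arrays)
  # Step 1: one row per value, tagged with the index of its source array
  tagged <- data.frame(
    value = unlist(arrays, use.names = FALSE),
    array = rep(seq_along(arrays), lengths(arrays))
  )
  # Step 2: sort by value
  tagged <- tagged[order(tagged$value), ]
  runs <- list()
  # Step 3: slide a window of n rows over the sorted table
  for (i in seq_len(max(nrow(tagged) - n + 1, 0))) {
    win <- tagged[i:(i + n - 1), ]
    consecutive <- all(diff(win$value) == 1)   # values increase by exactly 1
    distinct    <- !anyDuplicated(win$array)   # every array appears once
    if (consecutive && distinct) runs[[length(runs) + 1]] <- win$value
  }
  runs
}

# With the vectors from the question this yields the runs 1 2 3 and 4 5 6
res <- find_runs(list(a = c(3, 5), b = c(6, 1, 8, 7), c = c(4, 2, 9)))
```

Because the table is sorted once and each window is inspected once, the work is dominated by the sort rather than by the number of possible combinations.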

Jur
  • Great idea (upvoted), there could be just one difficulty if same number is in more vectors, but it's easy to adapt the solution to account for that :) – digEmAll Jun 24 '16 at 15:24
5

Here's one approach. It assumes the pooled observations form an unbroken sequence (no gaps), because the code below only checks group membership. Here is the data.

N <- 3
a <- c(3,5)
b <- c(6,1,8,7)
c <- c(4,2,9)

Then I combine them and order by the observations

dd <- lattice::make.groups(a,b,c)
dd <- dd[order(dd$data),]

Now I look for runs of N adjacent rows in this table in which all three groups are represented

idx <- apply(embed(as.numeric(dd$which),N), 1, function(x) {
    length(unique(x))==N
})

Then we can see the triplets with

lapply(which(idx), function(i) {
    dd[i:(i+N-1),]
})

# [[1]]
#    data which
# b2    1     b
# c2    2     c
# a1    3     a
# 
# [[2]]
#    data which
# c1    4     c
# a2    5     a
# b1    6     b
MrFlick
  • That perfectly worked for the given example. But could you help me out in forming group as I have 'N' number of arrays, all as a list in a List. – Dwarakanath Thoppe Jun 25 '16 at 07:20
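For the list-of-arrays case raised in the comment above, one possible adaptation is to build the grouped table by hand instead of calling `lattice::make.groups`. This is a sketch, not the answer author's code: the list name `arrays` is an assumption, the list is assumed to be named, and the same no-gaps assumption as in the answer applies.

```r
# Assumed input: a named list holding the N vectors
arrays <- list(a = c(3, 5), b = c(6, 1, 8, 7), c = c(4, 2, 9))
N <- length(arrays)

# Same shape as the make.groups() result: one row per observation,
# with a factor recording which list element it came from
dd <- data.frame(
  data  = unlist(arrays, use.names = FALSE),
  which = factor(rep(names(arrays), lengths(arrays)), levels = names(arrays))
)
dd <- dd[order(dd$data), ]

# Windows of N adjacent rows in which every group occurs exactly once
idx <- apply(embed(as.numeric(dd$which), N), 1, function(x) {
  length(unique(x)) == N
})
triplets <- lapply(which(idx), function(i) dd[i:(i + N - 1), ])
```

With the example data, `triplets` holds two data frames whose `data` columns are 1, 2, 3 and 4, 5, 6.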
2

Here is a brute-force method using expand.grid and the three vectors from the example

# get all combinations
df <- expand.grid(a,b,c)

Use combn to calculate the absolute difference for each pairwise combination of columns.

# get all pairwise differences (one column of myDiffs per pair of df columns)
myDiffs <- combn(names(df), 2, FUN=function(x) abs(df[[x[1]]] - df[[x[2]]]))

# subset data using `rowSums` and `which`
df[which(rowSums(myDiffs == 1) == ncol(myDiffs)-1), ]

df[which(rowSums(myDiffs == 1) == ncol(myDiffs)-1), ]
   Var1 Var2 Var3
2     5    6    4
11    3    1    2
lmo
  • Got any ideas to pass 'N' list in expand.grid() method? – Dwarakanath Thoppe Jun 25 '16 at 09:43
  • I just tried and `expand.grid` will accept a list of vectors. You can collect the vectors in a list using `mget` and `ls`. Play around with my answer to [this post](http://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames) to build such a list. – lmo Jun 25 '16 at 13:04
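As the comment above notes, `expand.grid` accepts a list of vectors, so the brute-force idea can be written for any number of arrays. The sketch below is an assumption-laden variant rather than the answer's exact test: the names `arrays`, `combos`, `is_run` and `runs` are made up for illustration, and each row is checked by sorting it and requiring every step to be 1.

```r
# Assumed input: a named list holding the N vectors
arrays <- list(a = c(3, 5), b = c(6, 1, 8, 7), c = c(4, 2, 9))

# Every way of picking one value from each vector (one row per pick)
combos <- expand.grid(arrays)

# A row is a consecutive run if its sorted values increase by exactly 1
is_run <- apply(as.matrix(combos), 1, function(r) all(diff(sort(r)) == 1))
runs <- combos[is_run, ]
```

For the example data, `runs` keeps rows 2 and 11, matching the output shown in the answer. The number of rows in `combos` is the product of the vector lengths, so this is likely to become impractical with ten or more sizeable arrays.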
1

I have hacked together a little recursive function that will find all the consecutive triplets amongst as many vectors as you pass it (need to pass at least three). It is probably a little crude, but seems to work.

The function uses the ellipsis, ..., for passing arguments. Hence it will take however many arguments (i.e. numeric vectors) you provide and put them in the list items. Then the smallest value amongst each passed vector is located, along with its index.

Then the indices of the vectors corresponding to the smallest triplet are determined and iterated over in a for() loop, where the output values are appended to the output vector out. The input vectors in items are pruned and passed to the function again, recursively. Only when all vectors are NA, i.e. there are no more values left in the vectors, does the function return the final result.

library(magrittr)

# define function to find the triplets
tripl <- function(...){
  items <- list(...)

  # find the smallest number in each passed vector, along with its index
  # output is a matrix of n-by-2, where n is the number of passed arguments
  triplet.id <- lapply(items, function(x){
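    # an exhausted vector (all NA) contributes an NA index and an NA value
    if(all(is.na(x))) id <- c(NA, NA)
    # which.min() returns a single index even if the minimum is tied
    else id <- c(which.min(x), min(x))
  }) %>% unlist %>% matrix(., ncol=2, byrow=TRUE)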


  # find the smallest triplet from the passed vectors
  index <- order(triplet.id[,2])[1:3]
  # create empty vector for output
  out <- vector()

  # go through the smallest triplet's indices
  for(i in index){
    # .. append the corresponding item from the input vector to the out vector
    # .. and remove the value from the input vector
    if(length(items[[i]]) == 1) {
      out <- append(out, items[[i]])
      # .. if the input vector has no value left fill with NA
      items[[i]] <- NA
    }
    else {
      out <- append(out, items[[i]][triplet.id[i,1]])
      items[[i]] <- items[[i]][-triplet.id[i,1]]
    }
  }

  # recurse until all vectors are empty (NA)
  if(!prod(unlist(is.na(items)))) out <- append(list(out), 
                                                do.call("tripl", c(items), quote = F))
  else(out <- list(out))

  # return result
  return(out)
}

The function can be called by passing the input vectors as arguments.

# input vectors
a = c(3,5)
b = c(6,1,8,7)
c = c(4,2,9)

# find all the triplets using our function
y <- tripl(a,b,c) 

The result is a list that contains all the necessary information, albeit unordered.

print(y)
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [1] 4 5 6
# 
# [[3]]
# [1]  7  9 NA
#
# [[4]]
# [1]  8 NA NA

Ordering everything can be done using sapply():

# put everything in order
sapply(y, function(x){x[order(x)]}) %>% t
#       [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
# [3,]    7    9   NA
# [4,]    8   NA   NA

Note that the function uses only one value per vector in each step. It will therefore not find the consecutive triplet c(6,7,8) among e.g. c(6,7,11), c(8,9,13) and c(10,12,14). In this instance it returns c(6,8,10) instead (see below).

a<-c(6,7,11)
b<-c(8,9,13)
c<-c(10,12,14)

y <- tripl(a,b,c)
sapply(y, function(x){x[order(x)]}) %>% t
#     [,1] [,2] [,3]
# [1,]    6    8   10
# [2,]    7    9   12
# [3,]   11   13   14
DrEigelb