
unique() removes duplicate elements of a vector, and duplicate rows of an array.

is.element(), %in%, and match() work only on vectors (or NULL).

Are there any value matching or set operations for multiple variables? (e.g. rows of an array)

My current workaround is this. It's not very elegant, and it can produce false matches when the data themselves contain "_".

# Match rows of x against rows of table by collapsing each row into one string
match.multiple <- function (x, table, nomatch = NA_integer_, incomparables = NULL) {
  x_vector <- apply(x, 1, paste, collapse = "_")
  table_vector <- apply(table, 1, paste, collapse = "_")
  match(x_vector, table_vector, nomatch, incomparables)
}

is.element.multiple <- "%in.multiple%" <- function (el, set) match.multiple(el, set, 0) > 0
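
The "_" sensitivity can be seen in a small sketch. The matrices `x` and `tab` and the separator choice `sep` are made up for illustration; joining with the ASCII unit separator only makes a collision less likely, it does not rule one out.

```r
# Two different one-row matrices whose rows collapse to the same "_"-joined key
x   <- rbind(c("a_b", "c"))
tab <- rbind(c("a", "b_c"))

key_x   <- apply(x, 1, paste, collapse = "_")    # "a_b_c"
key_tab <- apply(tab, 1, paste, collapse = "_")  # "a_b_c"
false_hit <- match(key_x, key_tab)               # 1, although the rows differ

# Hypothetical variant: join with the ASCII unit separator, which is
# unlikely (though not impossible) to occur in real data
sep <- "\x1f"
safer_hit <- match(apply(x, 1, paste, collapse = sep),
                   apply(tab, 1, paste, collapse = sep))  # NA
```

With numeric columns such as gear and carb the "_" separator is harmless; the problem only arises when the pasted values can contain the separator themselves.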

Edit: adding a reproducible example

Let's say that you wish to buy a car that has an equal number of forward gears and carburetors. It can be one of each, two of each, and so on. You don't know whether the cars that are available on the market (cf. mtcars) comply with your preferences.

preferences <- cbind(1:8, 1:8)
available <- cbind(mtcars$gear, mtcars$carb)

So you do a matching for both variables: gears and carburetors.

m <- match.multiple(preferences, available)
m
# [1] NA NA 12  1 NA NA NA NA
which(!is.na(m))
# [1] 3 4

These are the numbers of forward gears and carburetors that occur in equal quantities.

willbuy <- m[!is.na(m)]
mtcars[willbuy, ]
#     mpg cyl  disp  hp drat   wt  qsec vs am gear carb
# 1: 16.4   8 275.8 180 3.07 4.07 17.40  0  0    3    3
# 2: 21.0   6 160.0 110 3.90 2.62 16.46  0  1    4    4

And these are catalogue entries for cars that you should consider.
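
Because match() returns only the first matching position, the result above lists one car per preference. A minimal sketch of the reverse lookup, which flags every available car that satisfies some preference, could look like this. The helper `key` is a hypothetical name, and it inherits the "_" caveat of the workaround.

```r
# Hypothetical helper: collapse each row of a matrix into one string key
key <- function(m) apply(m, 1, paste, collapse = "_")

preferences <- cbind(1:8, 1:8)
available   <- cbind(mtcars$gear, mtcars$carb)

# Positions of all available cars whose (gear, carb) row occurs in preferences
hits <- which(key(available) %in% key(preferences))
hits
# [1]  1  2 10 11 12 13 14
rownames(mtcars)[hits]
```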

9877126
  • @dww It's not the same thing: it will find a match for `c("A", "C")` in `rbind(c("A", "B"), c("C", "D"))` – 9877126 May 21 '16 at 20:08
  • ok I misunderstood the question. Thought you wanted to find one item in multiple objects. Seems you also want to test for occurrence of a list as a subset within an array or list of lists. Possible to edit the question to make this more obvious? – dww May 21 '16 at 20:13
  • @dww I don't want to include lists in my question -- it just makes it more complicated. But I do want to make it clear. **I wish to match rows of a matrix with rows of another matrix.** – 9877126 May 21 '16 at 20:20
  • Your question reminds me of [this question](http://stackoverflow.com/questions/31330196/find-a-submatrix-in-a-matrix/31554100#31554100) which I answered 10 months ago with a crazy Rcpp solution. Just a possibility. – bgoldst May 21 '16 at 20:23
  • @bgoldst Wow, your solution does not only require the installation of Rtools (~1GB), but also rebooting my computer (cf. [this](http://stackoverflow.com/questions/17619185/rcpp-cant-find-rtools-error-1-occurred-building-shared-library)). I will check it out though. – 9877126 May 21 '16 at 20:47
  • A reproducible example with desired output would be helpful: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Bulat May 21 '16 at 21:05

2 Answers


As I mentioned in the comments, my answer to the question linked there (finding a submatrix in a matrix) can be adapted to solve this problem. Here's how it can be done, demonstrated with the OP's reproducible example:

avail <- cbind(mtcars$gear,mtcars$carb);
prefs <- cbind(1:8,1:8);
do.call(rbind,apply(prefs,1L,function(x) mtcars[findarray(avail,matrix(x,1L))[,1L],]));
##                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Merc 450SE    16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL    17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC   15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Mazda RX4     21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Merc 280      19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C     17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
bgoldst
  • Despite all my efforts, I could not install Rtools properly. However, if we change `findarray(avail,matrix(x,1L))[,1L]` in your code to `apply(avail, 1, all.equal, x) == TRUE`, it gives the same result -- without a need for gcc. I do understand that your n-dimensional submatrix finder is a much more powerful tool, I just cannot see how we could exploit its strengths here. – 9877126 May 22 '16 at 01:00
  • This should be even faster than @dww's solution, given that it's written in C++. It's not as easy to use though, as there are multiple complications with having a working Rtools. [#bitterexperience](https://cran.r-project.org/doc/manuals/R-admin.html#The-Windows-toolset) – 9877126 May 23 '16 at 19:07

A function to find occurrences of a vector within rows of an array:

To test whether a vector (v) is a row of an array or matrix (m), we can construct a second matrix with the same dimensions as m, consisting of repeated copies of v as its rows, and compare the two element by element. A row of m matches v when all length(v) of its elements are equal to the corresponding elements of the constructed matrix.

is.row.in.rows <- function(v,m) {
  which(length(v) == rowSums(m == matrix(v, nrow(m), ncol(m), byrow=TRUE)))
}

Note that it is also possible to perform a similar test with a loop using which(apply(m, 1, all.equal, v) == TRUE). However, the vectorised version above, which uses rowSums, is faster.
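
The two tests are not interchangeable: `==` checks exact equality, while all.equal() allows a small numerical tolerance. A minimal sketch with made-up data (`m` and `v` below are illustrative, not from the question) shows the difference; the isTRUE() wrapper is an addition that guards against the character message all.equal() returns on a mismatch.

```r
m <- rbind(c(0.3, 1), c(2, 3))
v <- c(0.1 + 0.2, 1)  # differs from 0.3 only by floating-point error

# Exact comparison, as in the rowSums version: no row matches
exact <- which(length(v) == rowSums(m == matrix(v, nrow(m), ncol(m), byrow = TRUE)))

# Tolerance-based comparison: row 1 is treated as a match
approx <- which(apply(m, 1, function(r) isTRUE(all.equal(r, v))))
```

For integer-valued data such as gear and carb, both approaches give the same rows.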

Using this function to solve the reproducible example in the question:

a <- unlist(apply(preferences, MARGIN = 1, is.row.in.rows, available))
a
# [1] 12 13 14  1  2 10 11

mtcars[a,]
#               mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# Merc 450SE    16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
# Merc 450SL    17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
# Merc 450SLC   15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
# Mazda RX4     21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
# Merc 280      19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
# Merc 280C     17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
dww
  • The `is.row.in.rows` function throws an error when you compare arrays of different sizes (like in your previous example): `m – 9877126 May 22 '16 at 00:13
  • I didn't notice that you have changed the code of your function. My above comment holds without the `which`. The current version of `is.row.in.rows` should be compared to `which(apply(m, 1, all.equal, v) == TRUE)`. – 9877126 May 22 '16 at 00:40
  • Yes, that is similar, but not precisely the same. all.equal tests whether objects are nearly equal, rather than identical. And I think that rowSums could be faster than apply(all.equal) – dww May 22 '16 at 00:57
  • It is faster indeed. – 9877126 May 23 '16 at 18:43
  • And to make it even faster, you can forgo the transposition using `byrow=TRUE`. e.g.: `which(length(v) == rowSums(m == matrix(v, nrow(m), ncol(m), byrow=TRUE)))` – 9877126 May 23 '16 at 18:50
  • Thanks, I've updated the answer to incorporate this suggestion. – dww May 23 '16 at 19:39