foreach and non-parallel change of iterators

Question

I need to execute foreach with two iteration variables of the same size (for example 1:n), but the function changes them in parallel as written here:

We call a and b the iteration variables, since those are the variables that are changing during the multiple executions. Note that we are iterating over them in parallel, that is, they are both changing at the same time.

What I need is to make foreach to change them independently, so that I would have a list with length n^2, not n.

example:

X = foreach(i=1:n, j=1:n) %do% (sum(M[i,]*M[j,]))

in the end I get a vector of length n which is only a diagonal of matrix X, not the full matrix.

P.S. I was trying to make this with for looping, but the computation time was too great to leave the code unoptimized.

which data structure do you have? Read this about processing tables: http://stackoverflow.com/questions/1727772/quickly-reading-very-large-tables-as-dataframes-in-r — maximus, Aug 06 '13 at 12:50
Would `vignette("nested")` help? It addresses nested `foreach` loops with the `%:%` operator.. — BenBarnes, Aug 06 '13 at 12:57

Roland · Answer 1 · 2013-08-07T06:57:18.917

foreach is not more efficient than for. Look for the %:% operator as commented by @BenBarnes to use it. Parallelization might help a bit, but not much.

Try the following instead of an explicit loop:

M <- matrix(1:8,4)

prodsums <- combn(seq_len(nrow(M)), 2, FUN=function(ind) {
  res <- sum(M[ind[1],]*M[ind[2],])
  names(res) <- paste(ind, collapse="*")
  res
}, simplify=F)

unlist(prodsums)
#1*2 1*3 1*4 2*3 2*4 3*4 
# 32  38  44  48  56  68 

resmat <- matrix(ncol=nrow(M),nrow=nrow(M))
resmat[lower.tri(resmat)] <- unlist(prodsums)
#       [,1] [,2] [,3] [,4]
# [1,]   NA   NA   NA   NA
# [2,]   32   NA   NA   NA
# [3,]   38   48   NA   NA
# [4,]   44   56   68   NA

resmat[upper.tri(resmat)] <- t(resmat)[upper.tri(resmat)]

diag(resmat) <- rowSums(M^2)
#     [,1] [,2] [,3] [,4]
#[1,]   26   32   38   44
#[2,]   32   40   48   56
#[3,]   38   48   58   68
#[4,]   44   56   68   80

Thanks a lot, calculations took 50 times less time! One last question - how can I fill the upper triangle of **resmat** in your example so that it would be symmetric to the lower one? — Seva, Aug 06 '13 at 20:38
@Seva See my edit. Although if your matrix is huge, it might be better to use sparse matrices, since this is redundant information. Btw., if you omit `names(res) — Roland, Aug 07 '13 at 06:57

Steve Weston · Accepted Answer · 2013-08-07T18:47:33.137

1

To nest foreach loops, use the nesting operator, "%:%":

library(foreach)
n <- 4
M <- matrix(rnorm(n*n), n)
X <- foreach(i=1:n, .combine='cbind') %:%
       foreach(j=1:n, .combine='c') %do% {
         sum(M[i,]*M[j,])
       }

To avoid repeated computations, you can have j iterate from i to n, and use a .final function to pad the resulting vectors with NA:

pad <- function(x) c(rep(NA, n - length(x)), x)
Y <- foreach(i=1:n, .combine='cbind') %:%
       foreach(j=i:n, .combine='c', .final=pad) %do% {
         sum(M[i,]*M[j,])
       }

But these solutions are only of academic interest. For simplicity and speed, I suspect that tcrossprod is the best solution by far:

Z <- tcrossprod(M)

For a 4000 X 4000 matrix, this executed in under 8 seconds on my Linux machine.

edited Aug 07 '13 at 18:47

answered Aug 07 '13 at 18:18

Steve Weston

18,019
4
53
70

+1 It's amazing how difficult it is sometimes to think of the obvious solution. However, the `combn` solution can be used for other functions. – Roland Aug 07 '13 at 21:45
@Roland Agreed. I will keep `combn` in mind having seen your example. – Steve Weston Aug 07 '13 at 22:06

foreach and non-parallel change of iterators

2 Answers2

Linked