Looking for non-recursive algorithm for visiting all k-combinations of a multiset in lexicographic order

Question

More specifically, I'm looking for an algorithm A that takes as its inputs

a sorted multiset M = {a₁, a₂, …, aₙ} of non-negative integers;
an integer k with 0 ≤ k ≤ n = |M|;
a "visitor" callback V (taking a k-combination of M as input);
(optional) a sorted k-combination K of M (DEFAULT: the k-combination {a₁, a₂, …, aₖ}).

The algorithm will then visit, in lexicographic order, all the k-combinations of M, starting with K, and apply the callback V to each.

For example, if M = {0, 0, 1, 2}, k = 2, and K = {0, 1}, then executing A(M, k, V, K) will result in the application of the visitor callback V to each of the k-combinations {0, 1}, {0, 2}, {1, 2}, in this order.

A critical requirement is that the algorithm be non-recursive.

Less critical is the precise ordering in which the k-combinations are visited, so long as the ordering is consistent. For example, colexicographic order would be fine as well. The reason for requiring a consistent ordering is that I want to be able to visit all the k-combinations by running the algorithm in batches, each batch starting from a given combination K.

In case there are any ambiguities in my terminology, in the remainder of this post I give some definitions that I hope will clarify matters.

A multiset is like a set, except that repetitions are allowed. For example, M = {0, 0, 1, 2} is a multiset of size 4. For this question I'm interested only in finite multisets. Also, for this question I assume that the elements of the multiset are all non-negative integers.

Define a k-combination of a multiset M as any sub-multiset of M of size k. E.g. the 2-combinations of M = {0, 0, 1, 2} are {0, 0}, {0, 1}, {0, 2}, and {1, 2}.

As with sets, the ordering of a multiset's elements does not matter (e.g. M can also be represented as {2, 0, 1, 0}, or {1, 2, 0, 0}, etc.), but we can define a canonical representation of the multiset as the one in which the elements (here assumed to be non-negative integers) are in ascending order. With this convention, any collection of k-combinations of a multiset can itself be ordered lexicographically by the canonical representations of its members. (The sequence of all 2-combinations of M given earlier exhibits such an ordering.)
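To make these definitions concrete, here is a small brute-force sketch in Python. It is only a reference for checking other implementations on small inputs, not the non-recursive algorithm being asked for; the function name `k_combinations_bruteforce` is made up for this illustration.

```python
from itertools import combinations

def k_combinations_bruteforce(M, k):
    """Return all distinct k-combinations of the multiset M as sorted
    tuples, in lexicographic order. For checking small inputs only."""
    # combinations() treats equal elements at different positions as
    # distinct, so collecting into a set drops the repeated tuples.
    # Sorting M first makes every tuple a canonical representation.
    return sorted(set(combinations(sorted(M), k)))

print(k_combinations_bruteforce([0, 0, 1, 2], 2))
# [(0, 0), (0, 1), (0, 2), (1, 2)]
```

Because the input is sorted first, {2, 0, 1, 0} and {0, 0, 1, 2} give the same result.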

UPDATE: below I've translated rici's elegant algorithm from C++ to JavaScript as faithfully as I could, and put a simple wrapper around it to conform to the question's specs and notation.

function A(M, k, V, K) {

    if (K === undefined) K = M.slice(0, k);

    var less_than = function (a, b) { return a < b; };

    function next_comb(first, last,
                       /* first_value */ _, last_value,
                       comp) {

        if (comp === undefined) comp = less_than;

        // 1. Find the rightmost value which could be advanced, if any
        var p = last;

        while (p != first && ! comp(K[p - 1], M[--last_value])) --p;
        if (p == first) return false;

        // 2. Find the smallest value which is greater than the selected value
        for (--p; comp(K[p], M[last_value - 1]); --last_value) ;

        // 3. Overwrite the suffix of the subset with the lexicographically
        //    smallest sequence starting with the new value
        while (p !== last) K[p++] = M[last_value++];

        return true;
    }

    while (true) {
        V(K);
        if (!next_comb(0, k, 0, M.length)) break;
    }
}

Demo:

function print_it (K) { console.log(K); }

A([0, 0, 0, 0, 1, 1, 1, 2, 2, 3], 8, print_it);

// [0, 0, 0, 0, 1, 1, 1, 2]
// [0, 0, 0, 0, 1, 1, 1, 3]
// [0, 0, 0, 0, 1, 1, 2, 2]
// [0, 0, 0, 0, 1, 1, 2, 3]
// [0, 0, 0, 0, 1, 2, 2, 3]
// [0, 0, 0, 1, 1, 1, 2, 2]
// [0, 0, 0, 1, 1, 1, 2, 3]
// [0, 0, 0, 1, 1, 2, 2, 3]
// [0, 0, 1, 1, 1, 2, 2, 3]

A([0, 0, 0, 0, 1, 1, 1, 2, 2, 3], 8, print_it, [0, 0, 0, 0, 1, 2, 2, 3]);

// [0, 0, 0, 0, 1, 2, 2, 3]
// [0, 0, 0, 1, 1, 1, 2, 2]
// [0, 0, 0, 1, 1, 1, 2, 3]
// [0, 0, 0, 1, 1, 2, 2, 3]
// [0, 0, 1, 1, 1, 2, 2, 3]

This, of course, is not production-ready code. In particular, I've omitted all error-checking for the sake of readability. Furthermore, an implementation for production will probably structure things differently. (E.g. the option to specify the comparator used by next_comb becomes superfluous here.) My main aim was to keep the ideas behind the original algorithm as clear as possible in a piece of functioning code.
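For readers who prefer Python, the same three steps can be sketched as follows, together with a wrapper that shows the batch use case from the question. This is my own transcription of the JavaScript above, not rici's code; `visit_batch` and its `limit` parameter are hypothetical names, and the sketch assumes M is sorted, K is a sorted k-combination of M, and k ≤ |M|.

```python
def next_comb(M, K):
    """Advance the sorted k-combination K of the sorted list M in place.
    Return False (leaving K unchanged) if K is already the last one."""
    k = len(K)
    p = k
    last_value = len(M)
    # 1. Find the rightmost position whose value could be advanced.
    while p > 0:
        last_value -= 1
        if K[p - 1] < M[last_value]:
            break
        p -= 1
    if p == 0:
        return False
    # 2. Find the smallest value in M that is greater than K[p].
    p -= 1
    while last_value > 0 and K[p] < M[last_value - 1]:
        last_value -= 1
    # 3. Overwrite the suffix with the smallest sequence starting there.
    while p < k:
        K[p] = M[last_value]
        p += 1
        last_value += 1
    return True

def visit_batch(M, k, V, K=None, limit=None):
    """Visit up to `limit` combinations starting with K. Return the
    combination to resume from, or None when the enumeration is done."""
    K = list(M[:k]) if K is None else list(K)
    visited = 0
    while limit is None or visited < limit:
        V(tuple(K))
        visited += 1
        if not next_comb(M, K):
            return None
    return K

# Visit everything in batches of two, resuming from the returned K.
M = [0, 0, 1, 2]
seen = []
resume = visit_batch(M, 2, seen.append, limit=2)
while resume is not None:
    resume = visit_batch(M, 2, seen.append, K=resume, limit=2)
print(seen)  # [(0, 0), (0, 1), (0, 2), (1, 2)]
```

The wrapper copies K and hands the visitor a tuple, so neither the caller's K nor previously visited combinations are modified by later steps.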

If you don't mind C++, take a look at http://stackoverflow.com/a/30518940/1566221 — rici, Aug 21 '15 at 04:47
@rici: What a beauty. It made my day. I've translated it to JS, and added it as an UPDATE to my original post. I'm loath to post it as my answer, since I don't want to take credit for it. Feel free to post it as such; I'll accept it. — kjo, Aug 21 '15 at 13:16

score 2 · Answer 1 · answered Aug 20 '15 at 13:37

I checked the relevant sections of TAoCP, but this problem is at most an exercise there. The basic idea is the same as Algorithm L: try to "increment" the least significant positions first, filling the positions after the successful increment to have their least allowed values.

Here's some Python that might work but is crying out for better data structures.

def increment(M, K):
    M = list(M)  # copy them
    K = list(K)
    for x in K:  # compute the difference
        M.remove(x)
    for i in range(len(K) - 1, -1, -1):
        candidates = [x for x in M if x > K[i]]
        if len(candidates) < len(K) - i:
            M.append(K[i])
            continue
        candidates.sort()
        K[i:] = candidates[:len(K) - i]
        return K
    return None


def demo():
    M = [0, 0, 1, 1, 2, 2, 3, 3]
    K = [0, 0, 1]
    while K is not None:
        print(K)
        K = increment(M, K)
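
One possible reading of "better data structures" is to track the unused elements as multiplicities rather than as a list. The sketch below follows the same idea as increment above but keeps the spare elements in a collections.Counter; the name `increment_counts` is made up here, and it assumes K is a sorted k-combination of M.

```python
from collections import Counter

def increment_counts(M, K):
    """Return the lexicographic successor of the sorted k-combination K
    of the multiset M as a new list, or None if K is the last one."""
    values = sorted(set(M))   # distinct values, ascending
    spare = Counter(M)        # copies of each value not currently in K
    spare.subtract(K)
    for i in range(len(K) - 1, -1, -1):
        need = len(K) - i
        # Collect the smallest `need` spare elements greater than K[i].
        suffix = []
        for v in values:
            if v > K[i]:
                suffix.extend([v] * min(spare[v], need - len(suffix)))
        if len(suffix) == need:
            return list(K[:i]) + suffix
        spare[K[i]] += 1      # position i cannot advance; release its value
    return None

K = [0, 0]
visited = []
while K is not None:
    visited.append(tuple(K))
    K = increment_counts([0, 0, 1, 2], K)
print(visited)  # [(0, 0), (0, 1), (0, 2), (1, 2)]
```

This avoids the repeated list.remove and sort calls, at the cost of scanning the distinct values once per position.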

score 1 · Answer 2 · answered Aug 20 '15 at 15:09

To generate combinations of size K iteratively, you would need K nested for loops. First we remove the repetitions from the sorted input, then we create an array that represents the indices of those for loops. As long as the index array doesn't overflow, we keep generating combinations.

The adder function simulates the progression of the counters in a stack of nested for loops. There is a little room for improvement in the implementation below.

N = number of distinct input values
K = pick size
i = 0 to K - 1 (loop depth); the loop at depth i runs while v_i < N - (K - (i + 1))
for (var v_0 = 0; v_0 < N - (K - 1); v_0++) {
  ...
  for (var v_{K-1} = v_{K-2} + 1; v_{K-1} < N; v_{K-1}++) {
    combo = [ array[v_0], ..., array[v_{K-1}] ];
  }
  ...
}
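
The nested loops above can be collapsed into a single index array that is advanced like an odometer. Here is a minimal Python sketch of that idea, assuming K ≤ N; `next_indices` is a hypothetical name for this illustration, and like the description above it works on the distinct values only.

```python
def next_indices(idx, n):
    """Advance idx (strictly increasing indices into n items) in place.
    Return False once the last index combination has been passed."""
    k = len(idx)
    i = k - 1
    # The counter at depth i may go no higher than n - k + i.
    while i >= 0 and idx[i] == n - k + i:
        i -= 1
    if i < 0:
        return False          # every counter is at its maximum
    idx[i] += 1
    # Restart each deeper loop one past the loop that encloses it.
    for j in range(i + 1, k):
        idx[j] = idx[j - 1] + 1
    return True

values = [0, 1, 2]            # distinct values after removing repetitions
idx = [0, 1]                  # K = 2 loop counters
combos = []
while True:
    combos.append([values[i] for i in idx])
    if not next_indices(idx, len(values)):
        break
print(combos)  # [[0, 1], [0, 2], [1, 2]]
```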

Here's the working source code in JavaScript

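// Advance the index array `arr` (strictly increasing indices into an array
// of `max` distinct values) to the next combination of indices.
// Returns false once every combination has been produced.
function adder(arr, max) {
  var k = arr.length;
  var n = max;
  var i;
  // Increment from the rightmost counter; counter i must stay below
  // n - (k - (i + 1)), so on overflow carry into the counter to its left.
  for (i = k - 1; i >= 0; i--) {
    arr[i]++;
    if (arr[i] < n - (k - (i + 1))) {
      break;
    }
  }
  if (i < 0) {
    return false; // overflow: no counter could be advanced (also when k is 0)
  }
  // Reset every counter to the right to its smallest allowed value.
  // These values can never exceed their own bounds, so no further carry occurs.
  for (i = i + 1; i < k; i++) {
    arr[i] = arr[i - 1] + 1;
  }
  return true;
}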
function nchoosekUniq(arr, k, cb) {
  var i;
  // reduce the input to its distinct values (a Set keeps insertion order,
  // so a sorted input stays sorted)
  var set = new Set();
  for (i = 0; i < arr.length; i++) { set.add(arr[i]); }
  arr = [];
  set.forEach(function(v) { arr.push(v); });
  var n = arr.length;
  // with fewer than k distinct values there is nothing to visit
  if (k > n) { return; }
  // create the index array [0, 1, ..., k - 1]
  var iArr = Array(k);
  for (i = 0; i < k; i++) { iArr[i] = i; }
  // visit each combination of distinct values
  do {
    var combo = [];
    for (i = 0; i < iArr.length; i++) {
      combo.push(arr[iArr[i]]);
    }
    cb(combo);
  } while (adder(iArr, n) === true);
}
var arr = [0, 0, 1, 2]; 
var k = 2;
nchoosekUniq(arr, k, function(set) { 
  var s=""; 
  set.forEach(function(v) { s+=v; }); 
  console.log(s); 
}); // 01, 02, 12
