
SO,

The problem

From SQL I'm getting an array with strings (flat array) - let it be

$rgData = ['foo', 'bar', 'baz', 'bee', 'feo'];

Now I want to get all possible combinations of pairs and triplets from this array (and, in the general case, combinations of 4 elements, etc.). To be more specific: I mean combinations in the mathematical sense (without duplicates), i.e. those whose count is equal to

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

So for the array above, that is 10 for both pairs and triplets.
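
As a quick sanity check of those counts, here is a minimal sketch in Python (used here purely for illustration instead of PHP). It assumes Python 3.8+ for `math.comb`; the variable names are just mirrors of the ones in the question.

```python
import math

rg_data = ['foo', 'bar', 'baz', 'bee', 'feo']
n = len(rg_data)

# Number of k-element combinations of n items: n! / (k! * (n - k)!)
pairs = math.comb(n, 2)
triplets = math.comb(n, 3)

print(pairs, triplets)  # 10 10
```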

My approach

I started by mapping each possible selection to the array items it picks. My current solution marks an element as "1" if it is selected and "0" otherwise. For the sample above (triplets), that gives:

foo bar baz bee feo
 0   0   1   1   1   -> [baz, bee, feo]
 0   1   0   1   1   -> [bar, bee, feo]
 0   1   1   0   1   -> [bar, baz, feo]
 0   1   1   1   0   -> [bar, baz, bee]
 1   0   0   1   1   -> [foo, bee, feo]
 1   0   1   0   1   -> [foo, baz, feo]
 1   0   1   1   0   -> [foo, baz, bee]
 1   1   0   0   1   -> [foo, bar, feo]
 1   1   0   1   0   -> [foo, bar, bee]
 1   1   1   0   0   -> [foo, bar, baz]

So all I need to do is somehow produce the desired bit sets. Here's my code in PHP:

function nextAssoc($sAssoc)
{
   if(false !== ($iPos = strrpos($sAssoc, '01')))
   {
      $sAssoc[$iPos]   = '1';
      $sAssoc[$iPos+1] = '0';
      return substr($sAssoc, 0, $iPos+2).
             str_repeat('0', substr_count(substr($sAssoc, $iPos+2), '0')).
             str_repeat('1', substr_count(substr($sAssoc, $iPos+2), '1'));
   }
   return false;
}

function getAssoc(array $rgData, $iCount=2)
{
   if(count($rgData)<$iCount)
   {
      return null;
   }
   $sAssoc   = str_repeat('0', count($rgData)-$iCount).str_repeat('1', $iCount);
   $rgResult = [];
   do
   {
      $rgResult[]=array_intersect_key($rgData, array_filter(str_split($sAssoc)));
   }
   while($sAssoc=nextAssoc($sAssoc));
   return $rgResult;
}

I've chosen to store my bits as a normal string. My algorithm for producing the next association is:

  1. Try to find "01". If it is not found, we are in the 11..100..0 case (the maximum, so no further association exists). If it is found, go to the second step.
  2. Go to the rightmost position of "01" in the string. Switch it to "10", then move all zeros that are to the right of that position to the left of the ones that follow it. For example, take 01110: the rightmost position of "01" is 0, so first we switch this "01" to "10". The string is now 10110. Next, take the right part (everything after the "10", so it starts at index 0+2=2) and move all its zeros to the left, i.e. 110 becomes 011. As a result, we get 10+011=10011 as the next association for 01110.
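
The two steps above can be sketched as follows. This is a Python rendering for illustration only, not the PHP from the question; the names `next_assoc` and `get_assoc` simply mirror `nextAssoc` and `getAssoc`, and returning `None` at the end is my own choice (the PHP returns `false`).

```python
def next_assoc(bits):
    """Return the next bit string in the order described above, or None at the end."""
    pos = bits.rfind('01')              # step 1: rightmost "01"
    if pos == -1:
        return None                     # 11..100..0 case: nothing left
    tail = bits[pos + 2:]
    zeros = tail.count('0')
    # step 2: swap "01" -> "10", then put the tail's zeros before its ones
    return bits[:pos] + '10' + '0' * zeros + '1' * (len(tail) - zeros)


def get_assoc(data, count=2):
    """Collect every selection of `count` items by walking the bit strings."""
    if len(data) < count:
        return None
    bits = '0' * (len(data) - count) + '1' * count
    result = []
    while bits is not None:
        result.append([item for item, b in zip(data, bits) if b == '1'])
        bits = next_assoc(bits)
    return result


print(next_assoc('01110'))  # 10011
print(len(get_assoc(['foo', 'bar', 'baz', 'bee', 'feo'], 3)))  # 10
```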

I've found a similar problem here, but there the OP wants combinations with duplicates, while I want them without duplicates.

The question

My question is about two points:

  • For my solution, is there a more efficient way to produce the next bit set?
  • Are there simpler solutions for this? It seems to be a standard problem.
Alma Do
  • possible duplicate of [Algorithm to return all combinations of k elements from n](http://stackoverflow.com/questions/127704/algorithm-to-return-all-combinations-of-k-elements-from-n) – ElKamina Sep 23 '13 at 19:12
  • Wow, that looks interesting, thanks @ElKamina – Alma Do Sep 23 '13 at 21:11
  • I thought my answer was fine, with a good balance between clean PHP code and speed. Did you simply overlook it? – Walter Tross Oct 26 '13 at 20:22

3 Answers


I'm sorry for not providing a PHP solution; I haven't programmed in PHP for quite a long time. But let me show you a quick Scala solution. Maybe it will inspire you:

val array = Vector("foo", "bar", "baz", "bee", "feo")
for (i <- 0 until array.size; 
     j <- i + 1 until array.size; 
     k <- j + 1 until array.size)      
    yield (array(i), array(j), array(k))

Result:

Vector((foo,bar,baz), (foo,bar,bee), (foo,bar,feo), (foo,baz,bee), (foo,baz,feo), (foo,bee,feo), (bar,baz,bee), (bar,baz,feo), (bar,bee,feo), (baz,bee,feo))

Universal code for generating k-combinations:

def combinations(array: Vector[String], k: Int, start: Int = 0): Iterable[List[String]] = { 
  if (k == 1 || start == array.length) 
    for (i <- start until array.length) yield List(array(i))
  else 
    for (i <- start until array.length; c <- combinations(array, k - 1, i + 1)) yield array(i) :: c 
}

Results:

scala> combinations(Vector("a", "b", "c", "d", "e"), 1)
res8: Iterable[List[String]] = Vector(List(a), List(b), List(c), List(d), List(e))

scala> combinations(Vector("a", "b", "c", "d", "e"), 2)
res9: Iterable[List[String]] = Vector(List(a, b), List(a, c), List(a, d), List(a, e), List(b, c), List(b, d), List(b, e), List(c, d), List(c, e), List(d, e))

scala> combinations(Vector("a", "b", "c", "d", "e"), 3)
res10: Iterable[List[String]] = Vector(List(a, b, c), List(a, b, d), List(a, b, e), List(a, c, d), List(a, c, e), List(a, d, e), List(b, c, d), List(b, c, e), List(b, d, e), List(c, d, e))

scala> combinations(Vector("a", "b", "c", "d", "e"), 4)
res11: Iterable[List[String]] = Vector(List(a, b, c, d), List(a, b, c, e), List(a, b, d, e), List(a, c, d, e), List(b, c, d, e))

scala> combinations(Vector("a", "b", "c", "d", "e"), 5)
res12: Iterable[List[String]] = Vector(List(a, b, c, d, e))

Of course, real Scala code should be much more generic with regard to the accepted element type and collection type, but I just wanted to show the basic idea, not the most beautiful Scala code possible.

Piotr Kołaczkowski
  • And what if I want to get combinations of 4 elements? Or 5? I cannot do that without modifying your code. The general problem is to get `K` from `N` – Alma Do Sep 23 '13 at 16:36
  • OK, I misunderstood and thought you only needed pairs and triples. It is not much harder to get any k-combinations, but you need to use recursion then. – Piotr Kołaczkowski Sep 23 '13 at 17:53
  • Could you suggest your variant? (with recursion) – Alma Do Sep 23 '13 at 17:59
  • Thanks, @Piotr - I see it's a good suggestion. Now, I've found my answer in the link above my question – Alma Do Sep 24 '13 at 05:39

Here is a recursive solution:

function subcombi($arr, $arr_size, $count)
{
   $combi_arr = array();
   if ($count > 1) {
      for ($i = $count - 1; $i < $arr_size; $i++) {
         $highest_index_elem_arr = array($i => $arr[$i]);
         foreach (subcombi($arr, $i, $count - 1) as $subcombi_arr) {
            $combi_arr[] = $subcombi_arr + $highest_index_elem_arr;
         }
      }
   } else {
      for ($i = $count - 1; $i < $arr_size; $i++) {
         $combi_arr[] = array($i => $arr[$i]);
      }
   }
   return $combi_arr;
}

function combinations($arr, $count)
{
   if ( !(0 <= $count && $count <= count($arr))) {
      return false;
   }
   return $count ? subcombi($arr, count($arr), $count) : array();
}    

$input_arr = array('foo', 'bar', 'baz', 'bee', 'feo');
$combi_arr = combinations($input_arr, 3);
var_export($combi_arr); echo ";\n";

OUTPUT:

array (
  0 => 
  array (
    0 => 'foo',
    1 => 'bar',
    2 => 'baz',
  ),
  1 => 
  array (
    0 => 'foo',
    1 => 'bar',
    3 => 'bee',
  ),
  2 => 
  array (
    0 => 'foo',
    2 => 'baz',
    3 => 'bee',
  ),
  3 => 
  array (
    1 => 'bar',
    2 => 'baz',
    3 => 'bee',
  ),
  4 => 
  array (
    0 => 'foo',
    1 => 'bar',
    4 => 'feo',
  ),
  5 => 
  array (
    0 => 'foo',
    2 => 'baz',
    4 => 'feo',
  ),
  6 => 
  array (
    1 => 'bar',
    2 => 'baz',
    4 => 'feo',
  ),
  7 => 
  array (
    0 => 'foo',
    3 => 'bee',
    4 => 'feo',
  ),
  8 => 
  array (
    1 => 'bar',
    3 => 'bee',
    4 => 'feo',
  ),
  9 => 
  array (
    2 => 'baz',
    3 => 'bee',
    4 => 'feo',
  ),
);

The recursion is based on the fact that to get all combinations of k ($count) elements out of n ($arr_size) you must, for all possible choices of the highest zero-based index i, find all "subcombinations" of k-1 elements out of the remaining i elements with index lower than i.
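
To make that recursion easier to follow, here is a rough Python sketch of the same idea (choose the highest index i, then recurse on the i elements below it). It is not a port of the PHP above: the base case is `count == 0` returning one empty combination, which is my own simplification, and out-of-range counts return `None` rather than `false`.

```python
def subcombi(arr, size, count):
    """All combinations of `count` elements taken from arr[:size]."""
    if count == 0:
        return [[]]                     # exactly one way to choose nothing
    result = []
    # i is the highest zero-based index used by the combination
    for i in range(count - 1, size):
        # the other count-1 elements come from the i elements below index i
        for sub in subcombi(arr, i, count - 1):
            result.append(sub + [arr[i]])
    return result


def combinations(arr, count):
    if not 0 <= count <= len(arr):
        return None
    return subcombi(arr, len(arr), count)


for combo in combinations(['foo', 'bar', 'baz', 'bee', 'feo'], 3):
    print(combo)
```

The combinations come out in the same order as in the OUTPUT listing above, starting with foo, bar, baz.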

The array is not array_sliced when it's passed to the recursive calls in order to take advantage of PHP's "lazy copy" mechanism. This way no real copying takes place, since the array is not modified.

Conserving array indices is nice for debugging purposes, but it's not necessary. Surprisingly, simply removing the $i => parts and replacing the array + with an array_merge causes a considerable slowdown. To attain a slightly better speed than the original version, you have to do this:

function subcombi($arr, $arr_size, $count)
{
   $combi_arr = array();
   if ($count > 1) {
      for ($i = $count - 1; $i < $arr_size; $i++) {
         $highest_index_elem = $arr[$i];
         foreach (subcombi($arr, $i, $count - 1) as $subcombi_arr) {
            $subcombi_arr[] = $highest_index_elem;
            $combi_arr[] = $subcombi_arr;
         }
      }
   } else {
      for ($i = $count - 1; $i < $arr_size; $i++) {
         $combi_arr[] = array($arr[$i]);
      }
   }
   return $combi_arr;
}


Regarding the first part of your question, you should avoid calculating the same quantity more than once, and you should minimize function calls. E.g., like this:

function nextAssoc($sAssoc)
{
   if (false !== ($iPos = strrpos($sAssoc, '01')))
   {
      $sAssoc[$iPos]   = '1';
      $sAssoc[$iPos+1] = '0';
      $tailPos = $iPos+2;
      $n0 = substr_count($sAssoc, '0', $tailPos);
      $n1 = strlen($sAssoc) - $tailPos - $n0;
      return substr($sAssoc, 0, $tailPos).str_repeat('0', $n0)
                                         .str_repeat('1', $n1);
   }
   return false;
}

It's hard to make deeper changes to your code without turning it inside out. It's not too bad though: in my tests it runs at roughly half the speed of my recursive solution (i.e., it takes about twice as long).
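
If the bit set is kept as an integer instead of a string, one well-known alternative for producing the next bit set is the bit trick usually called Gosper's hack. This is a different technique from both versions above, and I have not timed it against them; the sketch is in Python for illustration, and the function names are made up for this example.

```python
def next_same_popcount(x):
    """Gosper's hack: the next larger integer with the same number of 1 bits (x > 0)."""
    c = x & -x                          # lowest set bit
    r = x + c                           # carry through the lowest run of ones
    return (((r ^ x) >> 2) // c) | r    # re-pack the leftover ones at the bottom


def bit_combinations(data, count):
    n = len(data)
    if not 0 < count <= n:
        return []
    x = (1 << count) - 1                # count ones in the low bits, e.g. 00111
    limit = 1 << n
    result = []
    while x < limit:
        # bit n-1 corresponds to data[0], matching the string layout in the question
        result.append([data[i] for i in range(n) if (x >> (n - 1 - i)) & 1])
        x = next_same_popcount(x)
    return result


print(bin(next_same_popcount(0b01110)))  # 0b10011
print(len(bit_combinations(['foo', 'bar', 'baz', 'bee', 'feo'], 3)))  # 10
```

Read as binary numbers, the strings in the question are already in increasing order, so this visits the selections in the same sequence.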

Walter Tross
  • Hi Walter, I've now taken a look. Since my original solution is a 'minimal constructive' solution (i.e. I'm building the exact binary projection and nothing excessive), it could be improved only at the language/expression level (as I see in your solution). So both solutions have the same big-O estimate, but yours could have a better leading constant. Thank you. Also, as I recall, PHP passes only objects by reference by default, so maybe it would be good to accept arrays as references in your functions to prevent copying to the local function stack. – Alma Do Oct 27 '13 at 09:28
  • @AlmaDoMundo: The big-O can't be better than that, I fear. Regarding arrays, as I have explained in my answer, they are copied, but the copy is a "lazy copy" (see http://en.wikipedia.org/wiki/Object_copy#Lazy_copy), so that there is no need to pass them by reference as long as they are not written to, and it's actually better to pass them by value. – Walter Tross Oct 27 '13 at 10:32
  • I know: since we both have minimal constructive solutions, the algorithm itself cannot be improved. I'm aware of 'lazy copy' (though I think it's more correct to call it 'copy on write', as it is in PHP). I was not sure how passing to the local stack is handled; if that's also not copied, then you're right and there's no need to pass by reference. – Alma Do Oct 27 '13 at 10:53
  • @AlmaDoMundo: yes, the copy on the stack, as well as the copy to the return value, behaves just like any assignment copy. Anyway, I measured times with and without `&`, and I found no measurable difference. – Walter Tross Oct 27 '13 at 11:51
  • Yes - so do I (i.e. found no difference with testing) – Alma Do Oct 27 '13 at 12:06

I've just tried to solve this problem in Go, with minimal time complexity and without recursion.

I've seen a few solutions, but they use a recursive function. Avoiding recursion prevents stack-size-exceeded errors.

package main

import "fmt"

func main() {
    // Arguments
    arr := []string{"foo", "bar", "baz", "bee", "feo", "boo", "bak"}
    combinations := make([][]string, 0)
    k := 4
    n := len(arr)

    // Execution starts from here
    if k > n {
        panic("invalid requirement")
    }

    pos := make([]int, k) // indices into arr for the current combination, highest first

    // initialize pos with the first combination: k-1, k-2, ..., 0
    i := 0
    c := k
    for c > 0 {
        c--
        pos[i] = c
        i++
    }
    combinations = append(combinations, getCombination(arr, pos, k))

    // Let's begin the work
    x := 0
    ctr := 1 // counts generated combinations (the initial one plus one per loop iteration)
    for pos[x] < n-(x+1) {
        ctr++
        pos[x]++

        combinations = append(combinations, getCombination(arr, pos, k))

        if pos[x] == n-(x+1) && x+1 < k {
            x++
            i := x
            s := pos[x] + 1
            for i > 0 {
                i--
                s++
                pos[i] = s
            }

            // continue to next index
            continue
        }

        x = 0

    }

    fmt.Println("total # iterations: --> ", ctr)

    fmt.Println(combinations, "\ntotal # combinations: ", len(combinations))

}

func getCombination(arr []string, pos []int, k int) []string {
    combination := make([]string, k)
    for i, j := k-1, 0; i >= 0; i, j = i-1, j+1 {
        combination[j] = arr[pos[i]]
    }
    return combination
}

The working example is here https://play.golang.org/p/D6I5aq8685-
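
For comparison, the same non-recursive idea (keep an array of positions and advance it in place) can be sketched quite compactly. This is a Python illustration, not a line-by-line port of the Go program: it keeps the positions in increasing order and emits combinations in lexicographic order, so the output order differs from the Go version. The name `iter_combinations` is made up for this example.

```python
def iter_combinations(arr, k):
    """Yield every k-element combination of arr without recursion."""
    n = len(arr)
    if not 0 <= k <= n:
        return
    pos = list(range(k))                # current positions, strictly increasing
    while True:
        yield [arr[p] for p in pos]
        # find the rightmost position that can still move to the right
        i = k - 1
        while i >= 0 and pos[i] == n - k + i:
            i -= 1
        if i < 0:
            return                      # every position is at its maximum
        pos[i] += 1
        # pack the positions after it immediately to its right
        for j in range(i + 1, k):
            pos[j] = pos[j - 1] + 1


arr = ['foo', 'bar', 'baz', 'bee', 'feo', 'boo', 'bak']
combos = list(iter_combinations(arr, 4))
print(len(combos))  # 35
```

Because it is a generator, combinations can be consumed one at a time instead of being collected into one large list first.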

Roshan Gade