I am looking for the algorithm with the lowest time complexity for a variant of the perfect sum problem. The original problem asks for all subsets, of any size, of an array [*] of n integers that sum to a specific number x. In my variant the subsets must have a fixed size k, and the result must contain no duplicates, neither direct ones nor indirect ones (a combination made of exactly the same elements as another one, in a different order).

I'm aware this problem is NP-hard, so I am not expecting a perfect general solution, only something that runs in a reasonable time in my case, with n close to 1000 and k around 10.

Things I have tried so far:

  • Finding a combination, then doing successive modifications on it and its modifications

    Let's assume I have an array such as:

s = [1,2,3,3,4,5,6,9]

So I have n = 8, and I'd like x = 10 for k = 3

Using some obscure method (brute force?), I found a subset [3,3,4].

From this subset I find other possible combinations by taking two elements out of it and replacing them with other elements that have the same sum. For example, (3, 3) can be replaced by (1, 5), since both pairs have the same sum and the replacing numbers are not already in use. This gives another subset [1,5,4]. Then I repeat the process for all the obtained subsets... indefinitely?

The main issue, as suggested here, is that it's hard to determine when the process is done, and the method is rather chaotic. I imagined some variants of this method, but they really are works in progress.

  • Iterating through the set to list all k long combinations that sum to x

Pretty self-explanatory. This is a naive method that does not work well in my case, since I have a pretty large n and a k that is not small enough to avoid a catastrophically big number of combinations (on the order of 10^27).

I experimented with several mechanisms for restricting the search area instead of blindly iterating through all possibilities, but this is rather complicated and still a work in progress.
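For reference, the naive enumeration described above can be sketched in a few lines of Python. This is only an illustration of why it does not scale, not a proposed solution; the function name `naive_fixed_size_sums` is made up for this sketch, and `math.comb` is used to show how many candidates the loop has to visit.

```python
from itertools import combinations
from math import comb

def naive_fixed_size_sums(values, target, k):
    """Brute force: test every k-element combination of positions."""
    found = set()
    # Sorting first means every combination comes out in non-decreasing order,
    # so combinations that differ only by which equal element they use collapse
    # to the same tuple in the set.
    for combo in combinations(sorted(values), k):
        if sum(combo) == target:
            found.add(combo)
    return sorted(found)

s = [1, 2, 3, 3, 4, 5, 6, 9]
print(naive_fixed_size_sums(s, 10, 3))
print(comb(len(s), 3))   # candidates examined for n = 8, k = 3
print(comb(1000, 10))    # candidates that would be examined for n = 1000, k = 10
```

The work is proportional to the number of k-element combinations regardless of how many of them actually reach the target sum, which is why the approach breaks down for large n.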

What would you suggest? (Snippets can be in any language, but I prefer C++)

[*] To remove any doubt about whether the base collection can contain duplicates: I used the term "array" instead of "set" to be more precise. In my case the collection can contain duplicate integers, and quite a lot of them: for example, about 70 different integers among 1000 elements (counts rounded).

CodeTalker
  • sort your set; pick numbers in it while maintaining the current subset size and the target sum. update this sum on each pick by subtracting the picked element. when the current sum target is smaller than the next available element in the set, it's a failed branch. for the k=10 picks, this means creating k nested loops. do it with recursion, reacting to the success in the innermost invocation. – Will Ness Aug 04 '20 at 21:11
  • @WillNess Thank you for your answer, however I'm having trouble understanding some points. What "maintaining the current subset size and the target sum" means in this context? I'm also wondering why you posted it here in the comments – CodeTalker Aug 04 '20 at 21:37
  • *I am looking for a least time-complex C++ algorithm* -- Algorithms do not care what programming language they're written in. – PaulMcKenzie Aug 05 '20 at 03:56
  • Is `x` restricted by reasonable value? – MBo Aug 05 '20 at 04:50
  • @PaulMcKenzie I meant preferably if a snippet is provided – CodeTalker Aug 05 '20 at 07:40
  • @MBo `x` is a sum you can always obtain in my case, if it's your question – CodeTalker Aug 05 '20 at 07:43
  • @CodeTalker Problems with sum=100, 1000000 or 10^15 might be solved with different approaches. – MBo Aug 05 '20 at 07:46
  • @MBo Didn't imagine it could be, sorry. In my case, the sum can range from 0 (easily obtainable in this case, no specific algorithm needed) to 3000. I will update my question to be clearer – CodeTalker Aug 05 '20 at 08:01
  • [here](https://stackoverflow.com/a/49907365/849891)'s a related answer of mine. it's in Common Lisp but there's lots of verbiage and a pseudocode. implement the nested loops with recursion; add additional argument to the function, the current selection's sum, and use it to avoid computation that are sure to not lead to a solution. jump around on that answer's linked questions, find some more relevant answers by me, though none is in C/C++. cf. https://stackoverflow.com/a/34562122/849891 https://stackoverflow.com/q/62764261/f#comment111052017_62765370 https://stackoverflow.com/a/15179576/849891 – Will Ness Aug 05 '20 at 09:22
  • and [this](https://stackoverflow.com/questions/50086393/representing-an-amount-of-money-with-specific-bills/50087510#50087510) (if not already included in the above comment), although it's a pretty hairy Scheme code for a non-Schemer programmer to read... but still the discussion and links in it might be helpful. – Will Ness Aug 05 '20 at 09:26
  • [this](https://codereview.stackexchange.com/questions/225018/general-algorithm-to-calculate-sums-of-all-subsets-of-a-given-sequence-of-number) is also somewhat related. it's in C++. – Will Ness Aug 05 '20 at 09:33
  • What do you want to do after you find the combinations?? Store them?? I don't think that is realistic?? Find the largest one?? Find the smallest one?? Why do you need them?? Solve these problems first..... Most efficient way through a graph??? – Yunfei Chen Aug 05 '20 at 18:43
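The recursive approach Will Ness outlines in the comments above (sort, pick elements while tracking the remaining size and the remaining sum, abandon a branch as soon as the next available element is too large) might look roughly like the following Python sketch. Two things here are assumptions that do not come from the comments: equal values are grouped with `collections.Counter` so that each distinct combination is produced once, and the names `fixed_size_sums`, `search` and `chosen` are purely illustrative.

```python
from collections import Counter

def fixed_size_sums(values, target, k):
    """Yield each distinct k-element combination of `values` summing to `target`."""
    counts = sorted(Counter(values).items())  # [(value, multiplicity), ...], ascending
    chosen = []                               # the current partial selection

    def search(start, remaining, needed):
        if needed == 0:
            if remaining == 0:
                yield list(chosen)
            return
        for idx in range(start, len(counts)):
            value, available = counts[idx]
            # All later values are at least this large, so `needed` more picks
            # sum to at least value * needed: a failed branch if that overshoots.
            if value * needed > remaining:
                break
            limit = min(available, needed)
            # Use this value 1, 2, ... times, then continue with larger values.
            for used in range(1, limit + 1):
                chosen.append(value)
                yield from search(idx + 1, remaining - value * used, needed - used)
            del chosen[len(chosen) - limit:]  # undo the picks of this value

    yield from search(0, target, k)

for combo in fixed_size_sums([1, 2, 3, 3, 4, 5, 6, 9], 10, 3):
    print(combo)
```

Because it is a generator, combinations are produced one at a time and nothing is stored. The running time still depends on how many partial selections survive the pruning, so this is a sketch of the idea rather than a guarantee for n close to 1000.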

4 Answers

1

With a reasonable limit on the sum, this problem can be solved with an extension of the dynamic programming approach for the subset sum problem, or for the coin change problem with a predetermined number of coins. Note that we can count all variants in pseudopolynomial time, O(x*n*k) once the subset size is tracked (O(x*n) without the size restriction), but the output size might grow exponentially, so generating all variants might be a problem.
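To make the counting claim concrete, here is a minimal Python sketch of such a counting table. It assumes non-negative integers, and it counts subsets of positions, so two equal values at different positions are counted separately. The name `count_fixed_size_subsets` and the table `ways` are illustrative only.

```python
def count_fixed_size_subsets(values, target, k):
    """Count the size-k subsets of positions whose values sum to `target`."""
    # ways[j][s] = number of ways to pick j of the items seen so far with sum s
    ways = [[0] * (target + 1) for _ in range(k + 1)]
    ways[0][0] = 1
    for item in values:
        # Go from larger sizes to smaller ones so that ways[j - 1] has not yet
        # been updated with the current item (each item is used at most once).
        for j in range(k, 0, -1):
            for s in range(target, item - 1, -1):
                ways[j][s] += ways[j - 1][s - item]
    return ways[k][target]

print(count_fixed_size_subsets([1, 2, 3, 3, 4, 5, 6, 7], 10, 3))
```

The three nested loops give roughly n * k * x steps, and the table itself needs (k + 1) * (x + 1) counters, which is small for k around 10 and x up to 3000.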

Make a 3D array, list or vector with outer dimension x+1, for example A[][][]. Every element A[p] of this list contains the list of possible subsets with sum p.

Walk through all elements of the initial "set" (call the current element item). I noticed repeated elements in your example, so it is not a true set.

For each item, scan the A[] list from the last entry to the beginning. (This trick avoids using the same item more than once.)

If A[i - item] contains subsets with size < k, we can add all these subsets to A[i], appending item to each of them.

After the full scan, A[x] will contain the subsets of size k and less that sum to x, and we can keep only those of size k.

Example output of my quickly written Delphi program for the following data:

Lst := [1,2,3,3,4,5,6,7];
k := 3;
sum := 10;

  3  3  4
  2  3  5  //distinct 3's
  2  3  5
  1  4  5
  1  3  6   
  1  3  6   //distinct 3's
  1  2  7

To exclude variants that differ only in which copy of a repeated element they use (if needed), we can allow a non-first occurrence of an item only in subsets that already contain every earlier occurrence of that item (so 3 3 4 will be valid, while the second 2 3 5 won't be generated).
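One way to see what this rule achieves is to treat equal values as interchangeable and decide, for each distinct value, how many copies to use. The following Python sketch counts the distinct combinations that way. It is a counting illustration under the assumption of non-negative integers, not a translation of the Delphi program, and the name `count_distinct_fixed_size_sums` is made up.

```python
from collections import Counter

def count_distinct_fixed_size_sums(values, target, k):
    """Count distinct k-element combinations (equal values are interchangeable)."""
    # ways[j][s] = number of distinct selections of j elements with sum s
    ways = [[0] * (target + 1) for _ in range(k + 1)]
    ways[0][0] = 1
    for value, copies in Counter(values).items():
        new = [row[:] for row in ways]  # case: use this value zero times
        for used in range(1, min(copies, k) + 1):
            # case: use this value exactly `used` times
            for j in range(used, k + 1):
                for s in range(used * value, target + 1):
                    new[j][s] += ways[j - used][s - used * value]
        ways = new
    return ways[k][target]

print(count_distinct_fixed_size_sums([1, 2, 3, 3, 4, 5, 6, 7], 10, 3))
```

For the data above this counts 3 3 4 once and each of 2 3 5 and 1 3 6 once, instead of once per copy of 3.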

I translated my Delphi code literally into C++ (it looks weird, I think :)

#include <iostream>
#include <vector>
using namespace std;

int main()
{
    vector<int> Lst = { 1, 2, 3, 3, 4, 5, 6, 7 };
    int k = 3;
    int sum = 10;

    // A[p] holds every subset found so far whose elements sum to p.
    // Each subset starts with a fictive 0 so that the empty variant is non-empty.
    vector<vector<vector<int>>> A(sum + 1);
    A[0].push_back({ 0 });

    for (int item : Lst) {
        // scan sums downwards so the same item is not used twice
        for (int i = sum; i >= item; i--) {
            for (size_t j = 0; j < A[i - item].size(); j++) {
                if (A[i - item][j].size() < (size_t)k + 1) {
                    vector<int> t = A[i - item][j];
                    t.push_back(item);
                    A[i].push_back(t);  // add new variant including current item
                }
            }
        }
    }

    // output needed variants
    for (const auto& subset : A[sum]) {
        if (subset.size() == (size_t)k + 1) {
            for (size_t j = 1; j < subset.size(); j++)  // excluding fictive 0
                cout << subset[j] << " ";
            cout << endl;
        }
    }
}
MBo
  • Could you please provide a link of what you refer to in the first paragraph? That would help understand your solution – CodeTalker Aug 10 '20 at 22:06
  • You can find general `subset sum problem` (without size limit) in numerous sources. Here I provide working Delphi code, hope it is understandable for you. – MBo Aug 11 '20 at 04:41
  • The provided snippet helps a lot, though I still didn't fully understand since the solution is quite hard to me to follow. Just a note: it seems to give incorrect results when the array contains several duplicates (tested against `vector Lst = { 1, 2, 2, 3, 3, 4, 5, 6, 6, 7 };`) – CodeTalker Aug 11 '20 at 16:15
1

Here is a complete solution in Python. Translation to C++ is left to the reader.

Like the usual subset sum, generation of the doubly linked summary of the solutions is pseudo-polynomial. It is O(count_values * distinct_sums * depths_of_sums). However actually iterating through them can be exponential. But using generators the way I did avoids using a lot of memory to generate that list, even if it can take a long time to run.

from collections import namedtuple
# This is a doubly linked list.
# (value, tail) will be one group of solutions.  (next_answer) is another.
SumPath = namedtuple('SumPath', 'value tail next_answer')

def fixed_sum_paths (array, target, count):
    # First find counts of values to handle duplications.
    value_repeats = {}
    for value in array:
        if value in value_repeats:
            value_repeats[value] += 1
        else:
            value_repeats[value] = 1

    # paths[depth][x] will be all subsets of size depth that sum to x.
    paths = [{} for i in range(count+1)]

    # First we add the empty set.
    paths[0][0] = SumPath(value=None, tail=None, next_answer=None)

    # Now we start adding values to it.
    for value, repeats in value_repeats.items():
        # Reversed depth avoids seeing paths we will find using this value.
        for depth in reversed(range(len(paths))):
            for result, path in paths[depth].items():
                for i in range(1, repeats+1):
                    if count < i + depth:
                        # Do not fill in too deep.
                        break
                    result += value
                    if result in paths[depth+i]:
                        path = SumPath(
                            value=value,
                            tail=path,
                            next_answer=paths[depth+i][result]
                            )
                    else:
                        path = SumPath(
                            value=value,
                            tail=path,
                            next_answer=None
                            )
                    paths[depth+i][result] = path

                    # Subtle bug fix, a path for value, value
                    # should not lead to value, other_value because
                    # we already inserted that first.
                    path = SumPath(
                        value=value,
                        tail=path.tail,
                        next_answer=None
                        )
    return paths[count][target]

def path_iter(paths):
    if paths.value is None:
        # We are the tail
        yield []
    else:
        while paths is not None:
            value = paths.value
            for answer in path_iter(paths.tail):
                answer.append(value)
                yield answer
            paths = paths.next_answer

def fixed_sums (array, target, count):
    paths = fixed_sum_paths(array, target, count)
    return path_iter(paths)

for path in fixed_sums([1,2,3,3,4,5,6,9], 10, 3):
    print(path)

Incidentally for your example, here are the solutions:

[1, 3, 6]
[1, 4, 5]
[2, 3, 5]
[3, 3, 4]
btilly
  • The approach of generating the solutions on the go instead of storing them is exactly what I'm looking for. Do you think implementing duplicate filtering with this solution would be possible? – CodeTalker Aug 11 '20 at 17:48
  • @CodeTalker This code already does duplicate filtering. Note that `[1, 3, 6]` only occurs once even though there are 2 3s available. – btilly Aug 11 '20 at 18:27
  • That said, I had a bug in the duplicate filtering. Fixing now. – btilly Aug 11 '20 at 18:45
  • Was about to mention it. Thank you! – CodeTalker Aug 11 '20 at 19:05
0

You should first sort the so-called array. Second, you should determine whether the problem is actually solvable, to save time. Take the last k elements and compare their sum with x. If it is smaller, you are done: no solution is possible. If it is exactly equal, you are also done: there are no other permutations. O(n) feels nice, doesn't it?

If it is larger, then you have a lot of work to do. You need to store all the permutations in a separate array. Then replace the smallest of the k numbers with the smallest element in the array. If the sum is still larger than x, do the same for the second, the third and so on, until you get something smaller than x. Once the sum is smaller than x, start increasing the value at the last position you stopped at until you hit x. Once you hit x, that is your combination. Then take the previous element: if you had 1, 1, 5, 6 in your selection, grab the 1 as well and add it to your smallest element, 5, to get 6. Next, check whether this number 6 can be written as a combination of two values, stopping once you hit the value. Then repeat for the others as well. Your problem can be solved in O(n!) time in the worst case.

I would not suggest that you store 10^27 combinations, which means more than 10^27 elements. Bad idea: do you even have that much space? At 3 bits for the header and 8 bits for each integer, you would need 9.8765*10^25 terabytes just to store that colossal array, more memory than a supercomputer has. You should worry about whether your computer can even store this monster rather than whether you can solve the problem. With that many combinations, even a quadratic solution would crash your computer, and quadratic is a long way off from O(n!).
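The feasibility check at the start of this answer can be sketched as follows in Python. The lower bound (the sum of the k smallest elements) is an addition that the answer does not mention, and the name `may_have_solution` is hypothetical. Note that the check is only a necessary condition, and that the sort itself costs O(n log n).

```python
def may_have_solution(values, target, k):
    """Cheap necessary (not sufficient) test to run before any search."""
    if k > len(values):
        return False
    ordered = sorted(values)
    smallest = sum(ordered[:k])                  # sum of the k smallest elements
    largest = sum(ordered[len(ordered) - k:])    # sum of the k largest elements
    # Any k-element selection sums to something between these two bounds.
    return smallest <= target <= largest

print(may_have_solution([1, 2, 3, 3, 4, 5, 6, 9], 10, 3))
print(may_have_solution([1, 2, 3, 3, 4, 5, 6, 9], 21, 3))
```

A `True` result only means the target lies within reach; whether some combination actually hits it still has to be determined by a search or a dynamic programming table.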

Yunfei Chen
  • "I would not suggest that you 10^27 combinations, meaning you have more than 10^27 elements". If your answer is based on storing the k elements combinations you can get from a n elements array, then I fear it will be problematic – CodeTalker Aug 04 '20 at 22:41
  • "You need to store all the permutations in an seperate array.... Then you go ahead and replace the smallest of the k numbers with the smallest element in the array." Since we're talking about permutations, this array's elements are just the same elements with a different sorting? So how replacing the smallest number of each element will help? It will remain the same thing with a different sorting, again – CodeTalker Aug 04 '20 at 22:45
  • "So how replacing the smallest number of each element will help? It will remain the same thing with a different sorting, again –" That is not what I said.... I said replace the smallest of the k elements, (The elements you need) with the smallest one, so if you need three, take the three largest and see if they exceed it if they do replace smallest one with the smallest element in the array, again you can keep doing this.... also you only have these k elements, so think recursive, .... – Yunfei Chen Aug 05 '20 at 18:35
  • Secondly the 10^27 elements, meh huh, you wanna print them all out?? Cause that is what the question suggests?? You have any idea how long that's gonna take?? If you started printing 10^27 without doing any calculations it will take until next year.... If you tried to find 10^27 combinations, well come back when you are 10 years older.... Perhaps you have a hard time comprehending how large 10^27 is?? If you had that many drops of water you could fill all the oceans and icebergs in the solar system, including Mars, the moon etc.... – Yunfei Chen Aug 05 '20 at 18:39
  • I never mentioned that I want to print all the combinations of size k of the array, and can't see what in my question suggests that – CodeTalker Aug 09 '20 at 20:49
  • @CodeTalker what do you want to do with the permutations then?? If the answer is nothing you are better off not doing it, most efficient way to solve problems is get rid of things you do not need.... – Yunfei Chen Aug 10 '20 at 21:56
  • I think you did not fully understand my question. It is about obtaining all the combinations of length `k` that can be obtained from an array of size `n` **and sum to a number `x`**. And FYI, permutations != combinations – CodeTalker Aug 10 '20 at 22:01
  • @CodeTalker You need to be specific, why do you want the permutations?? – Yunfei Chen Aug 10 '20 at 22:02
0

A brute force method using recursion might look like this...

For example, given variables set, x, k, the following pseudo code might work:

setSumStructure find(int[] set, int x, int k, int setIdx)
{
   int sz = set.length - setIdx;   // number of elements still available
   if (k == 0) return (x == 0) ? the empty set : null;
   if (sz < k) return null;        // not enough elements left
   if (sz == k) check sum of set[setIdx] -> set[set.length - 1] == x.  if it does, return those elements together with the sum, else return null;
   
   for (int i = setIdx; i < set.length - (k - 1); i++)
      filter(set[i] prepended to each result of find (set, x - set[i], k - 1, i + 1));

   return filteredSets;
}
John
  • Isn't bruteforcing not recommended with a large `n` (array size) value? – CodeTalker Aug 10 '20 at 22:16
  • Yes, brute forcing is not ideal - this is a simple solution which can handle n = 1000. filter removes the nulls from the output and creates a list on the heap. It must be heap and not stack since recursion can be to 1000 depth, so saving stack space is critical. – John Aug 11 '20 at 06:01