Sum-subset with a fixed subset size

Question

Given a set of integers, is there a non-empty subset whose sum is zero?

This problem is NP-complete in general. I'm curious if the complexity of this slight variant is known:

Given a set of integers, is there a subset of size k whose sum is zero?

For example, if k = 1, you can do a binary search to find the answer in O(log n). If k = 2, then you can get it down to O(n log n) (e.g. see Find a pair of elements from an array whose sum equals a given number). If k = 3, then you can do O(n^2) (e.g. see Finding three elements in an array whose sum is closest to a given number).
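The k = 2 and k = 3 bounds mentioned above can be sketched with the usual sort-then-two-pointers idea. This is only an illustration: the function names `two_sum_zero` and `three_sum_zero` are made up here, and the linked questions may use different formulations.

```python
def two_sum_zero(values):
    """Return a pair summing to zero, or None. O(n log n) because of the sort."""
    a = sorted(values)
    lo, hi = 0, len(a) - 1
    while lo < hi:
        s = a[lo] + a[hi]
        if s == 0:
            return a[lo], a[hi]
        if s < 0:
            lo += 1   # sum too small: move to a larger small element
        else:
            hi -= 1   # sum too large: move to a smaller large element
    return None


def three_sum_zero(values):
    """Return a triple summing to zero, or None. O(n^2): one scan per fixed element."""
    a = sorted(values)
    for i in range(len(a) - 2):
        lo, hi = i + 1, len(a) - 1
        while lo < hi:
            s = a[i] + a[lo] + a[hi]
            if s == 0:
                return a[i], a[lo], a[hi]
            if s < 0:
                lo += 1
            else:
                hi -= 1
    return None
```

Both functions use distinct positions of the sorted array, so an element is never paired with itself.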

Is there a known bound that can be placed on this problem as a function of k?

As motivation, I was thinking about this question How do you partition an array into 2 parts such that the two parts have equal average? and trying to determine if it is actually NP-complete. The answer lies in whether or not there is a formula as described above.

Barring a general solution, I'd be very interested in knowing an optimal bound for k=4.

Technically for `k=1` the lower bound would be `O(n)` (you cannot assume sorted input) — awesomo, Jan 18 '12 at 20:45
@awesomo Sure, if you like, but assuming the input is sorted doesn't change the problem much. — PengOne, Jan 18 '12 at 20:47
see also http://stackoverflow.com/questions/3684243/picking-five-numbers-that-sum-to-s/3687124 — sdcvvc, Jan 21 '13 at 07:34

Evgeny Kluev · Answer 1 · 2012-01-19T14:31:21.460

16

For k=4, space complexity O(n), time complexity O(n² * log(n))

Sort the array. Starting from 2 smallest and 2 largest elements, calculate all lesser sums of 2 elements (a[i] + a[j]) in the non-decreasing order and all greater sums of 2 elements (a[k] + a[l]) in the non-increasing order. Increase lesser sum if total sum is less than zero, decrease greater one if total sum is greater than zero, stop when total sum is zero (success) or a[i] + a[j] > a[k] + a[l] (failure).

The trick is to iterate through all the indexes i and j in such a way that (a[i] + a[j]) never decreases, and through all the indexes k and l in such a way that (a[k] + a[l]) never increases. A priority queue helps to do this:

1. Put key=(a[i] + a[j]), value=(i = 0, j = 1) to priority queue.
2. Pop (sum, i, j) from priority queue.
3. Use sum in the above algorithm.
4. Put (a[i+1] + a[j]), i+1, j and (a[i] + a[j+1]), i, j+1 to priority queue only if these elements were not already used. To keep track of used elements, maintain an array of maximal used 'j' for each 'i'. It is enough to use only values for 'j', that are greater, than 'i'.
5. Continue from step 2.
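A minimal Python sketch of this k=4 scheme follows. It is not the exact bookkeeping from the steps above: it uses the simpler queue variant (one heap entry per index i, advancing only j), which keeps each heap at no more than n entries. The names `pair_sums`, `sum_groups` and `four_sum_zero` are hypothetical, and the sketch assumes the input values are distinct, as in a set. Under that assumption, pairs with the same sum never share an element, so grouping equal sums stays within O(n) space.

```python
import heapq
from itertools import groupby


def pair_sums(a, ascending):
    """Yield (a[i] + a[j], i, j) for all i < j of sorted list a, ordered by sum.

    Keeps at most one heap entry per index i, so the heap holds O(n) items.
    """
    n = len(a)
    sign = 1 if ascending else -1
    start = (lambda i: i + 1) if ascending else (lambda i: n - 1)
    heap = [(sign * (a[i] + a[start(i)]), i, start(i)) for i in range(n - 1)]
    heapq.heapify(heap)
    while heap:
        key, i, j = heapq.heappop(heap)
        yield sign * key, i, j
        nj = j + 1 if ascending else j - 1
        if i < nj < n:
            heapq.heappush(heap, (sign * (a[i] + a[nj]), i, nj))


def sum_groups(a, ascending):
    """Yield (sum, [(i, j), ...]) with all index pairs that share that sum."""
    for s, grp in groupby(pair_sums(a, ascending), key=lambda t: t[0]):
        yield s, [(i, j) for _, i, j in grp]


def four_sum_zero(values):
    """Return four distinct elements summing to zero, or None."""
    a = sorted(values)
    lo, hi = sum_groups(a, True), sum_groups(a, False)
    lo_cur, hi_cur = next(lo, None), next(hi, None)
    while lo_cur is not None and hi_cur is not None:
        (ls, lpairs), (hs, hpairs) = lo_cur, hi_cur
        if ls > hs:                      # lesser sum passed greater sum: failure
            return None
        total = ls + hs
        if total == 0:
            for i, j in lpairs:
                for k, l in hpairs:
                    if len({i, j, k, l}) == 4:   # all four indices distinct
                        return a[i], a[j], a[k], a[l]
        if total <= 0:
            lo_cur = next(lo, None)      # increase the lesser sum
        if total >= 0:
            hi_cur = next(hi, None)      # decrease the greater sum
    return None
```

With repeated values the sketch should still return correct answers, but a group of equal sums can then grow beyond O(n) entries.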

For k>4

If space complexity is limited to O(n), I cannot find anything better than using brute force for k-4 values and the above algorithm for the remaining 4 values. Time complexity is O(n^(k-2) * log(n)).

For very large k integer linear programming may give some improvement.

Update

If n is very large (on the same order as the maximum integer value), it is possible to implement an O(1) priority queue, improving the complexities to O(n²) and O(n^(k-2)).

If n >= k * INT_MAX, a different algorithm with O(n) space complexity is possible. Precalculate a bitset for all possible sums of k/2 values, and use it to check the sums of the other k/2 values. Time complexity is O(n^(ceil(k/2))).

edited Jan 19 '12 at 14:31

answered Jan 19 '12 at 13:03

Evgeny Kluev

This answer is based on the ideas by Gina and ElKamina. – Evgeny Kluev Jan 19 '12 at 13:04
Why not use the same trick for `k>4` ? E.g. for `k=6`, increase the lower `a[i]+a[j]+a[k]` and decrease the higher `a[l]+a[m]+a[n]` until meeting? – mitchus Feb 29 '12 at 20:14
@mitchus, this trick is possible for `k>4`, but it requires superlinear space, for example, for `k=6`, the priority queue would contain O(n^2) elements. As you can see in comments for some other posts, OP doesn't want solutions with superlinear space requirement. – Evgeny Kluev Mar 01 '12 at 09:08
I see. Perhaps the OP should add this to the original post then :) – mitchus Mar 01 '12 at 09:46
You mention brute force for k> 4. Could you elaborate what brute force approach you refer to? Thanks – Bober02 Mar 23 '12 at 13:36
@Bober02, by "brute force" I mean systematically check all possible permutations of `k-4` array indexes (and applying mentioned algorithm to the remaining 4 indexes). You can see an example of 1-index brute force and 2-index optimal algorithm [here](http://stackoverflow.com/a/2070509/1009831). – Evgeny Kluev Mar 23 '12 at 13:51
This is an interesting approach. But I don't see why the space complexity is O(n). Since you are potentially increasing the size of the priority queue by 1 in each iteration, couldn't it grow to contain O(n^2) elements ? – krjampani Oct 12 '12 at 18:31
@krjampani: priority queue cannot have more than 2*n elements. If it contains one or more elements for some 'i', and index 'j' for smallest of them is equal to R, then the largest index 'j' for 'i+1' cannot be larger than R. In other words, all elements in the queue are on the border of some 2D area, represented by elements, taken out of the queue. Other modification of this algorithm, that is easier to prove: initially fill the queue with elements, having any i and j=0; after processing element (i,j), put (i,j+1) to the queue - here queue never has more than n elements. – Evgeny Kluev Oct 12 '12 at 19:51
That makes sense. Thanks! +1. – krjampani Oct 12 '12 at 21:14
Do I need to use 2 priority queues to keep sums for `lesser` and `greater`? And should I sum all elements in sorted array or just a `1..n/2` for `lesser` and `n/2..n` for `greater`? – rdo Jan 10 '15 at 07:41
@rdo: You need two separate priority queues. And you should sum all elements, not just first half and second half. But usually (when values are uniformly distributed) "lesser" queue does not need too many elements from "greater" half and "greater" queue does not use too many elements from "lesser" half. – Evgeny Kluev Jan 10 '15 at 09:52
Shouldn't the time complexity be something like O(n!/(k!(n-k)!))? Order isn't important, nor repetition necessarily. – J. Linne Jan 19 '19 at 16:46

score 4 · Answer 2 · answered Jan 19 '12 at 04:44

The problem of determining whether 0 in W + X + Y + Z = {w + x + y + z | w in W, x in X, y in Y, z in Z} is basically the same except for not having annoying degenerate cases (i.e., the problems are inter-reducible with minimal resources).

This problem (and thus the original for k = 4) has an O(n^2 log n)-time, O(n)-space algorithm. The O(n log n)-time algorithm for k = 2 (to determine whether 0 in A + B) accesses A in sorted order and B in reverse sorted order. Thus all we need is an O(n)-space iterator for A = W + X, which can be reused symmetrically for B = Y + Z. Let W = {w1, ..., wn} in sorted order. For all x in X, insert a key-value item (w1 + x, (1, x)) into a priority queue. Repeatedly remove the min element (wi + x, (i, x)) and insert (wi+1 + x, (i+1, x)).
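The iterator described here can be sketched in Python as below. The names `sorted_sums` and `zero_in_sumset` are illustrative, not from the answer, and the sketch assumes all four collections are finite lists of integers. The heap holds one entry per element of the second collection, so it uses linear space.

```python
import heapq


def sorted_sums(W, X, descending=False):
    """Yield every w + x (w in W, x in X) in sorted order, one heap entry per x."""
    sign = -1 if descending else 1
    ws = sorted(W, reverse=descending)
    if not ws:
        return
    heap = [(sign * (ws[0] + x), 0, x) for x in X]
    heapq.heapify(heap)
    while heap:
        key, i, x = heapq.heappop(heap)
        yield sign * key
        if i + 1 < len(ws):
            # replace (w_i + x) by (w_{i+1} + x), the next sum for this x
            heapq.heappush(heap, (sign * (ws[i + 1] + x), i + 1, x))


def zero_in_sumset(W, X, Y, Z):
    """Decide whether 0 is in W + X + Y + Z using two opposing iterators."""
    A = sorted_sums(W, X)                    # ascending
    B = sorted_sums(Y, Z, descending=True)   # descending
    a, b = next(A, None), next(B, None)
    while a is not None and b is not None:
        if a + b == 0:
            return True
        if a + b < 0:
            a = next(A, None)
        else:
            b = next(B, None)
    return False
```

Because the four sets are separate, no check for shared elements is needed, which is the "no degenerate cases" advantage mentioned above.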

ElKamina · Answer 3 · 2012-01-19T00:22:30.723

The solution for k=4 in O(n^2log(n))

Step 1: Calculate the pairwise sum and sort the list. There are n(n-1)/2 sums. So the complexity is O(n^2log(n)). Keep the identities of the individuals which make the sum.

Step 2: For each element in the above list, search for its complement and make sure the two sums don't share "the individuals". There are n^2 searches, each with complexity O(log(n)).
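The two steps above can be sketched as follows, using O(n^2) space as in the original version of this answer. The function name `four_sum_zero_pairs` is made up for illustration. "Not sharing the individuals" is checked by storing the two array indices next to each sum and requiring all four indices to be different. The sketch assumes distinct input values, in which case the scan over equal sums stops after a few entries.

```python
from bisect import bisect_left
from itertools import combinations


def four_sum_zero_pairs(values):
    """Return four distinct elements summing to zero, or None."""
    a = list(values)
    # Step 1: all pairwise sums, each tagged with the indices that produced it.
    pairs = sorted((a[i] + a[j], i, j)
                   for i, j in combinations(range(len(a)), 2))
    sums = [p[0] for p in pairs]
    # Step 2: binary-search the complement of every pair sum.
    for s, i, j in pairs:
        pos = bisect_left(sums, -s)
        # Walk the run of entries equal to -s until one shares no index.
        while pos < len(pairs) and sums[pos] == -s:
            _, k, l = pairs[pos]
            if len({i, j, k, l}) == 4:
                return a[i], a[j], a[k], a[l]
            pos += 1
    return None
```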

EDIT: The space complexity of the original algorithm is O(n^2). The space complexity can be reduced to O(1) by simulating a virtual 2D matrix (O(n), if you consider space to store sorted version of the array).

First about the 2D matrix: sort the numbers and create a matrix X using pairwise sums. Now the matrix is arranged in such a way that all the rows and columns are sorted. To search for a value in this matrix, search the numbers on the diagonal. If the number is in between X[i,i] and X[i+1,i+1], you can basically halve the search space into two matrices X[i:N, 0:i] and X[0:i, i:N]. The resulting search algorithm is O(log^2n) (I AM NOT VERY SURE. CAN SOMEBODY CHECK IT?).

Now, instead of using a real matrix, use a virtual matrix where X[i,j] are calculated as needed instead of pre-computing them.

Resulting time complexity: O( (nlogn)^2 ).

PS: In the following link, it says that searching a 2D sorted matrix has O(n) complexity. If that is true (i.e. O(log^2n) is incorrect), then the final complexity is O(n^3).

Sorry, I should have mentioned that I don't want to use more than `O(n)` space (preferably `O(1)`). — PengOne, Jan 18 '12 at 22:45
In step 2, how can we make sure they don't share the individuals? I mean they don't have an element in common? How can I check that in Java? — Hengameh, Jul 11 '15 at 01:50

score 2 · Answer 4 · edited May 23 '17 at 12:25

Question that is very similar:

Is this variant of the subset sum problem easier to solve?

It's still NP-complete.

If it were not, the subset-sum would also be in P, as it could be represented as F(1) | F(2) | ... F(n) where F is your function. This would have O(O(F(1)) + O(F(2)) + O(F(n))) which would still be polynomial, which is incorrect as we know it's NP-complete.

Note that if you have certain bounds on the inputs you can achieve polynomial time.

Also note that the brute-force runtime can be calculated with binomial coefficients.

answered Jan 18 '12 at 21:40

Pubby

For fixed k, the problem "Is there a k-subset which has a given sum" can be solved in polynomial time for any k. The algorithm is trivial: check all subsets of size k, of which there are O(n^k). Not sure whether I'm misunderstanding you or not. – Patrick87 Jan 18 '12 at 21:50
@Patrick87 Perhaps I'm wrong, but aren't there (N K) subsets to check naively where (N K) is a binomial coefficient? n^k makes no sense to me. – Pubby Jan 18 '12 at 21:59
Yes, there are C(n, k) subsets of size k, and C(n, k) is O(n^k). I mean, the number of k-tuples is P(n, k), which is greater than C(n, k), and the number of ways to choose k from n with repetition is n^k, which is greater than P(n, k). – Patrick87 Jan 18 '12 at 22:06
@Patrick87 Still not sure I follow. Could you write an answer? – Pubby Jan 18 '12 at 22:16
From your logic, it follows that the number of subsets of `{1,2,...,n}` is polynomial in `n`, since the number of subsets of size `k` is polynomial in `n`. The problem is that you are not adding a fixed number of cases... the number of cases depends on `n`. Not only is it theoretically possible to give a polynomial algorithm for fixed `k`, but @Patrick87 has given a polynomial upper bound. – PengOne Jan 18 '12 at 23:03
@Patrick87, the OP asked "Is there a known bound that can be placed on this problem as a function of k?" so O(n^k) is not polynomial. – Neowizard Feb 28 '12 at 18:36
@Neowizard It is polynomial in n, and n^k is a function of k. I would agree that n^k is not polynomial in k, but that's not what I took the original question to mean; I was involved in the question which gave rise to PengOne's asking this question. If you see PengOne's comment to Pubby, you'll see that PengOne agrees with my interpretation; since he's asking the question, I'd say that makes my interpretation the correct one. His question is whether you can do better for fixed k than O(n^k). For small, specific k, the answer is yes. – Patrick87 Feb 28 '12 at 21:05

Patrick87 · Answer 5 · 2012-01-19T16:36:08.967

To build on awesomo's answer... if we can assume that the numbers are sorted, we can do better than O(n^k) for a given k: simply take all O(n^(k-1)) subsets of size (k-1), then do a binary search in what remains for a number that, when added to the first (k-1), gives the target. This is O(n^(k-1) log n), so the complexity is certainly no worse than that.
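A rough sketch of this (k-1)-subsets-plus-binary-search idea, assuming distinct values (duplicates are dropped up front) and using the made-up name `k_subset_with_sum`:

```python
from bisect import bisect_left
from itertools import combinations


def k_subset_with_sum(values, k, target=0):
    """Return k distinct values summing to target, or None."""
    a = sorted(set(values))          # assumption: treat the input as a set
    for idx in combinations(range(len(a)), k - 1):
        need = target - sum(a[i] for i in idx)
        pos = bisect_left(a, need)   # binary search for the missing element
        # The match must exist and must not be one of the chosen k-1 elements.
        if pos < len(a) and a[pos] == need and pos not in idx:
            return [a[i] for i in idx] + [need]
    return None
```

For k = 1 the only (k-1)-subset is the empty one, so the function reduces to a single binary search for the target.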

In fact, if we know that the complexity is O(n^2) for k=3, we can do even better for k > 3: choose all (k-3)-subsets, of which there are O(n^(k-3)), and then solve the problem in O(n^2) on the remaining elements. This is O(n^(k-1)) for k >= 3.

However, maybe you can do even better? I'll think about this one.

EDIT: I was initially going to add a lot proposing a different take on this problem, but I've decided to post an abridged version. I encourage other posters to see whether they believe this idea has any merit. The analysis is tough, but it might just be crazy enough to work.

We can use the fact that we have a fixed k, and that sums of odd and even numbers behave in certain ways, to define a recursive algorithm to solve this problem.

First, modify the problem so that you have both even and odd numbers in the list (this can be accomplished by dividing by two if all are even, or by subtracting 1 from numbers and k from the target sum if all are odd, and repeating as necessary).

Next, use the fact that even target sums can be reached only by using an even number of odd numbers, and odd target sums can be reached using only an odd number of odd numbers. Generate appropriate subsets of the odd numbers, and call the algorithm recursively using the even numbers, the sum minus the sum of the subset of odd numbers being examined, and k minus the size of the subset of odd numbers. When k = 1, do binary search. If ever k > n (not sure this can happen), return false.

If you have very few odd numbers, this could allow you to very quickly pick up terms that must be part of a winning subset, or discard ones that cannot. You can transform problems with lots of even numbers to equivalent problems with lots of odd numbers by using the subtraction trick. The worst case must therefore be when the numbers of even and odd numbers are very similar... and that's where I am right now. A uselessly loose upper bound on this is many orders of magnitude worse than brute-force, but I feel like this is probably at least as good as brute-force. Thoughts are welcome!

EDIT2: An example of the above, for illustration.

{1, 2, 2, 6, 7, 7, 20}, k = 3, sum = 20.
Subset {}:
 {2, 2, 6, 20}, k = 3, sum = 20
 = {1, 1, 3, 10}, k = 3, sum = 10
 Subset {}:
  {10}, k = 3, sum = 10
  Failure
 Subset {1, 1}:
  {10}, k = 1, sum = 8
  Failure
 Subset {1, 3}:
  {10}, k = 1, sum = 6
  Failure
Subset {1, 7}:
 {2, 2, 6, 20}, k = 1, sum = 12
 Failure
Subset {7, 7}:
 {2, 2, 6, 20}, k = 1, sum = 6
 Success

In lieu of a more general answer, this is the best of the lot at the time of the bounty expiration, so the rep goes to... — PengOne, Mar 01 '12 at 06:27

score 0 · Answer 6 · answered Jan 18 '12 at 20:16

The time complexity is trivially O(n^k) (number of k-sized subsets from n elements).

Since k is a given constant, a (possibly quite high-order) polynomial upper bounds the complexity as a function of n.
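For reference, the trivial bound corresponds to a brute-force search like the sketch below. The names `brute_force_k_subset` and `candidate_count` are illustrative only.

```python
from itertools import combinations
from math import comb


def brute_force_k_subset(values, k, target=0):
    """Check every k-subset; there are C(n, k) <= n^k of them."""
    for subset in combinations(values, k):
        if sum(subset) == target:
            return subset
    return None


def candidate_count(n, k):
    """Number of k-subsets examined in the worst case."""
    return comb(n, k)
```

For example, `candidate_count(5, 2)` is 10, which is below the cruder n^k estimate of 25.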

awesomo

True, but all three examples I've given have better bounds than this. I suppose I'm more interested in how the bound grows with `k`, so a tighter bound is better. – PengOne Jan 18 '12 at 20:39
To the anonymous downvoter, please prove me wrong. Note that Big-Oh is an upper bound, I never claimed my answer to be a tight, Big-Omega bound. – awesomo Jan 18 '12 at 22:45
@awesomo Your answer is right, but not useful! It is trivial. – ElKamina Jan 19 '12 at 00:24
