
I have a set of N distinct integers (N > 3), and the problem is to find all distinct sums of its 3-subsets. A 3-subset is a subset whose cardinality is 3.

I know that the dumb way would be to do a cubic search on all possible sums and then sort out all duplicates. Is there a more efficient way to do this? I am programming in C.

EDIT: I want to know a general, faster algorithm for the case where the number of elements is increased.
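For reference, the cubic baseline described in the question can be sketched roughly as follows. This is only an illustration, not code from the thread: the function name `distinct_3sums_bruteforce` and its interface are made up here, and it assumes `n` is small enough that `n choose 3` sums fit in memory.

```c
#include <stdlib.h>

/* qsort callback: ascending order of long long values. */
static int cmp_ll_brute(const void *p, const void *q)
{
    long long x = *(const long long *)p, y = *(const long long *)q;
    return (x > y) - (x < y);
}

/*
 * Hypothetical helper: stores every distinct sum of three different
 * elements of v[0..n-1] in a newly allocated, sorted array (*out).
 * Returns the number of distinct sums, or 0 if n < 3 or malloc fails.
 * The caller frees *out.
 */
size_t distinct_3sums_bruteforce(const int *v, size_t n, long long **out)
{
    *out = NULL;
    if (n < 3)
        return 0;

    size_t total = n * (n - 1) * (n - 2) / 6;   /* n choose 3 */
    long long *sums = malloc(total * sizeof *sums);
    if (!sums)
        return 0;

    /* Enumerate every index triple a < b < c exactly once. */
    size_t k = 0;
    for (size_t a = 0; a + 2 < n; a++)
        for (size_t b = a + 1; b + 1 < n; b++)
            for (size_t c = b + 1; c < n; c++)
                sums[k++] = (long long)v[a] + v[b] + v[c];

    /* Sorting puts equal sums next to each other. */
    qsort(sums, k, sizeof *sums, cmp_ll_brute);

    /* Compact in place, keeping one copy of each run of equal values. */
    size_t m = 0;
    for (size_t i = 0; i < k; i++)
        if (i == 0 || sums[i] != sums[m - 1])
            sums[m++] = sums[i];

    *out = sums;
    return m;
}
```

Calling it on `{1, 2, 3, 4, 5}` should yield the seven sums 6 through 12. The sort is what adds the logarithmic factor on top of the cubic enumeration.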

Patricia Shanahan
  • If you only have 8 integers, a cubic search will be very fast. Why do you need something better for so few elements? – IVlad Jul 27 '13 at 14:13
  • @IVlad. I wanted to know a general solution, if there was something faster. – Torsten Hĕrculĕ Cärlemän Jul 27 '13 at 14:14
  • Not a dupe, but related: [Sum-subset with a fixed subset size](http://stackoverflow.com/q/8916539/572670). Note that finding all subset sums for a non-fixed subset size is NP-Complete [Correction: NP-Hard]. – amit Jul 27 '13 at 14:28
  • @amit I have no idea what NP-complete is, mon cher. I will look into this. – Torsten Hĕrculĕ Cärlemän Jul 27 '13 at 14:29
  • Close voters: What's wrong with the question? Algorithmic questions are perfectly fine and on-topic. SO is not only for questions like "How to parse a string in C?"; these fit as well, since the algorithm can easily be programmed in any programming language once it is established. – amit Jul 27 '13 at 14:30
  • @TorstenHĕrculĕCärlemän Correction: It is NP-Hard, no idea if also NP-Complete. NP-Hard are problems that we don't know any polynomial solution to, and the general assumption is that such a solution does not exist. – amit Jul 27 '13 at 14:31
  • @amit Sorry to bother you, but there exists a cubic time solution. Please correct me if I'm wrong. Just entering into algorithms. – Torsten Hĕrculĕ Cärlemän Jul 27 '13 at 14:32
  • @TorstenHĕrculĕCärlemän The cubic solution is for fixed size subset size, the general problem - w/o fixed size subset size, is NP-Hard. – amit Jul 27 '13 at 14:34
  • The naive solution is not exactly cubic time, but a bit worse: O(N^3 * log N). We need to sort out the duplicates, which is where O(N^3 * log (N^3)) appears. The question is whether we can do it in O(N^3) (we certainly cannot do better than O(N^3), because O(N^3) is the number of 3-subsets). – nullptr Jul 27 '13 at 14:45
  • @Inspired Are you sure about the lower bound? I think we can make it greedy. – Torsten Hĕrculĕ Cärlemän Jul 27 '13 at 14:47
  • There is `O(nW)` solution to find the *number* of subsets, where `W` is the highest value in the set. Will you be interested in it? – amit Jul 27 '13 at 14:50
  • If you want to find all distinct sums of 3-subsets, then the complexity is at least the number of distinct sums (because we need to find and output each of them), and this is O(N^3) in the worst case. If you want to find only the number of distinct sums, that changes the problem (but I am not yet sure whether this simplifies it). – nullptr Jul 27 '13 at 14:50
  • @amit I am interested in solving any way, mon cher ami! Please do! – Torsten Hĕrculĕ Cärlemän Jul 27 '13 at 14:51
  • The asymptotically more efficient approaches probably won't scale down very well. How large are the integers in question? – David Eisenstat Jul 27 '13 at 14:56
  • @amit if we had a P solution for all subsets of non-fixed size, we would have a P solution for "give us the subset with the largest sum <= capacity" and we would have solved the knapsack problem. Thus this problem must be NP-Complete as well. – Ingo Leonhardt Jul 27 '13 at 14:56
  • @IngoLeonhardt No, it means the problem is *NP-Hard*. To show it is NP-Complete you need to show it is also in NP, which I am not certain it is (didn't give much thought into it, tbh) – amit Jul 27 '13 at 14:57
  • @DavidEisenstat. Powers of four with the exponent going up to 10. – Torsten Hĕrculĕ Cärlemän Jul 27 '13 at 15:01
  • @TorstenHĕrculĕCärlemän Then all of the sums are distinct, and the answer is 8 choose 3. – David Eisenstat Jul 27 '13 at 15:05
  • @DavidEisenstat I think my question was a bit misleading, I was looking at a general solution. Thanks for your time :) – Torsten Hĕrculĕ Cärlemän Jul 27 '13 at 15:06
  • @TorstenHĕrculĕCärlemän Then edit the question to ask in general. – David Eisenstat Jul 27 '13 at 15:06
  • @amit, of course you're right, sorry. Although I've got the feeling that you could use knapsack the other way round as well, I can't figure it out. – Ingo Leonhardt Jul 27 '13 at 15:11
  • In clarifying the question, please state the scaling for the subset cardinality. Solving for a fixed cardinality, such as 3, is a different problem from solving for a cardinality that can increase with the set size. – Patricia Shanahan Jul 27 '13 at 15:13
  • @PatriciaShanahan Could you please edit the question, I am typing this on my phone and it is inconvenient. The clarification you have mentioned is right. – Torsten Hĕrculĕ Cärlemän Jul 27 '13 at 15:16
  • I was asking which you meant (a) fixed size or (b) increasing with set size. If you tell me which, I'll edit the question to make it clear. – Patricia Shanahan Jul 27 '13 at 15:18
  • @Inspired you only need to remove dupes, not sort. A hash table will do that without the log. – n. 'pronouns' m. Jul 28 '13 at 06:11
  • can the 3-subset include two or more of the same number? – גלעד ברקן Jul 30 '13 at 01:49

2 Answers


Using dynamic programming, you can find the number of distinct sums in O(n*MAX), where MAX is the maximal value in the array.

Let's look at the recursive function:

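f(W,n,i) = f(W,n-1,i) OR (i != 0 ? f(W-item(n),n-1,i-1) : false)
f(0,0,0) = true
f(W,n,0) = false (W != 0)
f(W,0,i) = false (W != 0 or i != 0)
f(W,n,i) = false (W < 0)
(Here f(W,n,i) is true when some i of the first n items sum to W. The extra condition i != 0 in the fourth clause covers the case of running out of items while still needing to pick some. The W < 0 clause assumes the items are non-negative.)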

Now, if you build this bottom-up using Dynamic Programming, up to W=3*MAX, your answer is the number of distinct values of W for which f(W,n,3) == true.

Building the table takes O(MAX*3 * n * 3) = O(MAX*n), and the post-processing stage of counting the distinct Ws for which f(W,n,3) == true is O(MAX), so the solution remains O(MAX * n).
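A bottom-up version of this recurrence might look like the sketch below. It is not the answerer's code: the name `count_distinct_3sums_dp` and the `max_value` parameter are invented for illustration, and it assumes every input value lies in the range 0..`max_value`. Only the current layer of the table is kept, so memory stays proportional to MAX.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: counts the distinct sums of 3-subsets of
 * v[0..n-1], assuming 0 <= v[j] <= max_value for every j.
 * Returns -1 if the table cannot be allocated.
 */
long count_distinct_3sums_dp(const int *v, size_t n, int max_value)
{
    size_t width = 3 * (size_t)max_value + 1;   /* sums range over 0..3*MAX */

    /* reach[i * width + w] is 1 when some i of the items seen so far sum to w. */
    unsigned char *reach = calloc(4 * width, 1);
    if (!reach)
        return -1;
    reach[0] = 1;                               /* f(0,0,0) = true */

    for (size_t j = 0; j < n; j++) {
        size_t x = (size_t)v[j];
        /* Go from i = 3 down to 1 so that item j is used at most once. */
        for (int i = 3; i >= 1; i--) {
            const unsigned char *from = reach + (size_t)(i - 1) * width;
            unsigned char *to = reach + (size_t)i * width;
            for (size_t w = x; w < width; w++)
                if (from[w - x])
                    to[w] = 1;
        }
    }

    /* Count the W for which f(W,n,3) holds. */
    long count = 0;
    for (size_t w = 0; w < width; w++)
        count += reach[3 * width + w];

    free(reach);
    return count;
}
```

For `{1, 2, 3, 4, 5}` with `max_value` 5 this should return 7. Reading out the set positions of the last layer instead of counting them gives the sums themselves.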

amit
  • A nice dynamic programming solution. This would also require O(MAX) of memory. This algorithm finds not only the number, but all the distinct sums as well. It's still unclear though whether O(n*MAX) is more efficient than O(n^3 * log n); it depends on `n` and MAX of course. – nullptr Jul 27 '13 at 15:02
  • Downvoter: Why the downvote? The OP said (in comments) that he is interested in a solution to find the *number* of distinct sums: `@amit I am interested in solving any way, mon cher ami! Please do! ` – amit Jul 28 '13 at 00:00
  • One has to be very careful when to use this approach because of sensitivity to MAX. It would be interesting if you could generalize to only work on the part of the input which is dense/small numbers, and then finish with a traditional approach to handle any large integers in the input. If the fraction of large integers is low, this should be a good compromise, because intuitively if the large integers are large enough and random then they'll each make a set of 3-subset sums with the 2-subset sums such that the 3-subset sum sets are nearly disjoint for different choices of large integer. – user2566092 Jul 28 '13 at 14:46

If you suspect that you may have lots of duplicate sums, then you can compute all distinct 2-subset sums first, and for each distinct 2-subset sum you find, keep track of which pair you have found that gave you the sum. If all your numbers are distinct, then if you ever find another pair that gives you the same sum, you should mark the sum as "multiple" and you can delete the pair you were storing for it if you like.

Now you have a set of 2-subset sums, and each sum either has a single pair stored with it, or it is marked "multiple". For each 2-subset sum, if it's marked "multiple" then you iterate through all numbers in your original set and record all the 3-subset sums you can form by adding each number to your 2-subset sum. Otherwise, if the 2-subset sum is not marked "multiple" and you have a pair (a,b) associated with it, then you do the same thing except you skip a and b when you are iterating through your original set of numbers. This is how you get all distinct 3-subset sums.

If you have n numbers and they make N distinct 2-subset sums, then the complexity of this approach is O(nN) if you use hash tables to detect duplicates at the two stages of the algorithm, which may be much better than the brute force O(n^3 log n), especially if you have a fairly dense set of integers.
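The two stages above can be sketched in C roughly as follows. Note one deliberate substitution: standard C has no hash table, so this sketch detects duplicates by sorting at both stages, which costs an extra logarithmic factor compared with the O(nN) bound quoted above. The names `pair_sum` and `distinct_3sums_via_pairs` are made up for illustration, and the input is assumed to contain distinct integers.

```c
#include <stdlib.h>

/* One 2-subset sum together with the index pair that produced it. */
struct pair_sum { long long sum; size_t a, b; };

static int cmp_pair_sum(const void *p, const void *q)
{
    long long x = ((const struct pair_sum *)p)->sum;
    long long y = ((const struct pair_sum *)q)->sum;
    return (x > y) - (x < y);
}

static int cmp_ll_pairs(const void *p, const void *q)
{
    long long x = *(const long long *)p, y = *(const long long *)q;
    return (x > y) - (x < y);
}

/*
 * Hypothetical helper: returns the number of distinct 3-subset sums of the
 * distinct integers v[0..n-1] and stores them, sorted, in *out (caller frees).
 * Returns 0 if n < 3 or an allocation fails.
 */
size_t distinct_3sums_via_pairs(const int *v, size_t n, long long **out)
{
    *out = NULL;
    if (n < 3)
        return 0;

    /* Stage 1: all pair sums, sorted so equal sums are adjacent. */
    size_t npairs = n * (n - 1) / 2, k = 0;
    struct pair_sum *ps = malloc(npairs * sizeof *ps);
    if (!ps)
        return 0;
    for (size_t a = 0; a + 1 < n; a++)
        for (size_t b = a + 1; b < n; b++) {
            ps[k].sum = (long long)v[a] + v[b];
            ps[k].a = a;
            ps[k].b = b;
            k++;
        }
    qsort(ps, npairs, sizeof *ps, cmp_pair_sum);

    size_t ndistinct = 0;                       /* this is N in the text */
    for (size_t i = 0; i < npairs; i++)
        if (i == 0 || ps[i].sum != ps[i - 1].sum)
            ndistinct++;

    /* Stage 2: each distinct pair sum yields at most n candidate 3-sums. */
    long long *cand = malloc(ndistinct * n * sizeof *cand);
    if (!cand) {
        free(ps);
        return 0;
    }
    size_t c = 0;
    for (size_t i = 0; i < npairs; ) {
        size_t j = i + 1;
        while (j < npairs && ps[j].sum == ps[i].sum)
            j++;
        int multiple = (j - i > 1);             /* more than one pair gives this sum */
        for (size_t x = 0; x < n; x++) {
            /* A sum with a single pair (a,b) must not reuse a or b. */
            if (!multiple && (x == ps[i].a || x == ps[i].b))
                continue;
            cand[c++] = ps[i].sum + v[x];
        }
        i = j;
    }
    free(ps);

    /* Remove duplicate 3-sums by sorting and compacting in place. */
    qsort(cand, c, sizeof *cand, cmp_ll_pairs);
    size_t m = 0;
    for (size_t i = 0; i < c; i++)
        if (i == 0 || cand[i] != cand[m - 1])
            cand[m++] = cand[i];

    *out = cand;
    return m;
}
```

The "multiple" case is safe only because the inputs are distinct: two different pairs with the same sum cannot share an element, so for any number in the set at least one of the pairs avoids it. On `{1, 2, 3, 4}` the sketch should produce 6, 7, 8 and 9, each once, since the final deduplication pass collapses sums reached through different pair sums.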

user2566092
  • Note this kind of solution can be even more beneficial if you are looking for 4-subset sums; in that case you can get a complexity of basically O(N^2) if you have N distinct 2-subset sums, compared to naive O(n^4 log n) if you have n numbers in your original set. – user2566092 Jul 27 '13 at 23:41
  • Thanks for the answer, even I am contemplating this. However, I still believe there is a quicker algorithm. – Torsten Hĕrculĕ Cärlemän Jul 28 '13 at 07:16
  • I agree, there should be some kind of compromise where you basically do a dynamic programming approach as the main step, but only for the "dense small-ish integers" part of your input, so you don't get crazy complexity blow-ups if you have a small handful of large integer outliers. However making this idea into a well-defined algorithm with provable good complexity for a well-defined type of input would be messy I think. – user2566092 Jul 28 '13 at 14:37
  • I think there is one point in your algorithm that I'm missing: Assume having the numbers 1,2,3,4. Then you would get the 2-subset sums 3 associated with (1,2) and 5 marked "multiple". How do you now prevent 6 from being calculated as 3+3 as well as 5+1? – Ingo Leonhardt Jul 29 '13 at 10:10