
I want to get all combinations of the elements in a list, for a result like this:

List: {1,2,3}

1
2
3
1,2
1,3
2,3

My problem is that I have 180 elements, and I want all combinations of up to 5 elements. In my tests with up to 4 elements it took a long time (2 minutes), but everything worked. With 5 elements, however, I get an OutOfMemoryException.

Here is my current code:

public IEnumerable<IEnumerable<Rondin>> getPossibilites(List<Rondin> rondins)
{
    var combin5 = rondins.Combinations(5);
    var combin4 = rondins.Combinations(4);
    var combin3 = rondins.Combinations(3);
    var combin2 = rondins.Combinations(2);
    var combin1 = rondins.Combinations(1);

    return combin5.Concat(combin4).Concat(combin3).Concat(combin2).Concat(combin1).ToList();
}
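The comments point out that the final `ToList()` here is what forces every combination to be materialized at once. Below is a minimal lazy sketch of the same method (a generic `T` stands in for `Rondin` to keep it self-contained, and `GetPossibilitesLazy` is a hypothetical name); nothing is computed until the caller iterates:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LazyPossibilites
{
    // Same recursive generator as in the question; fully deferred.
    public static IEnumerable<IEnumerable<T>> Combinations<T>(this IEnumerable<T> elements, int k)
    {
        return k == 0 ? new[] { new T[0] } :
            elements.SelectMany((e, i) =>
                elements.Skip(i + 1).Combinations(k - 1).Select(c => (new[] { e }).Concat(c)));
    }

    // Lazy counterpart of getPossibilites: no ToList(), so combinations are
    // produced one at a time as the caller enumerates them.
    public static IEnumerable<IEnumerable<T>> GetPossibilitesLazy<T>(List<T> items, int maxSize = 5)
    {
        for (int k = maxSize; k >= 1; k--)
            foreach (var combinaison in items.Combinations(k))
                yield return combinaison;
    }
}
```

Note this only avoids the memory spike; enumerating all the combinations still takes the same amount of time.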

It uses this extension method (taken from this question: Algorithm to return all combinations of k elements from n):

public static IEnumerable<IEnumerable<T>> Combinations<T>(this IEnumerable<T> elements, int k)
{
    return k == 0 ? new[] { new T[0] } :
    elements.SelectMany((e, i) =>
    elements.Skip(i + 1).Combinations(k - 1).Select(c => (new[] { e }).Concat(c)));
}

I need to search the list for a combination whose elements add up (within a certain precision) to a target value, and do this for each element of another list. Here is all my code for that part:

var possibilites = getPossibilites(opt.rondins);

possibilites = possibilites.Where(p => p.Sum(r => r.longueur + traitScie) < 144);

foreach(BilleOptimisee b in opt.billesOptimisees)
{
    var proches = possibilites
        .Where(p => p.Sum(r => r.longueur + traitScie) < b.chute
                 && Math.Abs(b.chute - p.Sum(r => r.longueur)) - (p.Count() * 0.22) < 0.01)
        .OrderByDescending(p => p.Sum(r => r.longueur))
        .FirstOrDefault(); // FirstOrDefault instead of ElementAt(0), otherwise the null check below can never be reached

    if(proches != null)
    {
        foreach (Rondin r in proches)
        {
            opt.rondins.Remove(r);
            b.rondins.Add(r);
            possibilites = possibilites.Where(p => !p.Contains(r));
        }
    }
}

With the code I have, how can I limit the memory taken by my list? Or is there a better way to search a very large set of combinations?

Please, if my question is not good, tell me why and I will do my best to learn and ask better questions next time ;)

Eradash
  • The number of combinations is: `180^5 + 180^4 + 180^3 + 180^2 + 180 = 190,012,424,580`. This is a lot of combinations and you will never be able to store them in memory (unless you have TBytes of RAM). You can only iterate over the combinations (by generating them dynamically). But this will probably slow down your algorithm even more (depending on how many times you have to iterate the list). You can do this by removing the `.ToList()` – P. Kouvarakis Jul 18 '16 at 16:09
  • @P.Kouvarakis: The actual number of combinations is 1,532,116,491, which is considerably smaller than the amount you suggest, but still huge. Do you see where you made your mistake? (Hint: is 1, 1, 1, 1, 1 a possible combination? are 1, 2, 3, 4, 5 and 5, 4, 3, 2, 1 different combinations?) – Eric Lippert Jul 18 '16 at 16:33
  • Re: "I need to search in the list for a combination where each element added up is near (with a certain precision) to a value, this for each element in an other list." Your problem has been studied by mathematicians for over 100 years; it is a variation on the "knapsack problem". A web search should turn up some papers on ways you can improve the efficiency of your program, but as P.Kouvarakis notes, simply not ever calling ToList will be a big help. There's no need to have all the combinations in memory at once. – Eric Lippert Jul 18 '16 at 16:36
  • My list is not just integers; it's a list of objects, each containing a double between 5 and 90. So a combination with two objects of value 90 needs to be excluded from the final list. And I cannot have the same object twice in one combination. – Eradash Jul 18 '16 at 16:38
  • I need to take the nearest value, so I must know each value to decide which objects to take at each iteration, with the previously chosen objects excluded at each iteration – Eradash Jul 18 '16 at 16:40

3 Answers


Your output list for combinations of 5 elements will have ~1.5*10^9 (that's billion with a b) sublists of size 5. Even with 32-bit integers, neglecting all list overhead and assuming a perfect list with 0 bytes of bookkeeping, that is already ~30GB (and considerably more with .NET object overhead)!

You should reconsider whether you actually need to generate the list the way you do. An alternative is streaming the combinations, i.e. generating them on the fly.

That can be done by writing a function which takes the last combination as an argument and outputs the next one. (To see how, think about incrementing a number by one: you go from the last digit to the first, carrying over until you are done.)

A streaming example for choosing 2 out of 4:

start: {4,3}
curr = start {4, 3}
curr = next(curr) {4, 2} // reduce last by one 
curr = next(curr) {4, 1} // reduce last by one
curr = next(curr) {3, 2} // cannot reduce more, reduce the first by one, and set the follower to maximal possible value
curr = next(curr) {3, 1} // reduce last by one
curr = next(curr) {2, 1} // similar to {3,2}
done.

Now, you need to figure out how to do it for lists of size 2, then generalize it to arbitrary size - and program your streaming combination generator.
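One way to sketch such a generator in C# (the names are illustrative; it walks an index vector upward rather than downward as in the example above, but the carry idea is the same, and only one combination exists in memory at a time):

```csharp
using System.Collections.Generic;

public static class CombinationStreaming
{
    // Streams the k-combinations of `elements` one at a time via an index
    // vector, so memory use is O(k) instead of O(C(n, k)).
    public static IEnumerable<T[]> StreamCombinations<T>(IList<T> elements, int k)
    {
        int n = elements.Count;
        if (k <= 0 || k > n) yield break;

        var indices = new int[k];
        for (int i = 0; i < k; i++) indices[i] = i;   // start: {0, 1, ..., k-1}

        while (true)
        {
            var current = new T[k];
            for (int i = 0; i < k; i++) current[i] = elements[indices[i]];
            yield return current;

            // Advance like a counter with carry: find the rightmost index
            // that can still grow, bump it, and reset everything after it.
            int pos = k - 1;
            while (pos >= 0 && indices[pos] == n - k + pos) pos--;
            if (pos < 0) yield break;                 // last combination emitted
            indices[pos]++;
            for (int i = pos + 1; i < k; i++) indices[i] = indices[i - 1] + 1;
        }
    }
}
```

For choosing 2 out of {1,2,3,4} this yields {1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4} - six combinations, never more than one array of size 2 alive at a time.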

Good Luck!

amit
  • You could use vectors of indices into the original array, which would fit into five bytes each (since the indices are in the range 0..179), reducing the storage requirement to "only" 7.13GiB. – rici Jul 18 '16 at 17:11

Let your precision be defined in the imaginary spectrum.

Use a real index to access the leaf and then traverse the leaf with the required precision.

See PrecisLise @ http://net7mma.codeplex.com/SourceControl/latest#Common/Collections/Generic/PrecicseList.cs

While the implementation as linked is not 100% complete, you can find where I used a similar concept here:

http://net7mma.codeplex.com/SourceControl/latest#RtspServer/MediaTypes/RFC6184Media.cs

Using this concept I was able to re-order h.264 Access Units and their underlying Network Abstraction Layer components in what I consider a very interesting way... besides being interesting, it also has the potential to be more efficient, using close to the same amount of memory.

For example, 0 can be followed by 0.1, 0.01, or 0.001; depending on the type of the key in the list (double, float, Vector, inter alia), you may have the added benefit of using the FPU, or possibly even intrinsics if supported by your processor, making sorting and indexing much faster than would otherwise be possible, regardless of the underlying storage mechanism.

Using this concept allows for very interesting ordering... especially if you provide a mechanism to filter the precision.

I was also able to find several bugs in the bit-stream parser of quite a few well known media libraries using this methodology...

Jay

I found my solution. I'm writing it here so that other people who have a similar problem have something to work with...

I made a recursive function that searches for a fixed number of possibilities that fit the conditions. When that number of possibilities has been found, I return the list, do some calculations with the results, and restart the process. I added a timer to stop the search when it takes too long. Since my condition is based on the sum of the elements, I build every possibility from distinct values only, and search for a small number of possibilities each time (like 1).

So the function returns a possibility with very high precision; I do what I need with it, remove its elements from the original list, and call the function again with the same precision until nothing is returned, then I continue with another precision. Once several precisions have been processed, only about 30 elements remain in my list, so I can generate all the possibilities (that still fit the maximum sum), and that part is much easier than the beginning.

Here is my code:

public List<IEnumerable<Rondin>> getPossibilites(IEnumerable<Rondin> rondins, int nbElements, double minimum, double maximum, int instance = 0, double longueur = 0)
{
    if(instance == 0)
        timer = DateTime.Now;

    List<IEnumerable<Rondin>> liste = new List<IEnumerable<Rondin>>();

    //Get all distinct rondins that can fit into the maximal length
    //(DistinctBy comes from the MoreLINQ library)
    foreach (Rondin r in rondins.Where(r => r.longueur < (maximum - longueur)).DistinctBy(r => r.longueur).OrderBy(r => r.longueur))
    {
        //Check the current length
        double longueur2 = longueur + r.longueur + traitScie;

        //If the current length is under the maximal length
        if (longueur2 < maximum)
        {
            //Get all the possibilities with all rondins except the current one, and add them to the list
            foreach (IEnumerable<Rondin> poss in getPossibilites(rondins.Where(rondin => rondin.id != r.id), nbElements - liste.Count, minimum, maximum, instance + 1, longueur2).Select(possibilite => possibilite.Concat(new Rondin[] { r })))
            {
                liste.Add(poss);
                if (liste.Count >= nbElements && nbElements > 0)
                    break;
            }

            //If the current length is higher than the minimum, add this rondin on its own to the list
            if (longueur2 >= minimum)
                liste.Add(new Rondin[] { r });
        }

        //If we have enough possibilities, we stop the research
        if (liste.Count >= nbElements && nbElements > 0)
            break;

        //If the research is taking too long, stop the research and return the list;
        if (DateTime.Now.Subtract(timer).TotalSeconds > 30)
            break;
    }
    return liste;
}
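To make the idea easier to test in isolation, here is a self-contained simplification of the method above: plain doubles stand in for `Rondin`, the timer is omitted, and the recursion is restricted to larger values so each combination is produced exactly once (all names are illustrative, not from the original code):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class BoundedSearch
{
    // Recursively build combinations of distinct values whose sum falls in
    // [minimum, maximum), pruning any branch whose running sum exceeds
    // `maximum`, and stopping once `nbElements` combinations are found.
    public static List<List<double>> FindPossibilites(
        List<double> values, int nbElements, double minimum, double maximum,
        double running = 0)
    {
        var found = new List<List<double>>();
        if (nbElements <= 0) return found;

        foreach (double v in values.Distinct().OrderBy(x => x))
        {
            double total = running + v;
            if (total >= maximum) continue;       // prune this branch

            // Recurse only on larger values so each combination appears once.
            var larger = values.Where(x => x > v).ToList();
            foreach (var sub in FindPossibilites(larger, nbElements - found.Count, minimum, maximum, total))
            {
                sub.Add(v);
                found.Add(sub);
                if (found.Count >= nbElements) return found;
            }

            // The single value v (on top of the running prefix) may qualify by itself.
            if (total >= minimum)
                found.Add(new List<double> { v });

            if (found.Count >= nbElements) return found;
        }
        return found;
    }
}
```

Calling it with `nbElements = 1` and a narrow `[minimum, maximum)` window mirrors the "one very precise possibility at a time" strategy described above.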
Eradash