3

I have a problem in which I have a large list of items sorted by "weights". I need to be able to randomly select items from this list, but the items closer to the start (greater weights) must have a greater chance of being selected based on an "elitism" factor.

I realize that similar questions have been asked before, but the catch here is that this list will be changing over time. New values will be sorted into the list as the last item is deleted (to keep a constant sized pool of "optimized" values).

First off, what would be the most efficient way of doing the selection? Selection must happen in real time from a list anywhere from 50 to 1000 items long.

Second, what would be the best data structure to use here? I'm using C#.

I just thought of a possible solution, but I'd like some feedback on the idea. What if I were to generate a random float value within a certain range, and then do something along the lines of squaring it? Small values would return small values, and large values would return MUCH larger values. From what I can tell, mapping this result to the length of the list should give the desired effect. Does this sound right?
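For illustration, here is the squaring idea sketched in code (the pool size is made up):

```csharp
// Sketch of the squaring idea: squaring a uniform value in [0, 1)
// pushes it toward 0, so low indices (high weights) come up more often.
var rnd = new Random();
int count = 100;                  // pool size (50..1000 in practice)
double r = rnd.NextDouble();      // uniform in [0, 1)
int index = (int)(r * r * count); // r*r is biased toward 0
```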

double-beep
  • 3,889
  • 12
  • 24
  • 35
Camander
  • 137
  • 1
  • 10

4 Answers

2

Unfortunately, I can't offer any code right now, but some ideas:

As your list is sorted from high weighted to low weighted, you should be able to use a random number generator based on a normal distribution. If you don't have such a random number generator at hand, you can transform a uniform distribution into a normal distribution using the code found here: Random Gaussian Variables

I'm terrible at explaining, but I'll try: you can set the bias (the mean value) to 0 and the sigma (the deviation) to, let's say, 3. Then take the absolute value of the generated number, as you may get negative numbers.

This gives you a number generator that has a high probability around the bias (0 in the above example), and a lower probability for numbers that deviate far from it.

As I said, I'm terrible at explaining.
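A rough sketch of the idea (not the answerer's code; the pool size and sigma choice are my assumptions), using the Box-Muller transform to get the Gaussian sample:

```csharp
// Map the absolute value of a Gaussian sample to a list index.
var rnd = new Random();
int count = 100;             // size of the pool (assumed)
double sigma = count / 3.0;  // ~99.7% of samples land within 3 sigma

// Box-Muller transform: two uniform samples -> one standard normal
double u1 = 1.0 - rnd.NextDouble(); // shift to (0, 1] so Log never sees 0
double u2 = rnd.NextDouble();
double z = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);

// |z| concentrates near 0, so low indices are favored; clamp the rare tail
int index = Math.Min((int)(Math.Abs(z) * sigma), count - 1);
```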

Community
  • 1
  • 1
Shelling
  • 409
  • 7
  • 12
1

I would do something like:

string[] names = new[] { "Foo", "Bar", "Fix" };

// The weights will be 3, 2, 1
int[] weights = new int[names.Length];
for (int i = 0; i < names.Length; i++)
{
    weights[i] = names.Length - i;
}

int[] cumulativeWeights = new int[names.Length];

// The cumulativeWeights will be 3, 5, 6
// so if we generate a number, 1-3 Foo, 4-5 Bar, 6 Fix
cumulativeWeights[0] = weights[0];
int totalWeight = weights[0];

for (int i = 1; i < cumulativeWeights.Length; i++)
{
    cumulativeWeights[i] = cumulativeWeights[i - 1] + weights[i];
    totalWeight += weights[i];
}

var rnd = new Random();

while (true)
{
    int selectedWeight = rnd.Next(totalWeight) + 1; // random returns 0..5, +1 == 1..6
    int ix = Array.BinarySearch(cumulativeWeights, selectedWeight);
    // If value is not found and value is less than one or more 
    // elements in array, a negative number which is the bitwise 
    // complement of the index of the first element that is 
    // larger than value.
    if (ix < 0)
    {
        ix = ~ix;
    }

    Console.WriteLine(names[ix]);
}

I've built an array of weights using a linear method: the first element has weight equal to (number of elements), the second one has weight (number of elements - 1), and so on. You can use your own algorithm, but it is easier if the weights are integers.

Then I calculated a cumulativeWeights array and a totalWeight.

Then I can draw a random number between 1 and totalWeight and find the first index whose cumulativeWeight is >= the random number. Since cumulativeWeights is sorted (clearly :-) ), I can use Array.BinarySearch, which has the advantage that if the exact number isn't found, it yields the index of the next greater number.

Now, with double weights it gets a little more complex for the Random part:

string[] names = new[] { "Foo", "Bar", "Fix" };

// The weights will be 3.375, 2.25, 1.5
double[] weights = new double[names.Length];
for (int i = 0; i < names.Length; i++)
{
    weights[i] = Math.Pow(1.5, names.Length - i);
}

double[] cumulativeWeights = new double[names.Length];

// The cumulativeWeights will be 3.375, 3.375+2.25=5.625, 3.375+2.25+1.5=7.125
// so if we generate a number, 1-3.375 Foo, >3.375-5.625 Bar, >5.625-7.125 Fix
// totalWeight = 7.125
cumulativeWeights[0] = weights[0];
double totalWeight = weights[0];

for (int i = 1; i < cumulativeWeights.Length; i++)
{
    cumulativeWeights[i] = cumulativeWeights[i - 1] + weights[i];
    totalWeight += weights[i];
}

var rnd = new Random();

while (true)
{
    // NextDouble() returns 0..1, so (0..1) * (totalWeight - 1) + 1 = (0..6.125) + 1 = 1..7.125
    double selectedWeight = (rnd.NextDouble() * (totalWeight - 1)) + 1; 

    int ix = Array.BinarySearch(cumulativeWeights, selectedWeight);
    // If value is not found and value is less than one or more 
    // elements in array, a negative number which is the bitwise 
    // complement of the index of the first element that is 
    // larger than value.
    if (ix < 0)
    {
        ix = ~ix;
    }

    Console.WriteLine(names[ix]);
}

The Random.NextDouble() method returns a number 0 <= x < 1 that we then have to scale to our weight range.

Based on that principle, it is possible to build a List&lt;T&gt;-like class that uses it:

public class ListWithWeight<T>
{
    private readonly List<T> List = new List<T>();

    private readonly List<double> CumulativeWeights = new List<double>();

    private readonly Func<int, double> WeightForNthElement;

    private readonly Random Rnd = new Random();

    public ListWithWeight(Func<int, double> weightForNthElement)
    {
        WeightForNthElement = weightForNthElement;
    }

    public void Add(T element)
    {
        List.Add(element);

        double weight = WeightForNthElement(List.Count);

        if (CumulativeWeights.Count == 0)
        {
            CumulativeWeights.Add(weight);
        }
        else
        {
            CumulativeWeights.Add(CumulativeWeights[CumulativeWeights.Count - 1] + weight);
        }
    }

    public void Insert(int index, T element)
    {
        List.Insert(index, element);

        double weight = WeightForNthElement(List.Count);

        if (CumulativeWeights.Count == 0)
        {
            CumulativeWeights.Add(weight);
        }
        else
        {
            CumulativeWeights.Add(CumulativeWeights[CumulativeWeights.Count - 1] + weight);
        }
    }

    public void RemoveAt(int index)
    {
        List.RemoveAt(index);
        CumulativeWeights.RemoveAt(List.Count);
    }

    public T this[int index]
    {
        get
        {
            return List[index];
        }

        set
        {
            List[index] = value;
        }
    }

    public int Count
    {
        get
        {
            return List.Count;
        }
    }

    public int RandomWeightedIndex()
    {
        if (List.Count < 2)
        {
            return List.Count - 1;
        }

        double totalWeight = CumulativeWeights[CumulativeWeights.Count - 1];
        double selectedWeight = (Rnd.NextDouble() * (totalWeight - 1.0)) + 1;

        int ix = CumulativeWeights.BinarySearch(selectedWeight);
        // If value is not found and value is less than one or more 
        // elements in array, a negative number which is the bitwise 
        // complement of the index of the first element that is 
        // larger than value.
        if (ix < 0)
        {
            ix = ~ix;
        }

        // We want to use "reversed" weight, where first items
        // weight more:

        ix = List.Count - ix - 1;
        return ix;
    }
}

and

var lst = new ListWithWeight<string>(x => Math.Pow(1.5, x));
lst.Add("Foo");
lst.Add("Bar");
lst.Add("Fix");
lst.RemoveAt(0);
lst.Insert(0, "Foo2");

while (true)
{
    Console.WriteLine(lst[lst.RandomWeightedIndex()]);
}
xanatos
  • 102,557
  • 10
  • 176
  • 249
  • Thanks for the response. I thought about doing something similar to this, but as I said, the list will be constantly changing. That means I would always be updating the weights of each successive value in cumulativeWeights. Also, with your implementation, the chance of an item being selected would depend on its weight, rather that its position in the list. A good Idea, but unfortunately not what I need here :( – Camander Apr 23 '15 at 10:31
  • @Camander The weight array *is* based on the position in the list. If you delete an element, you only need to remove the last element of the weight array (at this point a `List<double>` for `cumulativeWeights` would be better) and update the `totalWeight`. – xanatos Apr 23 '15 at 10:34
  • This is like the solution that I would post if this wasn't already here, except that I would use a search tree. Each node indicates the total weight of its children. You go left or right down the tree depending on whether the left half is greater or less than your random value, and when you go right you subtract the total weight of everything on the left. A tree will be easier to edit than an array. – sh1 Apr 24 '15 at 01:22
  • @sh1 He had asked for the weight to be based only on position, so if you remove an element in the middle, all the weights after it are "recalculated" (so in the end I always remove the last weight, or add the weight as last). If you want chosen weight per item, then a tree is the right way to do it. – xanatos Apr 24 '15 at 04:47
  • I read it the other way, that the position was based on weight which was predetermined... Otherwise I don't see the point in asking how to insert new items into the list efficiently. – sh1 Apr 24 '15 at 05:52
  • @sh1 is correct. I apologize if I was unclear about this. Weights of items are determined elsewhere and then they are sorted into the list. Position is determined by weight, and chance of being selected is determined by position. – Camander Apr 29 '15 at 17:58
1

This is what I would do:

private static int GetPosition(double value, int startPosition, int maxPosition, double weightFactor, double rMin)
{
    while (true)
    {
        if (startPosition == maxPosition) return maxPosition;

        var limit = (1 - rMin)*weightFactor + rMin;
        if (value < limit) return startPosition;
        startPosition = startPosition + 1;
        rMin = limit;
    }
}

static void Main()
{
    const int maxIndex = 100;
    const double weight = 0.1;

    var r = new Random();
    for (var i = 0; i < 200; i++)
        Console.Write(GetPosition(r.NextDouble(), 0, maxIndex, weight, 0) + " ");
}

A 0.1 weight factor means that the first item has a 10% chance of being chosen; the remaining items share the other 90%.

The 2nd item has 10% of the remaining 90% = 9%

The 3rd item has 10% of the remaining 81% = 8.1%

...

As you increase the weight factor, it will be more likely that the first items are chosen over the last ones in the list. At a factor of 1, only the 1st item will be chosen.

For a weight of 0.1 and 10 items, here are the probabilities for each index:

0: 10%
1: 9%
2: 8.1%
3: 7.29%
4: 6.56%
5: 5.9%
6: 5.31%
7: 4.78%
8: 4.3%
9: 3.87%

EDIT

Of course, this only works well when there are enough indexes (at least 10 for a weight of 0.1); otherwise the last index gets a disproportionately large probability. For example, if weight = 0.1 and maxIndex = 1, index 0 will have a probability of 10% but index 1 will have 90%.
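Since the loop effectively samples a clamped geometric distribution, P(index = i) = w * (1 - w)^i, the index can also be drawn in O(1) with the inverse CDF. This is a sketch of that shortcut, not part of the original answer:

```csharp
// O(1) alternative to the GetPosition loop: invert the geometric CDF.
var rnd = new Random();
const double w = 0.1;     // weight factor, as in the answer
const int maxIndex = 100;

double u = rnd.NextDouble();
int index = (int)(Math.Log(1.0 - u) / Math.Log(1.0 - w));
if (index > maxIndex) index = maxIndex; // clamp the tail, as GetPosition does
```

For example, u < 0.1 gives index 0 and 0.1 <= u < 0.19 gives index 1, matching the 10%/9% probabilities above.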

Andrei Tătar
  • 6,077
  • 14
  • 32
1

Create a binary tree sorted by weight (sorting isn't strictly required, but it's specified in the question), and have each node record the total weight of all its children. From the root you can then read the total weight of the whole list.

Pick a random value r between zero and the total weight of everything. At each node, if the weight of the current node is greater than r then this is your result. Otherwise subtract the weight of the current node from r. Now, if the total weight of all the left children is less than r then go left. Otherwise subtract the total weight of all the left children from r and go right. Repeat until you have a result.

Insertion and deletion costs are down to how you choose to implement and balance your tree, but you will also have to traverse all the ancestors to update their weights.

If you don't actually need it sorted then making it a heap might improve the fast-out behaviour.
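A minimal sketch of the node structure and walk described above, assuming the subtree totals are maintained on insert/delete (the identifiers are mine, not from the answer):

```csharp
// Each node stores its own weight plus the cached total weight of each
// subtree, so one root-to-leaf walk selects a node with probability
// proportional to its weight.
class Node
{
    public double Weight;     // this node's own weight
    public double LeftTotal;  // cached total weight of the left subtree
    public double RightTotal; // cached total weight of the right subtree
    public Node Left, Right;

    public double Total => Weight + LeftTotal + RightTotal;

    // r must be in [0, Total). Each step consumes part of r, so a node n
    // is returned with probability n.Weight / root.Total.
    public Node Pick(double r)
    {
        if (r < Weight) return this;     // landed on this node
        r -= Weight;
        if (r < LeftTotal) return Left.Pick(r);
        return Right.Pick(r - LeftTotal);
    }
}
```

Selection is then `root.Pick(rnd.NextDouble() * root.Total)`; after an insert or delete, only the `LeftTotal`/`RightTotal` fields along the ancestor path need updating.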

sh1
  • 3,914
  • 13
  • 28