8

I need a way to calculate the number of elements less than X in a TreeSet of Integers really fast.

I can use the

  • subSet()
  • headSet()
  • tailSet()

methods but they are really slow (I just need the count, not the numbers themselves). Is there a way?

Thank you.


EDIT:

I found a workaround that makes things a lot faster! I am using a BitSet and its cardinality() method. I create a BitSet first, and for every element added to the TreeSet I set the corresponding index in the BitSet. Now, to count the number of elements less than or equal to X, I use:

bitset.get(0, X+1).cardinality()

This is much faster than treeset.subSet(0, true, X, true).size().

Does anyone know why? I assume BitSet.cardinality() doesn't use linear search.
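
A likely explanation, assuming the OpenJDK implementation: BitSet packs its bits into long words, and cardinality() adds up Long.bitCount over those words, so it is still a linear pass but handles 64 candidate values per step, while size() on a TreeSet range view walks the entries one by one. The sketch below keeps a BitSet in sync with a TreeSet and compares the two counts. The class name BitSetRankDemo and the helper countLessThan are made up for illustration, and the values are assumed to be non-negative. Note that get(0, x) excludes x itself, whereas get(0, X+1) above includes it.

```java
import java.util.BitSet;
import java.util.TreeSet;

class BitSetRankDemo {
    // Counts set bits at indices [0, x), i.e. stored values strictly less than x.
    // x must be non-negative, otherwise BitSet.get throws IndexOutOfBoundsException.
    static int countLessThan(BitSet bits, int x) {
        return bits.get(0, x).cardinality();
    }

    public static void main(String[] args) {
        TreeSet<Integer> set = new TreeSet<>();
        BitSet bits = new BitSet();
        for (int v : new int[] {3, 8, 15, 42, 99}) {
            set.add(v);
            bits.set(v); // mirror every insertion in the BitSet
        }
        int x = 42;
        System.out.println(countLessThan(bits, x)); // count via the BitSet
        System.out.println(set.headSet(x).size());  // same count via the TreeSet view
    }
}
```

The trade-off is memory: the BitSet needs one bit per possible value up to the largest element, so this only pays off when the value range is reasonably small.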

mnmp
  • You might try a Guava `TreeMultiset`, which supports `headMultiset(element).size()` in O(log n), not O(n). It's not quite the same as a `TreeSet`, though. But `headMultiset(element).elementSet().size()` would also be O(log n). – Louis Wasserman Dec 23 '15 at 06:08
  • Why do you need a TreeSet? Do you update the data structure that often? If you don't update it, just keep the number of elements less than X in a hashmap! If you update it infrequently, keep a sorted linked list of numbers. At insert/remove, add/remove from the list in O(1) and update the hashmap (O(n)). – Masood_mj Dec 23 '15 at 17:18
  • Thanks for your comment @Masood_mj. The problem is that X is not a specific value, it changes every time I call the cardinality() function. So if I wanna use a hashmap then I have to update all items with key > Y every time I add or delete Y into the hashmap (+1 or -1 all). Am I missing something? – mnmp Dec 23 '15 at 19:43
  • Hashmap + linked list was O(1) get and O(n) update. You can have O(log n) get and O(log n) update by using a (sorted) binary tree. In each node of the tree, also keep the number of its descendants. Now, to get the number of items less than y, you search for y in the binary tree and add up the sizes of the left subtrees you skip whenever you go right instead of left. On update you need to update the ancestors of the new element too. By the way, if you are willing to accept approximate answers, there could be faster ways too. – Masood_mj Dec 23 '15 at 22:57
  • Great idea @Masood_mj. Is there such a binary tree in Java, or do I have to implement it myself? – mnmp Dec 23 '15 at 23:07
  • I don't think Java has a tree that lets you know the path of a node in the tree when you add it. Implementing a binary tree should not be hard (search online for sample code). – Masood_mj Dec 23 '15 at 23:09
  • Thanks man. Do you happen to know how BitSet works? I am just curious. Maybe it's doing the same thing you're telling me. – mnmp Dec 23 '15 at 23:10

4 Answers

4

Since all answers so far point to data structures other than Java's TreeSet, I would suggest a Fenwick tree (binary indexed tree), which handles both updates and prefix-count queries in O(log N); see the link for a Java implementation.
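
For readers without the link, here is a minimal sketch of the idea, not the linked implementation. It assumes the values are non-negative integers with a known upper bound; the class name FenwickCounter and its methods are invented for this example. The tree stores a 0/1 count per value, so the caller should only call update when the set actually changes.

```java
import java.util.TreeSet;

class FenwickCounter {
    private final int[] tree; // 1-based; the value v lives at position v + 1

    // Supports values in the range [0, maxValue].
    FenwickCounter(int maxValue) {
        tree = new int[maxValue + 2];
    }

    // Adds delta to the count stored for value (+1 on insert, -1 on removal).
    void update(int value, int delta) {
        for (int i = value + 1; i < tree.length; i += i & -i) {
            tree[i] += delta;
        }
    }

    // Number of stored values strictly less than x, in O(log N).
    int countLessThan(int x) {
        int sum = 0;
        for (int i = Math.min(x, tree.length - 1); i > 0; i -= i & -i) {
            sum += tree[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        TreeSet<Integer> set = new TreeSet<>();
        FenwickCounter counter = new FenwickCounter(100_000);
        for (int v : new int[] {5, 1, 9, 5, 70_000}) {
            if (set.add(v)) counter.update(v, 1);    // skip duplicates
        }
        if (set.remove(9)) counter.update(9, -1);    // keep both structures in sync
        System.out.println(counter.countLessThan(6)); // counts 1 and 5
    }
}
```

If the values are sparse or unbounded, they would need to be mapped to compact indices first (coordinate compression), which this sketch does not do.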

P Marecki
3

How fast does 'really fast' need to be? Roughly how many elements do you have?

subSet()/headSet()/tailSet() are O(1) because they return a view of the original treeset, but if you size() your subSet() you are still iterating over all the original elements, hence O(N).

Are you using Java 8? This will be about the same but you can parallelise the cost.

Set<Integer> set = new TreeSet<>();
// .. add things to set

long count = set.parallelStream().filter(e -> e < x).count();

NB EDIT

With further exploration and testing I cannot substantiate the claim "if you size() your subSet() you are still iterating over all the original elements". I was wrong. parallelStream().filter(...).count() on this 4-core machine was ~30% slower than subSet().size().

KarlM
  • Thanks! I have about a hundred thousand elements! I didn't know about count(); I thought using subSet was the problem. – mnmp Dec 23 '15 at 02:58
  • What's your support for the claim that the subview `count()` method, or rather I assume you mean the `size()` method, iterates the entire original collection? – user207421 Dec 23 '15 at 03:58
  • Thanks for asking. I had seen answers like http://stackoverflow.com/questions/15703120/unexpected-complexity-of-common-methods-size-in-java-collections-framework and http://stackoverflow.com/questions/14290751/time-complexity-of-treemap-operations-submap-headmap-tailmap But when I investigated I couldn't substantiate those claims based on source code - versions may have changed etc. In fact writing my own tests size() doesn't seem to vary much even when N varies by x100. I will look further - may withdraw answer - @mnmp have you found any improvements? – KarlM Dec 23 '15 at 05:18
  • Another O(N) for size() here http://stackoverflow.com/questions/14750374/what-is-complexity-of-size-for-treeset-portion-view-in-java – KarlM Dec 23 '15 at 05:19
1

If you don't update the data structure, just keep the number of elements less than X in a hashmap!

If you update it infrequently, keep a sorted linked list of numbers. At insert/remove, add/remove from the list in O(1) and update the hashmap (O(n)).

You can have O(log n) get and O(log n) update by using a (sorted) binary tree. In each node of the tree, also keep the number of its descendants. Now, to get the number of items less than y, you search for y in the binary tree and add up the sizes of the left subtrees you skip whenever you go right instead of left. On update you need to update the ancestors of the new element too.

By the way, if you are willing to accept approximate answers, there could be faster ways too.
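
The tree with descendant counts could be sketched roughly as follows. This is only an illustration: the class SizedBst and its members are hypothetical names, removal is left out for brevity, and the tree is not self-balancing, so the O(log n) bound holds only while the tree stays reasonably balanced (for example with random insertion order). A production version would maintain the size field through the rotations of a red-black or AVL tree.

```java
class SizedBst {
    private static final class Node {
        final int key;
        int size = 1; // nodes in this subtree, including this node
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    private static int size(Node n) {
        return n == null ? 0 : n.size;
    }

    boolean contains(int key) {
        Node n = root;
        while (n != null && n.key != key) {
            n = key < n.key ? n.left : n.right;
        }
        return n != null;
    }

    // Inserts key if absent; returns true when the set changed.
    boolean add(int key) {
        if (contains(key)) return false;
        Node fresh = new Node(key);
        if (root == null) { root = fresh; return true; }
        Node n = root;
        while (true) {
            n.size++; // every ancestor of the new node gains one descendant
            if (key < n.key) {
                if (n.left == null) { n.left = fresh; return true; }
                n = n.left;
            } else {
                if (n.right == null) { n.right = fresh; return true; }
                n = n.right;
            }
        }
    }

    // Number of stored keys strictly less than x.
    int countLessThan(int x) {
        int count = 0;
        Node n = root;
        while (n != null) {
            if (x <= n.key) {
                n = n.left;
            } else {
                count += size(n.left) + 1; // n and its whole left subtree are < x
                n = n.right;
            }
        }
        return count;
    }
}
```

Each query touches one node per level, which is where the logarithmic cost comes from when the tree is balanced.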

Masood_mj
-1
package ArrayListTrial;

import java.util.Scanner;

public class countArray {

    public static void main(String[] args) {
        // Fill the array with 1..100 and count how many values exceed the input.

        int[] array = new int[100];
        Scanner scan = new Scanner(System.in);
        System.out.println("input the number you want to compare:");
        int in = scan.nextInt();
        int count = 0;
        System.out.println("The following is array elements:");
        for(int k=0 ; k<array.length ; k++)
        {
            array[k] = k+1;
            System.out.print(array[k] + " ");
            if(array[k] > in)
            {
                count++;
            }
        }
        System.out.printf("\nThere are %d numbers in the array bigger than %d.\n" , count , in);

    }

}
Vivek
  • Maybe this is the answer to a different question? – KarlM Dec 23 '15 at 02:56
  • It's not the answer to any question. The array being searched is full of zeros. The count for any particular value is therefore known in advance: no search required. @KarlM – user207421 Dec 23 '15 at 02:59