LinkedHashSet - insertion order and duplicates - keep newest "on top"

Question

I need a collection that keeps insertion order and holds unique values. LinkedHashSet looks like the way to go, but there's one problem: when I add an item equal to one already in the set, the new one is ignored and the original keeps its position (which makes sense). Here's an example:

Set<String> set = new LinkedHashSet<>();
set.add("one");
set.add("two");
set.add("three");
set.add("two");

The LinkedHashSet will print:

one, two, three

But what I need is:

one, three, two

What would be the best solution here? Is there any collection/collections method that can do this or should I implement it manually?

What about an `ArrayList`, calling `contains` before inserting? — ortis, Apr 04 '16 at 10:16
I suspect you're going to have to implement your own algorithm for this, likely backed by a `HashSet`. — Mena, Apr 04 '16 at 10:17
@AvihooMamka `TreeSet` keeps elements sorted based on natural or given order, it doesn't per se retain insertion order. — Mena, Apr 04 '16 at 10:18
Note that `LinkedHashSet` is not `final`, but you won't have access to the backing map used to store/retrieve elements. — Mena, Apr 04 '16 at 10:19
You can use a map. Each time the user enters a letter, you would update this map. — dneranjan, Apr 04 '16 at 10:20
As others said, you need your own implementation. Should rely on LinkedHashSet, just remove element on add to get the behavior you want — duduamar, Apr 04 '16 at 10:21
Misunderstood the question and thought it printed `one, three, two` and you wanted `one, two, three`. Fail. — bcsb1001, Apr 04 '16 at 10:23
Thanks, guys. Just as I thought there's no collection that provides this behavior. — rafakob, Apr 04 '16 at 10:26
HashMap replaces the old value, but in the case of HashSet the item is not inserted. — dneranjan, Apr 04 '16 at 10:28

OldCurmudgeon · Accepted Answer · 2016-07-07T07:54:01.603

35

Most of the Java Collections can be extended for tweaking.

Subclass LinkedHashSet, overriding the add method.

class TweakedHashSet<T> extends LinkedHashSet<T> {

    @Override
    public boolean add(T e) {
        // Get rid of old one.
        boolean wasThere = remove(e);
        // Add it.
        super.add(e);
        // Contract is "true if this set did not already contain the specified element"
        return !wasThere;
    }

}
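
As a quick check against the data from the question, here is a minimal sketch that repeats the class above (with the import it needs) and runs the four inserts. The `TweakedHashSetDemo` class name is made up for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Same class as in the answer, repeated so the example compiles on its own.
class TweakedHashSet<T> extends LinkedHashSet<T> {
    @Override
    public boolean add(T e) {
        boolean wasThere = remove(e); // drop the old occurrence, if any
        super.add(e);                 // re-insert at the end of the iteration order
        return !wasThere;             // Set.add contract: true only if e was new
    }
}

// Hypothetical demo class, not part of the original answer.
class TweakedHashSetDemo {
    public static void main(String[] args) {
        Set<String> set = new TweakedHashSet<>();
        set.add("one");
        set.add("two");
        set.add("three");
        boolean added = set.add("two"); // already present: moved to the end

        System.out.println(set);   // [one, three, two]
        System.out.println(added); // false
    }
}
```

Because `addAll` in `AbstractCollection` calls `add` for each element, bulk inserts should pick up the same behavior.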

edited Jul 07 '16 at 07:54

answered Apr 04 '16 at 10:35

OldCurmudgeon

It's working as I want. The same could be applied to ArrayList. In that case, is there any reason why I should use LinkedHashSet? – rafakob Apr 04 '16 at 10:57
@rafakob - `ArrayList` allows duplicates. – OldCurmudgeon Apr 04 '16 at 10:59
@OldCurmudgeon: not if you remove before inserting – ytoledano Apr 04 '16 at 11:13
@rafakob the remove should be more efficient in the `LinkedHashSet`, but you'll have to profile to be sure (depends on the quality of the typical `hashCode` implementation of your `T`s and the typical size of your Collection among other things) – Hulk Apr 04 '16 at 11:45
@hulk aside from the `remove` and `contains` functionality, `ArrayList` will undoubtedly outperform `LinkedHashSet` everywhere else. Having the entire structure in a contiguous memory block helps with memory access and JIT optimization. That said, you're 100% correct that profiling is the way to go - my suspicion is that the performance boost will be 3 or 4x, which in all honesty isn't enough for me to care unless it's in a very critical code block. Most times, I care a lot more about correctness than a 3-4x lookup speed in a data structure. – corsiKa Apr 04 '16 at 16:17
@corsiKa: the difference is not a factor like 3 or 4 but the *scaling*, so given a large enough collection, there will be a performance difference that hurts. But of course, we don’t know anything about the OP’s maximum collection size and how often a lookup/modification will happen, compared to iterations… – Holger Apr 04 '16 at 16:24
@Holger I was only ever considering a large collection (because in a small collection, who cares?). In core collection behaviors, `LinkedHashSet` will be faster in `remove` and `contains` and slower in `add` and `iterator`, regardless of scale. It is true that as scales get bigger, the badness of remove in `ArrayList` is worse than the badness of iteration in `LinkedHashSet`. Like I agreed with hulk, profiling is key. – corsiKa Apr 04 '16 at 16:33
@corsiKa: in principle, we agree on that. For a particular application, profiling is the key *and* an analysis regarding the question, whether a significantly higher number of elements can appear in a different scenario/ in production environment/ at a different customer/ etc. If you know there’s an intrinsic limit in your task, test and profile with your expected maximum… – Holger Apr 04 '16 at 16:37
You can simplify the method to `return !remove(e) && super.add(e);` as `super.add(e);` will always return `true`… – Holger May 10 '16 at 11:11
@Holger - `&&` will not add the element if it is already there; it will just remove it. `&` should be used instead. – niksvp Jul 07 '16 at 07:10
@niksvp - `&` is bitwise - we want logical and here. – OldCurmudgeon Jul 07 '16 at 07:13
@OldCurmudgeon - `logical` will not add (evaluate the second statement) if the element is already present in the set; it will just remove the element. Test the sample data in the question: you will be left with `one, three` in the output, while the element is expected to be added back after being removed. The bitwise operator will suffice for that. – niksvp Jul 07 '16 at 07:23
@niksvp - Good call on the logic - IMHO using subtle language hacks like bitwise logic is not the solution - does that work now? – OldCurmudgeon Jul 07 '16 at 07:39
@OldCurmudgeon - Using logical OR leads to another problem: it never adds a value that is not already present, so the collection stays empty all the time. If you want the simplified form, `bitwise` is the only solution, but as you said above, revision 4 is the correct alternative. – niksvp Jul 07 '16 at 07:50
@niksvp: correct, it should be `return !remove(e) & super.add(e);` here, small typo, big impact. – Holger Jul 07 '16 at 08:59
@OldCurmudgeon: the fact that the operator `&` can be used as bitwise operator for `int` values doesn’t make that its primary role. For `boolean` arguments, it’s a logical operator, as officially stated, see [JLS §15.22.2](https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.22.2). In fact, the operator `&&` is described as “*The conditional-and operator `&&` is like `&` (§15.22.2), but evaluates its right-hand operand only if the value of its left-hand operand is `true`*”. – Holger Jul 07 '16 at 09:03
@Holger - Acknowledged - I hereby humbly invoke the clarity/brevity argument in this case. – OldCurmudgeon Jul 07 '16 at 09:05
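
To make the `&&` versus `&` exchange above concrete, here is a small sketch that puts both one-liners side by side and feeds them the data from the question. The class names `ShortCircuitSet`, `EagerAndSet` and `AndOperatorDemo` are invented for this illustration and do not come from the thread.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Conditional-and: super.add(e) is skipped whenever remove(e) returned true,
// so re-adding an existing element only removes it.
class ShortCircuitSet<T> extends LinkedHashSet<T> {
    @Override
    public boolean add(T e) {
        return !remove(e) && super.add(e);
    }
}

// Non-short-circuit and: both operands are always evaluated, left to right,
// so the element is removed and then re-inserted at the end.
class EagerAndSet<T> extends LinkedHashSet<T> {
    @Override
    public boolean add(T e) {
        return !remove(e) & super.add(e);
    }
}

class AndOperatorDemo {
    // Inserts the sample data from the question into the given set.
    static Set<String> fill(Set<String> set) {
        set.add("one");
        set.add("two");
        set.add("three");
        set.add("two");
        return set;
    }

    public static void main(String[] args) {
        System.out.println(fill(new ShortCircuitSet<String>())); // [one, three]
        System.out.println(fill(new EagerAndSet<String>()));     // [one, three, two]
    }
}
```

The explicit `wasThere` variable in the answer avoids the question entirely, which is the clarity argument OldCurmudgeon invokes.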

score 20 · Answer 2 · edited Nov 20 '16 at 06:08

You can simply use a special feature of LinkedHashMap:

Set<String> set = Collections.newSetFromMap(new LinkedHashMap<>(16, 0.75f, true));
set.add("one");
set.add("two");
set.add("three");
set.add("two");
System.out.println(set); // prints [one, three, two]

In Oracle’s JRE, LinkedHashSet is backed by a LinkedHashMap anyway, so there’s not much functional difference, but the special constructor used here configures the LinkedHashMap to update the order on every access, not only on insertion. This might sound like too much, but in fact it only affects the insertion of keys that are already contained (values, in the sense of the Set). The other affected Map operations (namely get) are not used by the returned Set.

If you’re not using Java 8 or newer, you have to help the compiler a bit due to the limited type inference:

Set<String> set
    = Collections.newSetFromMap(new LinkedHashMap<String, Boolean>(16, 0.75f, true));

but the functionality is the same.

+1 I like this approach. But I'd recommend putting the creation of this Set into a well-named method (`createAccessOrderSet`?) and adding a few unit tests that make very clear what this does. — Hulk, Apr 04 '16 at 16:47
@Hulk: sure, that’s the right thing for production code. Then, you likely will provide overloads for providing the capacity and load factor parameters. I used the same values as the default constructor here, but without a documented factory method, they’ll look like magic literals to the reader… — Holger, Apr 04 '16 at 16:53
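
Following the suggestion in the comments above, a factory method might look roughly like this. It is only a sketch: the holder class `AccessOrderSets` and the constant names are assumptions made for this example, and `createAccessOrderSet` is simply the name Hulk floated.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Set;

// Hypothetical utility class; the names are illustrative only.
final class AccessOrderSets {
    // Same values as LinkedHashMap's no-argument constructor.
    private static final int DEFAULT_INITIAL_CAPACITY = 16;
    private static final float DEFAULT_LOAD_FACTOR = 0.75f;

    private AccessOrderSets() {
    }

    /** Creates a set in which re-adding an existing element moves it to the end. */
    static <T> Set<T> createAccessOrderSet() {
        return createAccessOrderSet(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
    }

    /** Overload exposing the capacity and load factor of the backing map. */
    static <T> Set<T> createAccessOrderSet(int initialCapacity, float loadFactor) {
        // accessOrder = true: put() on an existing key moves that key to the end.
        return Collections.newSetFromMap(
                new LinkedHashMap<T, Boolean>(initialCapacity, loadFactor, true));
    }
}

class AccessOrderSetDemo {
    public static void main(String[] args) {
        Set<String> set = AccessOrderSets.createAccessOrderSet();
        set.add("one");
        set.add("two");
        set.add("three");
        set.add("two");
        System.out.println(set); // [one, three, two]
    }
}
```

Naming the constants keeps `16` and `0.75f` from reading as magic literals, which is the concern Holger raises.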

score 6 · Answer 3 · answered Apr 04 '16 at 10:32

When initializing your LinkedHashSet you could override the add method.

Set<String> set = new LinkedHashSet<String>(){
    @Override
    public boolean add(String s) {
        if(contains(s))
            remove(s);
        return super.add(s);
    }
};

Now it gives you:

set.add("1");
set.add("2");
set.add("3");
set.add("1");
set.addAll(Collections.singleton("2"));

// [3, 1, 2]

Even the addAll method works, because LinkedHashSet inherits addAll from AbstractCollection, which calls add for each element.

score 1 · Answer 4 · answered Nov 20 '16 at 06:33

All the solutions provided above are excellent, but if we don't want to override an existing collection class, we can solve this problem simply by using an ArrayList with a little trick.

We can create a method that we use to insert data into the list:

public static <T> void addToList(List<T> list, T element) {
    list.remove(element); // Will remove element from list, if list contains it
    list.add(element); // Will add element again to the list 
}

And we can call this method to add elements to our list:

List<String> list = new ArrayList<>();

addToList(list, "one");
addToList(list, "two");
addToList(list, "three");
addToList(list, "two");

The only disadvantage here is that we need to call our custom addToList() method every time instead of list.add().

answered Nov 20 '16 at 06:33

Naresh Joshi

Nice and simple. But I'm not sure why you're using list instead of a `LinkedHashSet` as per the question. – shmosel Jan 12 '17 at 02:42
Thanks @shmosel, I have considered ArrayList for my code sample because LinkedHashSet was not maintaining the order the user wants, plus ArrayList is also the simplest and fastest way to solve this problem. – Naresh Joshi Jan 13 '17 at 05:43
`LinkedHashSet` maintains insertion order, just like an `ArrayList`. And it's more efficient at removals. – shmosel Jan 13 '17 at 05:45
The `list.remove(element)` runs in O(N) time. As your list gets large performance will quickly degrade for your `addToList` method. HashSet on the other hand runs `contains(element)` in O(1) regardless of the size. – Danny C Mar 16 '21 at 07:53
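
Picking up on the comments above, the same helper idea can be applied to a `LinkedHashSet` instead of a list, which avoids the linear scan in `list.remove(element)`. This is only a sketch: the names `SetInsertion` and `addToEnd` are invented here, and constant-time removal assumes a reasonable `hashCode` on the element type.

```java
import java.util.LinkedHashSet;

// Hypothetical helper class; not part of the original answer.
final class SetInsertion {
    private SetInsertion() {
    }

    // Counterpart of addToList for a LinkedHashSet: no subclassing needed.
    static <T> void addToEnd(LinkedHashSet<T> set, T element) {
        set.remove(element); // hash lookup instead of scanning the whole list
        set.add(element);    // appended at the end of the insertion order
    }
}

class SetInsertionDemo {
    public static void main(String[] args) {
        LinkedHashSet<String> set = new LinkedHashSet<>();
        SetInsertion.addToEnd(set, "one");
        SetInsertion.addToEnd(set, "two");
        SetInsertion.addToEnd(set, "three");
        SetInsertion.addToEnd(set, "two");
        System.out.println(set); // [one, three, two]
    }
}
```

As with `addToList()`, callers still have to go through the helper rather than calling `add` directly, so the trade-off described in the answer stays the same.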
