0

I have this HashMap: HashMap<NGram, Integer> countMap = new HashMap<>();
I'm trying to fill it (snippet is in a loop):

NGram ng = new NGram(param);
if (countMap.containsKey(ng)){  // or if(countMap.get(ng) != null){
    int counter = countMap.get(ng);
    countMap.put(ng, ++counter);
}
else{
    countMap.put(ng, 1);
}

But countMap.containsKey(ng) never returns true (when it should), so the else-statement will always be executed! I overwrote equals() and hashCode() of NGram and they're working just fine. With primitive datatypes the snippet will work.

So, my question is: How do I convince Java, that there's already the object in the HashMap?

Here's the code of toString(), hashCode() and equals():

public String toString(){
    String ret = "";
    for (String s : previousToken){
        ret += s + " ";
    }
    ret += token;
    return ret;
}

public boolean equals(NGram other){
    if(this.toString().equals(other.toString())){
        return true;
    }
    else{
        return false;
    }
}

public int hashCode(){
    String ts = this.toString();
    int result = 3;
    for(int i = 0; i < ts.length(); i++){
        result = 2 * result + ((int)ts.charAt(i));
    }
    return result;
}
ThatsMe
  • 7
  • 5
  • 1
    Can you show us the code from the `equals` and the `getHashCode` methods? – Kevin Wallis Jun 04 '16 at 07:31
  • Maybe you have never added any `NGram` object so you will never go into the code within the `if` ? – Kevin Wallis Jun 04 '16 at 07:34
  • @KevinWallis OP did say *"snippet is in a loop"*, so it's more likely that `param` is used in `equals()` and is always unique, e.g. it is the loop variable. – Andreas Jun 04 '16 at 07:36
  • @Andreas: `param` is generated in the loop, it's a string-array and from time to time the content is the same as before. – ThatsMe Jun 04 '16 at 07:41
  • This would probably be simpler with a `MultiSet` from something like Guava. – chrylis -cautiouslyoptimistic- Jun 04 '16 at 07:42
  • @chrylis: can't use other APIs – ThatsMe Jun 04 '16 at 07:45
  • @MasterBolle as hint this is not the common way to implement `equals` :D – Kevin Wallis Jun 04 '16 at 07:48
  • @KevinWallis: I'know :P but it should work, shouldn't it? – ThatsMe Jun 04 '16 at 07:51
  • _When using a hash-based Collection or Map such as HashSet, LinkedHashSet, HashMap, Hashtable, or WeakHashMap, make sure that the hashCode() of the key objects that you put into the collection never changes while the object is in the collection. The bulletproof way to ensure this is to make your keys immutable, which has also other benefits._ [link](http://stackoverflow.com/questions/27581/what-issues-should-be-considered-when-overriding-equals-and-hashcode-in-java) – Kevin Wallis Jun 04 '16 at 07:52
  • I don't know if your `toString` method does change? – Kevin Wallis Jun 04 '16 at 07:55
  • 1
    @KevinWallis the hashCode() doesn't change it's return value while an object is in the Map, toString() doesn't change either – ThatsMe Jun 04 '16 at 07:55
  • Have you already debugged? Because I implemented an example and it works without any problems – Kevin Wallis Jun 04 '16 at 07:59
  • oh okay, that's the last bug I have... – ThatsMe Jun 04 '16 at 08:02
  • It was false, I already removed it. How much objects are you adding into the `HashMap`? And can you make a breakpoint the `else` branch and check if it is entered – Kevin Wallis Jun 04 '16 at 08:09
  • About 18 million...`else` is entered – ThatsMe Jun 04 '16 at 08:13
  • Have you also checked if the elements you expect are contained in the `HashMap`? Because with this count of elements it is really important to have a correct `hashCode` and `equals` – Kevin Wallis Jun 04 '16 at 08:57
  • Yes, there are many keys with the same `hashCode`, as expected, and the `equals()` is using the ´hashCode()` now :) – ThatsMe Jun 04 '16 at 09:13
  • `equals()` *cannot* use `hashcode()`, since `hashcode()` doesn't generate unique values for unique objects. – Andreas Jun 04 '16 at 13:33

1 Answers1

0

Here is an example of what your 3 methods should be.
This is what is called an MCVE (Minimal, Complete, and Verifiable example).

The primary concept is that constructing a new combined string by calling toString() is not the right way to implement equals() and hashCode().

This is mostly for performance reasons. With 18 million objects added to the map, performance of equals() and hashCode() matters.

Another reason it's a bad idea, is that toString() may not show all values of the object, or that two different objects may still have the same toString() result, e.g. if tokens could contain spaces, then concatenating them would not be able to differentiate A B, C from A, B C.

For the example, a String[] is used for the previousToken field, and a varargs constructor is used to simplify testing. Null tokens not allowed. Yours is likely slightly different, but this shows the concepts.

public class NGram {
    private String   token;
    private String[] previousToken; // could also be List<String>
    public NGram(String ... allTokens) {
        if (allTokens.length == 0)
            throw new IllegalArgumentException("At least one token is required");
        for (String s : allTokens)
            if (s == null)
                throw new NullPointerException();
        this.token = allTokens[allTokens.length - 1];
        this.previousToken = Arrays.copyOfRange(allTokens, 0, allTokens.length - 1);
    }
    @Override
    public String toString() {
        StringBuilder buf = new StringBuilder();
        for (String s : this.previousToken)
            buf.append(s).append(' ');
        return buf.append(this.token).toString();
    }
    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null || getClass() != obj.getClass())
            return false;
        NGram that = (NGram) obj;
        return this.token.equals(that.token) &&
               Arrays.equals(this.previousToken, that.previousToken);
    }
    @Override
    public int hashCode() {
        return this.token.hashCode() + 31 * Arrays.hashCode(this.previousToken);
    }
}

TEST

This shows a faster, more compact version of the code to add a new object to the map. It is faster because it only performs one lookup per add, and made more compact by using a ternary operator instead of an if statement, combining the two put() calls.

It's isolated in a helper method for easier testing.

public static void main(String[] args) {
    Map<NGram, Integer> countMap = new HashMap<>();
    add(countMap, new NGram("Foo"));
    add(countMap, new NGram("Bar"));
    add(countMap, new NGram("Foo", "Bar"));
    add(countMap, new NGram("This", "is", "a", "test"));
    add(countMap, new NGram("Bar"));
    System.out.println(countMap);
}
private static void add(Map<NGram, Integer> countMap, NGram ng) {
    Integer counter = countMap.get(ng);
    countMap.put(ng, (counter != null ? counter + 1 : 1));
}

OUTPUT

{Bar=2, Foo=1, This is a test=1, Foo Bar=1}

As you can see, the duplicate Bar value is counted twice.


If you end up with lots of duplicate objects, i.e. with high counter values, then using an immutable Integer object as the counter is not great for performance. You can use an int[1], but a mutable helper class is better, since it can provide a good toString() implementation.

private static void add(Map<NGram, Counter> countMap, NGram ng) {
    Counter counter = countMap.get(ng);
    if (counter == null)
        countMap.put(ng, new Counter(1));
    else
        counter.increment();
}
private static final class Counter {
    private int count;
    public Counter(int count) {
        this.count = count;
    }
    public void increment() {
        this.count++;
    }
    public int get() {
        return this.count;
    }
    @Override
    public String toString() {
        return Integer.toString(this.count);
    }
}
Community
  • 1
  • 1
Andreas
  • 138,167
  • 8
  • 112
  • 195