
The code below tries to check whether all the words in searchWords appear in newsPaperWords. Both lists can contain duplicates: if a word appears n times in searchWords, it has to appear at least n times in newsPaperWords for the method to return true. I thought the time complexity was 2*O(n) + O(m), but the interviewer told me that it is 2*O(n log n) + O(m log m).

/**
 * @param searchWords The words we're looking for. Can contain duplicates
 * @param newsPaperWords The list to look into
 */
public boolean wordMatch(List<String> searchWords, List<String> newsPaperWords) {
    Map<String, Integer> searchWordCount = getWordCountMap(searchWords);
    Map<String, Integer> newspaperWordCount = getWordCountMap(newsPaperWords);
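    // Every distinct search word must occur in the newspaper at least
    // as many times as it occurs in the search list.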
    for (Map.Entry<String, Integer> searchEntry : searchWordCount.entrySet()) {
        Integer occurrencesInNewspaper = newspaperWordCount.get(searchEntry.getKey());
        if (occurrencesInNewspaper == null || occurrencesInNewspaper < searchEntry.getValue()) {
            return false;
        }
    }
    return true;
}

private Map<String, Integer> getWordCountMap(List<String> words) {
    Map<String, Integer> result = new HashMap<>();
    for (String word : words) {
        Integer occurrencesThisWord = result.get(word);
        if (occurrencesThisWord == null) {
            result.put(word, 1);
        } else {
            result.put(word, occurrencesThisWord + 1);
        }
    }
    return result;
}

As I see it, the time complexity of the method is 2*O(n) + O(m) (where n is the number of elements in searchWords and m is the number of elements in newsPaperWords):

  • The method getWordCountMap() has a complexity of O(n), where n is the number of elements in the given list: it loops over the list once, assuming that the calls to result.get(word) and result.put() are O(1).
  • Then, the iteration over searchWordCount.entrySet() is, in the worst case, O(n), again assuming that calls to HashMap.get() are O(1).

So, simply adding: O(n) + O(m) to build the two maps, plus O(n) for the final loop.

After reading this answer, taking O(n) as the worst-case complexity for HashMap.get(), I could understand that the complexity of getWordCountMap() would go up to O(n*2n) and the final loop to O(n*n), which would give a total complexity of O(n*2n) + O(m*2m) + O(n*n), i.e. O(n^2 + m^2).

But how is it 2*O(n log n) + O(m log m)?

antonro
    "*but the interviewer told me that it is `2*O(n log n) + O(m log m)`*" - If anything, this would collapse to `O(n log n + m log m)`. – Turing85 Apr 25 '18 at 17:13
  • Seems you did not read that answer closely enough, as it states that in JDK 8 the worst case is O(log n) for HashMap.get(). – juvian Apr 25 '18 at 17:19
  • @juvian The documentation for [JDK 8](https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html#put-K-V-), [JDK 9](https://docs.oracle.com/javase/9/docs/api/java/util/HashMap.html#) and [JDK 10](https://docs.oracle.com/javase/10/docs/api/java/util/HashMap.html#get-java.lang.Object-) state that: "*This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets.*". – Turing85 Apr 25 '18 at 17:26
  • @Turing85 and without assuming anything, it's log n. – juvian Apr 25 '18 at 17:28
  • @juvian huh... that's funny. Just after reading karol-dowbecki's answer was I reminded of this fact. Why is this not in the documentation? – Turing85 Apr 25 '18 at 17:29
  • @juvian: I think it is only reasonable to assume that the hash function will adequately do its job. If we are going to get log n from HashMap too, then we might as well use a self-balancing m-way tree. – displayName Apr 25 '18 at 17:32
  • @displayName I believe it's reasonable, but it seems his interviewer doesn't. – juvian Apr 25 '18 at 17:32
  • @juvian: Agreed. Probably the interviewer is trying to figure out if the candidate knows what happens in an extreme situation. – displayName Apr 25 '18 at 17:36
  • @displayName In this case, I would tend to disagree. This behaviour is not documented in the API, and not everyone reads JEPs regularly. – Turing85 Apr 25 '18 at 17:38
  • @Turing85: I agree with you too. The interviewer, though, is free to ask such a question. An interview is a bi-directional process. If the interviewee finds the interviewer's questions uninteresting, then the candidate can/should reject the company. – displayName Apr 25 '18 at 17:42

2 Answers


Due to JEP 180: Handle Frequent HashMap Collisions with Balanced Trees, the worst case for the HashMap.get() operation is O(log n). To quote JEP 180:

The principal idea is that once the number of items in a hash bucket grows beyond a certain threshold, that bucket will switch from using a linked list of entries to a balanced tree. In the case of high hash collisions, this will improve worst-case performance from O(n) to O(log n).

This would make the getWordCountMap() method O(n log n) in the worst case.
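For illustration, here is a minimal sketch of how that worst case can be triggered. It relies on the well-known fact that "Aa" and "BB" have the same hashCode, so every concatenation of such pairs collides; the class name CollisionDemo and the choice of 15 pairs are just assumptions for this sketch:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CollisionDemo {
    public static void main(String[] args) {
        // "Aa" and "BB" share the same hashCode (2112), so all
        // concatenations of k such pairs hash to the same value.
        List<String> keys = new ArrayList<>();
        keys.add("");
        for (int i = 0; i < 15; i++) { // builds 2^15 = 32768 colliding keys
            List<String> next = new ArrayList<>();
            for (String key : keys) {
                next.add(key + "Aa");
                next.add(key + "BB");
            }
            keys = next;
        }

        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            map.put(keys.get(i), i); // every entry lands in the same bucket
        }

        // Since JDK 8, an overfull bucket of Comparable keys is turned into
        // a red-black tree, so this lookup is O(log n) rather than O(n).
        System.out.println(map.get(keys.get(12345)));
    }
}

Because String implements Comparable, the tree bin can order the colliding keys with compareTo, which is what keeps lookups logarithmic even with a degenerate hash distribution.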

Karol Dowbecki

The complexity you have deduced is correct, assuming the HashMap is using a proper hash function. This algorithm looks like O(m + n) to me.


I guess that your interviewer described the complexity of another approach to solving this problem, one which is more time-consuming but ends up taking less space.
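One approach that does match the quoted 2*O(n log n) + O(m log m) is to sort both lists and walk them with two pointers. This is only my guess at what the interviewer had in mind, not something stated in the question; the method name wordMatchSorted is made up for this sketch:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public boolean wordMatchSorted(List<String> searchWords, List<String> newsPaperWords) {
    // Copy and sort both lists: O(n log n) + O(m log m).
    List<String> search = new ArrayList<>(searchWords);
    List<String> paper = new ArrayList<>(newsPaperWords);
    Collections.sort(search);
    Collections.sort(paper);

    // Walk both sorted lists in lockstep: O(n + m).
    int i = 0;
    int j = 0;
    while (i < search.size() && j < paper.size()) {
        int cmp = search.get(i).compareTo(paper.get(j));
        if (cmp == 0) {        // matched one occurrence of a search word
            i++;
            j++;
        } else if (cmp > 0) {  // newspaper word is smaller, skip it
            j++;
        } else {               // search word cannot appear further on
            return false;
        }
    }
    return i == search.size(); // true only if every occurrence was matched
}

If the caller is allowed to sort the input lists in place, the extra space drops to O(1), which would fit the "less space" trade-off.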

displayName
  • Sorry, I don't quite understand. If the complexity that I have deduced is correct, why do you say it is `O(m + n)`, while I got `O(m + 2*n)`? – antonro Apr 25 '18 at 17:58
  • @antonro: I thought it was obvious. Constants are not important in Big-O calculations. We only try to find the raw rate of growth by means of Big-O, not an exact rate of growth. – displayName Apr 25 '18 at 18:10
  • Ah ok, understood. Thanks. – antonro Apr 25 '18 at 19:37