
The code below tries to check whether all the words in searchWords appear in newsPaperWords. Both lists can contain duplicates: if a word appears n times in searchWords, it has to appear at least n times in newsPaperWords for the method to return true. I thought the time complexity was 2*O(n) + O(m), but the interviewer told me that it is 2*O(n log n) + O(m log m).

/**
 * @param searchWords The words we're looking for. Can contain duplicates
 * @param newsPaperWords The list to look into
 */
public boolean wordMatch(List<String> searchWords, List<String> newsPaperWords) {
    Map<String, Integer> searchWordCount = getWordCountMap(searchWords);
    Map<String, Integer> newspaperWordCount = getWordCountMap(newsPaperWords);
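    // Every distinct search word must occur in the newspaper at least
    // as many times as it occurs in the search list.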
    for (Map.Entry<String, Integer> searchEntry : searchWordCount.entrySet()) {
        Integer occurrencesInNewspaper = newspaperWordCount.get(searchEntry.getKey());
        if (occurrencesInNewspaper == null || occurrencesInNewspaper < searchEntry.getValue()) {
            return false;
        }
    }
    return true;
}

private Map<String, Integer> getWordCountMap(List<String> words) {
    Map<String, Integer> result = new HashMap<>();
    for (String word : words) {
        Integer occurrencesThisWord = result.get(word);
        if (occurrencesThisWord == null) {
            result.put(word, 1);
        } else {
            result.put(word, occurrencesThisWord + 1);
        }
    }
    return result;
}

As I see it, the time complexity of the method is 2*O(n) + O(m) (where n is the number of elements in searchWords and m is the number of elements in newsPaperWords):

  • The method getWordCountMap() has a complexity of O(n), where n is the number of elements in the given list: it loops over the list once, assuming that the calls to result.get(word) and result.put() are O(1).
  • Then, the iteration over searchWordCount.entrySet() is, in the worst case, O(n), again assuming that calls to HashMap.get() are O(1).

So, simply adding: O(n) + O(m) to build the two maps, plus O(n) for the final loop.

After reading this answer, taking O(n) as the worst-case complexity for HashMap.get(), I could understand that the complexity of getWordCountMap() would go up to O(n*2n) and the final loop to O(n*n), which would give a total complexity of O(n*2n) + O(m*2m) + O(n*n), i.e. O(n^2 + m^2).

But how is it 2*O(n log n) + O(m log m)?

antonro
    "*but the interviewer told me that it is `2*O(n log n) + O(m log m)`*" - If anything, this would collapse to `O(n log n + m log m)`. – Turing85 Apr 25 '18 at 17:13
  • Seems you did not read that answer closely enough, as it states that in JDK 8 the worst case is O(log n) for HashMap.get(). – juvian Apr 25 '18 at 17:19
  • @juvian The documentation for [JDK 8](https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html#put-K-V-), [JDK 9](https://docs.oracle.com/javase/9/docs/api/java/util/HashMap.html#) and [JDK 10](https://docs.oracle.com/javase/10/docs/api/java/util/HashMap.html#get-java.lang.Object-) state that: "*This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets.*". – Turing85 Apr 25 '18 at 17:26
  • @Turing85 and without assuming anything, it's log n. – juvian Apr 25 '18 at 17:28
  • @juvian huh... that's funny. Just after reading karol-dowbecki's answer was I reminded of this fact. Why is this not in the documentation? – Turing85 Apr 25 '18 at 17:29
  • @juvian: I think it is only reasonable to assume that the hash function will adequately do its job. If we are going to get log n from HashMap too, then we might as well use a self-balancing m-way tree. – displayName Apr 25 '18 at 17:32
  • @displayName I believe it's reasonable, but it seems his interviewer doesn't. – juvian Apr 25 '18 at 17:32
  • @juvian: Agreed. Probably the interviewer is trying to figure out if the candidate knows what happens in an extreme situation. – displayName Apr 25 '18 at 17:36
  • @displayName In this case, I would tend to disagree. This behaviour is not documented in the API, and not everyone reads JEPs regularly. – Turing85 Apr 25 '18 at 17:38
  • @Turing85: I agree with you too. The interviewer, though, is free to ask such a question. An interview is a bi-directional process. If the interviewee finds the interviewer's questions uninteresting, then the candidate can/should reject the company. – displayName Apr 25 '18 at 17:42

2 Answers


Due to JEP 180: Handle Frequent HashMap Collisions with Balanced Trees, the worst case for the HashMap.get() operation is O(log n). To quote JEP 180:

The principal idea is that once the number of items in a hash bucket grows beyond a certain threshold, that bucket will switch from using a linked list of entries to a balanced tree. In the case of high hash collisions, this will improve worst-case performance from O(n) to O(log n).

This would make the getWordCountMap() method O(n log n) in the worst case.
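For illustration, here is a minimal sketch of how that worst case can be triggered. It relies on the well-known fact that "Aa" and "BB" have the same hashCode, so every concatenation of such pairs collides; the class name CollisionDemo and the choice of 15 pairs are just assumptions for this sketch:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CollisionDemo {
    public static void main(String[] args) {
        // "Aa" and "BB" share the same hashCode (2112), so all
        // concatenations of k such pairs hash to the same value.
        List<String> keys = new ArrayList<>();
        keys.add("");
        for (int i = 0; i < 15; i++) { // builds 2^15 = 32768 colliding keys
            List<String> next = new ArrayList<>();
            for (String key : keys) {
                next.add(key + "Aa");
                next.add(key + "BB");
            }
            keys = next;
        }

        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            map.put(keys.get(i), i); // every entry lands in the same bucket
        }

        // Since JDK 8, an overfull bucket of Comparable keys is turned into
        // a red-black tree, so this lookup is O(log n) rather than O(n).
        System.out.println(map.get(keys.get(12345)));
    }
}

Because String implements Comparable, the tree bin can order the colliding keys with compareTo, which is what keeps lookups logarithmic even with a degenerate hash distribution.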

Karol Dowbecki

The complexity you have deduced is correct, assuming the HashMap is using a proper hash function. This algorithm looks like O(m + n) to me.


I guess that your interviewer described the complexity of another approach to solving this problem, one which is more time-consuming but ends up taking less space.
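One approach that does match the quoted 2*O(n log n) + O(m log m) is to sort both lists and walk them with two pointers. This is only my guess at what the interviewer had in mind, not something stated in the question; the method name wordMatchSorted is made up for this sketch:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public boolean wordMatchSorted(List<String> searchWords, List<String> newsPaperWords) {
    // Copy and sort both lists: O(n log n) + O(m log m).
    List<String> search = new ArrayList<>(searchWords);
    List<String> paper = new ArrayList<>(newsPaperWords);
    Collections.sort(search);
    Collections.sort(paper);

    // Walk both sorted lists in lockstep: O(n + m).
    int i = 0;
    int j = 0;
    while (i < search.size() && j < paper.size()) {
        int cmp = search.get(i).compareTo(paper.get(j));
        if (cmp == 0) {        // matched one occurrence of a search word
            i++;
            j++;
        } else if (cmp > 0) {  // newspaper word is smaller, skip it
            j++;
        } else {               // search word cannot appear further on
            return false;
        }
    }
    return i == search.size(); // true only if every occurrence was matched
}

If the caller is allowed to sort the input lists in place, the extra space drops to O(1), which would fit the "less space" trade-off.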

displayName
  • Sorry, I don't quite understand. If the complexity that I have deduced is correct, why do you say it is `O(m + n)`, while I got `O(m + 2*n)`? – antonro Apr 25 '18 at 17:58
  • @antonro: I thought it was obvious. Constants are not important in Big-O calculations. We only try to find the raw rate of growth by means of Big-O, not an exact rate of growth. – displayName Apr 25 '18 at 18:10
  • Ah ok, understood. Thanks. – antonro Apr 25 '18 at 19:37