In Think Python the author introduces defaultdict. The following is an excerpt from the book regarding defaultdict:

If you are making a dictionary of lists, you can often write simpler code using defaultdict. In my solution to Exercise 12-2, which you can get from http://thinkpython2.com/code/anagram_sets.py, I make a dictionary that maps from a sorted string of letters to the list of words that can be spelled with those letters. For example, 'opst' maps to the list ['opts', 'post', 'pots', 'spot', 'stop', 'tops']. Here’s the original code:

 def all_anagrams(filename):
     d = {}
     for line in open(filename):
         word = line.strip().lower()
         t = signature(word)
         if t not in d:
             d[t] = [word]
         else:
             d[t].append(word)
     return d

This can be simplified using setdefault, which you might have used in Exercise 11-2:

 def all_anagrams(filename):
     d = {}
     for line in open(filename):
         word = line.strip().lower()
         t = signature(word)
         d.setdefault(t, []).append(word) 
     return d

This solution has the drawback that it makes a new list every time, regardless of whether it is needed. For lists, that’s no big deal, but if the factory function is complicated, it might be. We can avoid this problem and simplify the code using a defaultdict:

from collections import defaultdict

def all_anagrams(filename):
    d = defaultdict(list)
    for line in open(filename):
        word = line.strip().lower()
        t = signature(word)
        d[t].append(word)
    return d

Here's the definition of the signature function:

def signature(s):
    """Returns the signature of this string.

    Signature is a string that contains all of the letters in order.

    s: string
    """
    # TODO: rewrite using sorted()
    t = list(s)
    t.sort()
    t = ''.join(t)
    return t
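The TODO in that function mentions sorted(). A minimal sketch of what that rewrite might look like follows; this is my own guess at the intent, not code from the book:

```python
def signature(s):
    """Return a string containing the letters of s in sorted order."""
    # sorted() accepts any iterable and returns a new list,
    # so no intermediate list(s) / t.sort() step is needed.
    return "".join(sorted(s))


print(signature("stop"))  # opst
```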

What I understand regarding the second solution is that setdefault checks whether t (the signature of the word) exists as a key. If it doesn't, setdefault adds t as a key with an empty list as its value, and append then appends the word to that list. If t exists, setdefault returns its value (a list with at least one item, which is a string representing a word), and append appends the word to this list.

What I understand regarding the third solution is that d, which is a defaultdict, makes t a key and sets an empty list as its value (if t doesn't already exist as a key), then the word is appended to the list. If t already exists, its value (the list) is returned and the word is appended to it.

What is the difference between the second and third solutions? What does it mean that the code in the second solution makes a new list every time, regardless of whether it's needed? How is setdefault responsible for that, and how does using defaultdict avoid the problem?

sunny1304

1 Answer

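"Makes a new list every time" means that every time setdefault(t, []) is called, a new empty list (the [] argument) is created to serve as the default value, just in case it's needed. If t is already a key, that new list is simply discarded. A defaultdict avoids this because it calls its factory (list) only when a missing key is looked up.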
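To make the difference visible, here is a small sketch that counts how often the default gets built. The make_list helper and the calls counter are hypothetical names introduced only for this illustration, and the six words are the ones from the question:

```python
from collections import defaultdict

calls = {"setdefault": 0, "defaultdict": 0}


def make_list(label):
    """Hypothetical factory that records each time it runs."""
    calls[label] += 1
    return []


words = ["opts", "post", "pots", "spot", "stop", "tops"]

# setdefault: the second argument is evaluated before the call happens,
# so the factory runs once per word, even when the key already exists.
d1 = {}
for word in words:
    key = "".join(sorted(word))
    d1.setdefault(key, make_list("setdefault")).append(word)

# defaultdict: the factory runs only when a missing key is looked up.
d2 = defaultdict(lambda: make_list("defaultdict"))
for word in words:
    key = "".join(sorted(word))
    d2[key].append(word)

print(calls)  # {'setdefault': 6, 'defaultdict': 1}
```

All six words share the signature 'opst', so setdefault builds six lists and throws five away, while the defaultdict builds exactly one.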

Although both solutions return a dictionary, the one using defaultdict is actually returning a defaultdict(list) which is a subclass of the built-in dict class. This normally is not a problem. The most notable effect will likely be if you print() the returned object, as the output from the two looks quite different.

If you don't want that for whatever reason, you can change the last statement of the function to:

    return dict(d)

to convert the defaultdict(list) created into a regular dict.
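As a rough illustration of how the two differ when printed, and of one other behavior you may run into (reading a missing key from a defaultdict inserts it), consider this sketch with made-up sample data:

```python
from collections import defaultdict

d = defaultdict(list)
d["opst"].append("opts")
d["opst"].append("post")

plain = dict(d)  # shallow copy as a regular dict

shown_default = repr(d)
shown_plain = repr(plain)
print(shown_default)  # defaultdict(<class 'list'>, {'opst': ['opts', 'post']})
print(shown_plain)    # {'opst': ['opts', 'post']}

# Merely reading a missing key adds it to a defaultdict...
d["zzz"]
print("zzz" in d)  # True

# ...whereas a regular dict raises KeyError instead.
try:
    plain["zzz"]
except KeyError:
    print("KeyError")
```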

martineau
  • Thank you! So, just to make sure that I get it right, when `setdefault(t, [])` is called, `[]` is evaluated first, then `setdefault` checks whether `t` exists or not? – Mahmood Muhammad Nageeb Aug 20 '17 at 02:03
  • @MahmudMuhammadNaguib yes, that is *required by python's semantics*. A list literal tells the interpreter "create this list" – juanpa.arrivillaga Aug 20 '17 at 02:04
  • @juanpa.arrivillaga Thank you! – Mahmood Muhammad Nageeb Aug 20 '17 at 02:14
  • Seems like it would have been nice if .setdefault() had supported lazy evaluation for cases where the second argument, the default "value" is a callable or type rather than an instance or other value. But that's probably not a realistic change to make now. :( – Jim Dennis Aug 20 '17 at 04:29
  • https://bugs.python.org/issue10930 ... discusses this lack of shortcut evaluation as "Not a bug" (a conclusion with which I agree). Nothing about lazy eval that I could find, though. – Jim Dennis Aug 20 '17 at 04:37
  • A fairly fast alternative to both `setdefault()` and `defaultdict` would be to define your own `dict` subclass with a `__missing__()` method to provide the default value. This is detailed in @Aaron Hall's [answer](https://stackoverflow.com/a/19829714/355230) to the question [What is the best way to implement nested dictionaries?](https://stackoverflow.com/questions/635483/what-is-the-best-way-to-implement-nested-dictionaries) – martineau Nov 20 '20 at 12:09
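The `__missing__()` approach mentioned in the last comment could look roughly like the sketch below. The class name ListDict is hypothetical, and this is only an illustration of the idea, not the code from the linked answer:

```python
class ListDict(dict):
    """Hypothetical dict subclass whose missing keys get a fresh empty list."""

    def __missing__(self, key):
        # dict.__getitem__ calls this only when the key is absent,
        # so the list is created lazily, as with defaultdict(list).
        value = self[key] = []
        return value


d = ListDict()
for word in ["opts", "post", "tops"]:
    d["".join(sorted(word))].append(word)

print(d)  # {'opst': ['opts', 'post', 'tops']}
```

Because the subclass does not override `__repr__`, it prints like a regular dict.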