7

I was looking for a solution to finding the symmetric difference between two dictionaries in Python.

For example, if I have two dictionaries A and B, and I want to create a third dictionary C which contains all the items from A and B that are not found in the other, or in other words, unique.

I could not find a canonical answer, so I decided to open this question and give it my own answer. If you think you have a better method, I would love to see it.


Some data:

a = {'a': 1, 'b':2}
b = {'b': 2, 'c':3}

Desired output:

{'a': 1, 'c': 3}
Inbar Rose
  • 35,719
  • 22
  • 80
  • 120
  • what should happen if the keys match, but not the values? i.e. `a = {'a': 1, 'b':21}` , `b = {'b': 22, 'c':3}`? And... are your values expected to be hashable? – JL Peyret Mar 07 '17 at 16:30
  • 1
    @JLPeyret not much because matching keys do _not_ end up in the symm difference. ----- The values of a dictionary do not need to be hashable. – Elmex80s Mar 07 '17 at 20:06

7 Answers7

8

To get the symmetric difference between two dictionaries use the following robust function:

def dict_symmetric_difference(a, b):
    return {k: a[k] if k in a else b[k] for k in  # break here to fit without scrolling
            set(a.keys()).symmetric_difference(b.keys())}

Just the logic:

{k: a[k] if k in a else b[k] for k in set(a.keys()).symmetric_difference(b.keys())}

Here is a simpler version of the function for explanation:

def dict_symmetric_difference(a, b):
    # first make sets of the dictionary keys
    keys_in_a = set(a.keys())
    keys_in_b = set(b.keys())
    unique_keys = keys_in_a.symmetric_difference(keys_in_b)  # get the unique keys
    c = {}  # start an empty dictionary
    for key in unique_keys:  # iterate over the keys
        if key in a: # if the key is from a dictionary, take the value from there.
            c[key] = a[key]
        else:  # the key is in b dictionary, take the value from there.
            c[key] = b[key]
    return c

Explanation of the a[k] if k in a else b[k] expression:

It is a ternary operator which allows me to use it like so: a if condition else b

With this trick, I get the value for the key, no matter which dictionary it is in.


Using either Function:

>>> dict_symmetric_difference({'a': 1, 'b':2}, {'b':2, 'c':3})
{'a': 1, 'c': 3}
Community
  • 1
  • 1
Inbar Rose
  • 35,719
  • 22
  • 80
  • 120
  • 1
    How about using a ternary operator then? `a[k] if k in a else b[k]` I think that should work fine. – Inbar Rose Mar 07 '17 at 14:00
  • I would suggest to use `^` for symm difference of 2 sets: `set(a.keys()) ^ set(b.keys())` – Elmex80s Mar 07 '17 at 14:42
  • @Elmex80s What is the difference between using the function call, or using the operator? – Inbar Rose Mar 08 '17 at 07:56
  • 1
    @InbarRose I mention one advantage of using the method call instead of the operator form in my answer: you can pass any iterable as the arg of the method, you don't need to explicitly convert it to a set. – PM 2Ring Mar 08 '17 at 08:39
  • OTOH, using an operator is more efficient than doing a method lookup & call. FWIW, here's the CPython [setobject.c](https://github.com/python/cpython/blob/master/Objects/setobject.c) source. I was hoping to learn if it always converts non-set method args to sets, or if it can iterate directly over the items in some cases, but I got lost. :) – PM 2Ring Mar 08 '17 at 09:00
  • 1
    I've added some timeit code & output as a Community Wiki answer. – PM 2Ring Mar 08 '17 at 14:33
  • FWIW, according to Martijn Pieters, those set method calls _don't_ convert their arg to a set, they just iterate directly over the items. – PM 2Ring Mar 11 '17 at 11:06
5

Here is some code that does timeit speed tests on the various algorithms.

The tests use pairs of dicts of equal sizes. The keys are short random letter strings, with varying proportions of shared keys between the dicts. The dicts are constructed from shuffled lists, so even if they contain lots of shared keys the underlying hash table structure of the two dicts should be rather different.

The exact amount of shared keys is random, the proportion of shared keys is controlled by the shared arg of make_dicts.

The main body of this code will run on Python 2.6+ and Python 3. I have Python 2.6.6 and Python 3.6.0 installed on this machine (which is a single core 32 bit machine with 2GB of RAM running on an old Debian derivative of Linux). Some of the dictionary symmetric difference functions use dictionary comprehensions, which aren't available in Python 2.6, so I couldn't test those functions on Python 2. Also, elmex_dsd_py2 won't run on Python 3, so I've commented it out. I was originally going to post Python 2.6 results too, but I had to reduce the output to fit within the message size limits.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

''' Dictionary symmetric difference

    Speed tests of various implementations

    See http://stackoverflow.com/q/42650081/4014959

    Speed test code by PM 2Ring 2017.03.08
'''

from __future__ import print_function

from itertools import product
from random import random, seed, shuffle
from string import ascii_letters
from timeit import Timer

seed(163)

# The dict symmetric difference functions ------------------------------

def inbar_dsd_long(a, b):
    # first make sets of the dictionary keys
    keys_in_a = set(a.keys())
    keys_in_b = set(b.keys())
    # get the unique keys
    unique_keys = keys_in_a.symmetric_difference(keys_in_b)
    # start an empty dictionary
    c = {}
    # iterate over the keys
    for key in unique_keys:
        if key in a: 
            # if the key is from a dictionary, take the value from there.
           c[key] = a[key]
        else:
            # the key is in b dictionary, take the value from there.
            c[key] = b[key]
    return c

def pm2r_dsd_py2(a, b):
    return dict((k, a[k] if k in a else b[k]) for k in set(a.keys()) ^ set(b.keys()))

#def elmex_dsd_py2(a, b):
    #symm_diff = set(a) ^ set(b)
    #return dict((k, v) for k, v in a.items() + b.items() if k in symm_diff) 

def raymond_dsd(a, b):
    c = a.copy()
    c.update(b)
    for k in (a.keys() & b.keys()):
        del c[k]
    return c

def inbar_dsd_short(a, b):
    return {k: a[k] if k in a else b[k] for k in  
        set(a.keys()).symmetric_difference(b.keys())}

def pm2r_dsd_py3(a, b):
    return {k: a[k] if k in a else b[k] for k in a.keys() ^ b.keys()}

def evkounis_dsd(a, b):
    res = {k:v for k, v in a.items() if k not in b}
    res.update({k:v for k, v in b.items() if k not in a})
    return res

def elmex_dsd_py3(a, b):
    symm_diff = set(a) ^ set(b)
    return {k: v for k, v in list(a.items()) + list(b.items()) if k in symm_diff} 

funcs = (
    inbar_dsd_long,
    pm2r_dsd_py2,
    #elmex_dsd_py2,
    raymond_dsd,
    inbar_dsd_short,
    pm2r_dsd_py3,
    evkounis_dsd,
    elmex_dsd_py3,
)

# ----------------------------------------------------------------------

# Random key strings
all_keys = [''.join(t) for t in product(ascii_letters, repeat=3)]
shuffle(all_keys)

def make_dicts(size, shared):
    ''' Make a pair of dicts of length `size`, with random key strings.
        `shared` is a real number 0 <= shared <= 1 giving the approximate 
        ratio of shared keys.
    '''
    a, b = [], []
    keys = iter(all_keys)
    shared_count = 0
    for i in range(size):
        ka = next(keys)
        if random() < shared:
            kb = ka
            shared_count += 1
        else:
            kb = next(keys)
        a.append((ka, i))
        b.append((kb, i))
    shuffle(a)
    shuffle(b)
    return dict(a), dict(b), shared_count

def verify(a, b):
    ''' Verify that all functions return the same result '''
    results = [func(a, b) for func in funcs]
    last = results[-1]
    print(all(last == u for u in results[:-1]))

def time_test(loops, reps):
    ''' Print timing stats for all the functions '''
    timings = []
    for func in funcs:
        fname = func.__name__
        setup = 'from __main__ import a, b, ' + fname
        cmd = '{0}(a, b)'.format(fname)
        t = Timer(cmd, setup)
        result = t.repeat(reps, loops)
        result.sort()
        timings.append((result, fname))

    timings.sort()
    for result, fname in timings:
        print('{0:16} {1}'.format(fname, result))

# ----------------------------------------------------------------------

print('Verifying')
size = 1000
a, b, shared_count = make_dicts(size, 0.1)
print('size: {0}, shared count: {1}'.format(size, shared_count))
verify(a, b)

# Timeit tests
reps = 3
fmt = '\nsize: {0}, shared count: {1}, loops: {2}'
for shared in (0.1, 0.25, 0.5, 0.75, 0.9):
    print('\nSHARED: {0:0.2f}'.format(shared))
    #for size in (5, 10, 50, 100, 500, 1000, 5000, 10000, 50000):
    for size in (10, 100, 1000, 10000):
        a, b, shared_count = make_dicts(size, shared)
        loops = 100000 // size
        print(fmt.format(size, shared_count, loops))
        time_test(loops, reps)    

output

Verifying
size: 1000, shared count: 100
True

SHARED: 0.10

size: 10, shared count: 1, loops: 10000
raymond_dsd      [0.13777699099955498, 0.13792390800153953, 0.1381044740010111]
evkounis_dsd     [0.23560065399942687, 0.23752641000100994, 0.2455631840020942]
pm2r_dsd_py3     [0.23770248700020602, 0.23880975800057058, 0.24221741200017277]
inbar_dsd_long   [0.25206301800062647, 0.285963577000075, 0.28780356199786183]
inbar_dsd_short  [0.2636144610005431, 0.2653795980004361, 0.2666834120027488]
elmex_dsd_py3    [0.3290278729982674, 0.33175632400161703, 0.3384615989998565]
pm2r_dsd_py2     [0.3978280019982776, 0.43710133700005827, 0.4523775029992976]

size: 100, shared count: 14, loops: 1000
raymond_dsd      [0.09872918600012781, 0.09888040100122453, 0.10413656799937598]
evkounis_dsd     [0.1804931380029302, 0.1811683220003033, 0.18133216399655794]
pm2r_dsd_py3     [0.20522897000046214, 0.20773609400202986, 0.20979003499815008]
inbar_dsd_short  [0.21217649699974572, 0.21281453499977943, 0.21295483400172088]
inbar_dsd_long   [0.22985933599920827, 0.23097444899758557, 0.24446944000010262]
elmex_dsd_py3    [0.24242248500013375, 0.24477665499944123, 0.24785449900082313]
pm2r_dsd_py2     [0.3103436530000181, 0.31146229099977063, 0.3152951789998042]

size: 1000, shared count: 94, loops: 100
raymond_dsd      [0.10726087399962125, 0.10726979699757067, 0.10853421000138042]
evkounis_dsd     [0.19798667299983208, 0.19957152200004202, 0.20145120699817198]
pm2r_dsd_py3     [0.24767412599976524, 0.25033419099781895, 0.25519442899894784]
inbar_dsd_long   [0.25753367499783053, 0.259813735003263, 0.2615334299989627]
inbar_dsd_short  [0.25835196700063534, 0.2647503340012918, 0.26879757099959534]
elmex_dsd_py3    [0.3065065359987784, 0.3129320820007706, 0.3159641370002646]
pm2r_dsd_py2     [0.32748841799912043, 0.34595297499981825, 0.3797209490003297]

size: 10000, shared count: 987, loops: 10
raymond_dsd      [0.2801321059996553, 0.2831085340003483, 0.28407657299976563]
evkounis_dsd     [0.36119127300116816, 0.36392319399965345, 0.36926983400189783]
pm2r_dsd_py3     [0.5073807749977277, 0.5122791090034298, 0.5579565990010451]
inbar_dsd_short  [0.5086212060014077, 0.5168500030013092, 0.5182715480004845]
inbar_dsd_long   [0.602521363998676, 0.6031914080012939, 0.6047401769974385]
pm2r_dsd_py2     [0.6753699099972437, 0.6772755890015105, 0.6782451350009069]
elmex_dsd_py3    [0.7430517110005894, 0.7464511920006771, 0.7468688779990771]

SHARED: 0.25

size: 10, shared count: 3, loops: 10000
raymond_dsd      [0.1376171269985207, 0.13765478899949812, 0.13801490599871613]
pm2r_dsd_py3     [0.20131645299989032, 0.20166713100115885, 0.20322838700303691]
inbar_dsd_long   [0.20759937799812178, 0.2079929980027373, 0.21979623799779802]
evkounis_dsd     [0.2186124869986088, 0.2202955180000572, 0.223359776999132]
inbar_dsd_short  [0.23444793200178538, 0.23780764999901294, 0.23976211099943612]
elmex_dsd_py3    [0.3178573650002363, 0.3193927319989598, 0.32410190099835745]
pm2r_dsd_py2     [0.3520881920012471, 0.3543025139988458, 0.3581208620016696]

size: 100, shared count: 23, loops: 1000
raymond_dsd      [0.10508492400185787, 0.10563860000183922, 0.10888238600091427]
evkounis_dsd     [0.15686738300064462, 0.15824111300025834, 0.15863642399926903]
pm2r_dsd_py3     [0.1829918709991034, 0.184900373002165, 0.18732255400027498]
inbar_dsd_short  [0.18875792199833086, 0.19031438200181583, 0.19102797700179508]
inbar_dsd_long   [0.21139359699736815, 0.22990316799769062, 0.2418856490003236]
elmex_dsd_py3    [0.22641843899691594, 0.2265430750012456, 0.23143781299950206]
pm2r_dsd_py2     [0.2681290770015039, 0.2703527909980039, 0.27255326500016963]

size: 1000, shared count: 263, loops: 100
raymond_dsd      [0.10895683100170572, 0.11233176399764488, 0.11593639900092967]
evkounis_dsd     [0.17859331599902362, 0.17949835600302322, 0.18466946999978973]
pm2r_dsd_py3     [0.2147589500018512, 0.21515577800164465, 0.21701817199937068]
inbar_dsd_long   [0.21823484400010784, 0.2254721450008219, 0.22556141600216506]
inbar_dsd_short  [0.22114897099891095, 0.22157548800169025, 0.22668778500155895]
pm2r_dsd_py2     [0.2780861230021401, 0.27864550599770155, 0.28336624599978677]
elmex_dsd_py3    [0.28186336900034803, 0.2837228559983487, 0.29606761199829634]

size: 10000, shared count: 2480, loops: 10
raymond_dsd      [0.278912030000356, 0.28916871899855323, 0.2898256120024598]
evkounis_dsd     [0.33290919899809523, 0.3355702890003158, 0.3366183610014559]
pm2r_dsd_py3     [0.4445611189985357, 0.45341551800083835, 0.4544847100005427]
inbar_dsd_short  [0.4466933030016662, 0.4632708070021181, 0.48025122500257567]
inbar_dsd_long   [0.5405201060020772, 0.5567013979998592, 0.5911358039993502]
pm2r_dsd_py2     [0.586115582002094, 0.600204237998696, 0.6029243630000565]
elmex_dsd_py3    [0.7058123890019488, 0.7067292030005774, 0.7115862030004791]

SHARED: 0.50

size: 10, shared count: 6, loops: 10000
raymond_dsd      [0.15135921700129984, 0.1533788429987908, 0.17841531700105406]
pm2r_dsd_py3     [0.15311526600271463, 0.15356177799912984, 0.15895434199774172]
inbar_dsd_long   [0.16137141400031396, 0.1618921000008413, 0.17238240400183713]
inbar_dsd_short  [0.1808154470018053, 0.18266997299724608, 0.1863039679992653]
evkounis_dsd     [0.18221631199776311, 0.18251911100014695, 0.18520446800175705]
pm2r_dsd_py2     [0.2700158850020671, 0.2743520539988822, 0.28932957600045484]
elmex_dsd_py3    [0.28983224500188953, 0.2912340100010624, 0.2933940119983163]

size: 100, shared count: 51, loops: 1000
raymond_dsd      [0.10294843999872683, 0.10327848499946413, 0.10685922099946765]
evkounis_dsd     [0.13586801600104081, 0.13726477299860562, 0.142784658997698]
pm2r_dsd_py3     [0.1435330319982313, 0.14396326799760573, 0.14474550500017358]
inbar_dsd_short  [0.15043617100309348, 0.15080328300246038, 0.1527250040016952]
inbar_dsd_long   [0.1667091649978829, 0.17330403699816088, 0.17601154400108499]
pm2r_dsd_py2     [0.20728979400155367, 0.20776088099955814, 0.2079896369978087]
elmex_dsd_py3    [0.21078268400015077, 0.2123827169998549, 0.21517163300086395]

size: 1000, shared count: 491, loops: 100
raymond_dsd      [0.11212847299975692, 0.11414236799828359, 0.11498476199994911]
evkounis_dsd     [0.14059560900204815, 0.14112727400060976, 0.150327464001748]
pm2r_dsd_py3     [0.14733014900048147, 0.15143406900097034, 0.1542897660001472]
inbar_dsd_short  [0.15075810700000147, 0.151888833999692, 0.15750856500017107]
inbar_dsd_long   [0.16265833400029805, 0.16367860500031384, 0.17333104299905244]
pm2r_dsd_py2     [0.1993612549995305, 0.19947306600079173, 0.20446195700060343]
elmex_dsd_py3    [0.24682135100010782, 0.24862800600021728, 0.25419495800088043]

size: 10000, shared count: 4938, loops: 10
evkounis_dsd     [0.2519790539990936, 0.2573451700009173, 0.2603536310016352]
raymond_dsd      [0.2875208960031159, 0.2887761790007062, 0.30461744100102806]
pm2r_dsd_py3     [0.3364586130010139, 0.342166794998775, 0.3465069459998631]
inbar_dsd_short  [0.3490315640010522, 0.6202766900023562, 0.7155317880024086]
inbar_dsd_long   [0.42809327600116376, 0.4363977649991284, 0.4812496539998392]
pm2r_dsd_py2     [0.46369219400003203, 0.46809901899905526, 0.4706174610000744]
elmex_dsd_py3    [0.6603999830003886, 0.6629649060014344, 0.6652154759976838]

SHARED: 0.75

size: 10, shared count: 7, loops: 10000
pm2r_dsd_py3     [0.14004066000052262, 0.14024711000092793, 0.1411744200013345]
inbar_dsd_long   [0.1457400300023437, 0.1463650259975111, 0.17371471199658117]
raymond_dsd      [0.1495657380000921, 0.15151091000007, 0.1532108950013935]
inbar_dsd_short  [0.16798981899773935, 0.1684792589985591, 0.17371860500134062]
evkounis_dsd     [0.18283682300170767, 0.18351536599948304, 0.18536045300061232]
pm2r_dsd_py2     [0.24651207700298983, 0.24725952299922938, 0.3011513509991346]
elmex_dsd_py3    [0.27965197500088834, 0.2817374969999946, 0.28211258000010275]

size: 100, shared count: 83, loops: 1000
evkounis_dsd     [0.10071835599956103, 0.10109729699979653, 0.1036734150002303]
inbar_dsd_long   [0.10147314599817037, 0.1017698140021821, 0.11575333300061175]
pm2r_dsd_py2     [0.1257392070001515, 0.14690794800117146, 0.2597000979985751]
pm2r_dsd_py3     [0.16547765900031663, 0.17877282599874889, 0.1817621379996126]
elmex_dsd_py3    [0.18176361400037422, 0.18339519599976484, 0.18422297999859438]
inbar_dsd_short  [0.18878075899920077, 0.1932126639985654, 0.201184026998817]
raymond_dsd      [0.23026226100046188, 0.2342098570006783, 0.24134657600006904]

size: 1000, shared count: 751, loops: 100
inbar_dsd_short  [0.0925550639985886, 0.09375216300031752, 0.09518678500171518]
pm2r_dsd_py3     [0.09365715600142721, 0.0952552939997986, 0.0984138530002383]
raymond_dsd      [0.10659463599949959, 0.10675223399812239, 0.1076178000002983]
inbar_dsd_long   [0.10787330499806558, 0.10813268299898482, 0.1191909779990965]
evkounis_dsd     [0.11020168100003502, 0.11101243599841837, 0.11369209199983743]
pm2r_dsd_py2     [0.1283391249999113, 0.12977415000204928, 0.13450328500039177]
elmex_dsd_py3    [0.20605224600149086, 0.20856778099914663, 0.21231961700323154]

size: 10000, shared count: 7525, loops: 10
evkounis_dsd     [0.19238157699874137, 0.19369199399807258, 0.20787687100164476]
pm2r_dsd_py3     [0.237352975000249, 0.2393961540001328, 0.24592895499881706]
inbar_dsd_short  [0.24010049900243757, 0.24383026600116864, 0.246290401002625]
inbar_dsd_long   [0.31666912799846614, 0.3353785740000603, 0.3762496050003392]
raymond_dsd      [0.3268343650015595, 0.3270019219999085, 0.32956799900057376]
pm2r_dsd_py2     [0.3330148269997153, 0.34052117800092674, 0.3426254549995065]
elmex_dsd_py3    [0.6130798710000818, 0.6139247349965444, 0.6146237579996523]

SHARED: 0.90

size: 10, shared count: 10, loops: 10000
pm2r_dsd_py3     [0.09191049900255166, 0.09203974899719469, 0.09560386399971321]
inbar_dsd_long   [0.09304381299807574, 0.09397280899793259, 0.10319281500051147]
inbar_dsd_short  [0.0980829280015314, 0.09835117700276896, 0.0987546550022671]
raymond_dsd      [0.14094099900103174, 0.14119526200011023, 0.14634641500015277]
evkounis_dsd     [0.14480078699853038, 0.1466599049999786, 0.14705315900209825]
pm2r_dsd_py2     [0.16137886599972262, 0.16186897499937913, 0.1626489610025601]
elmex_dsd_py3    [0.24912584599951515, 0.2519607159993029, 0.2550744569998642]

size: 100, shared count: 88, loops: 1000
pm2r_dsd_py3     [0.08017906299937749, 0.08175948099960806, 0.08336899599817116]
inbar_dsd_short  [0.08394136000060826, 0.08467326000027242, 0.08476182100275764]
inbar_dsd_long   [0.09241838099842425, 0.0929719669984479, 0.10157853300188435]
evkounis_dsd     [0.09769711500121048, 0.09770239999852492, 0.10219176600003266]
pm2r_dsd_py2     [0.11295593600152642, 0.11317849099941668, 0.11382339899864746]
raymond_dsd      [0.11950065099881613, 0.11954410699763685, 0.16439275900120265]
elmex_dsd_py3    [0.17893833099878975, 0.18027151500064065, 0.18072834000122384]

size: 1000, shared count: 896, loops: 100
pm2r_dsd_py3     [0.06560493199867778, 0.06627220900190878, 0.06649829500020132]
inbar_dsd_short  [0.067232484001579, 0.06832705600027111, 0.06892605100074434]
inbar_dsd_long   [0.07928322799853049, 0.0793153419981536, 0.0874185499997111]
pm2r_dsd_py2     [0.08986150900091161, 0.09258468600091874, 0.09545781900305883]
evkounis_dsd     [0.09216968399778125, 0.09272978199805948, 0.09716289000061806]
raymond_dsd      [0.11052805100189289, 0.11131704600120429, 0.11136766299750889]
elmex_dsd_py3    [0.18965840600139927, 0.1898866600022302, 0.19107911399987643]

size: 10000, shared count: 9011, loops: 10
evkounis_dsd     [0.1584843410018948, 0.16192917299849796, 0.16836377900108346]
pm2r_dsd_py3     [0.1789340169998468, 0.17990425000243704, 0.1874260629992932]
inbar_dsd_short  [0.18104806900009862, 0.18631987900153035, 0.18891330599944922]
inbar_dsd_long   [0.2561770180000167, 0.2672927259991411, 0.27309057399907033]
pm2r_dsd_py2     [0.26508888299940736, 0.2661178109992761, 0.2812051930013695]
raymond_dsd      [0.3262405569985276, 0.32729987999846344, 0.3313657439975941]
elmex_dsd_py3    [0.5737760600022739, 0.5791283889993792, 0.5847248999998556]
PM 2Ring
  • 50,023
  • 5
  • 64
  • 150
  • I'm going to accept this answer, because it includes not only all the methods discussed here, but also the timing, so each user could choose the method best suited for their expected situation. – Inbar Rose Mar 12 '17 at 08:12
  • @InbarRose Thanks, I appreciate that, even though I don't get any points for this answer since it's community wiki. – PM 2Ring Mar 12 '17 at 08:47
  • what are "the various algorithms" ?? those provided in the other answers? can I get a tl;dr - which one is the preferred? – TT-- Apr 04 '19 at 16:42
  • @TT-- Yes, this program tests all the algorithms on this page, and identifies them by the name of the person who submitted them. Sorry, there is no tl;dr. Which algorithm is fastest varies depending on the amount of data and the proportion of items they share. It also depends on if you're running Python 2 or Python 3. If you want to know which is fastest, I recommend running this code on your system, adjusting the `size` & `shared` ranges to match the ranges of your typical data. If speed isn't a priority, just use the algorithm that you find the most readable. – PM 2Ring Apr 05 '19 at 05:10
4

A symmetric difference is equal to the union minus the intersection:

>>> a = {'a': 1, 'b':2}
>>> b = {'b': 2, 'c':3}
>>> c = a.copy()
>>> c.update(b)
>>> for k in (a.keys() & b.keys()):
        del c[k]

>>> c
{'a': 1, 'c': 3}
Raymond Hettinger
  • 182,864
  • 54
  • 321
  • 419
  • How about maybe `c= {**a, **b}` for Python3.5+? – Chris_Rands Mar 07 '17 at 14:50
  • 1
    @Chris_Rands Yes, that works. And of course `dict(**a, **b)` won't work because it will raise TypeError on multiple values for the same key. – PM 2Ring Mar 07 '17 at 14:58
  • @PM2Ring Thanks, yes, how to this compare to your answer? I would naively prefer yours but I feel like because Raymond suggested this, it must somehow be the best approach! – Chris_Rands Mar 07 '17 at 15:03
  • 1
    @Chris_Rands I suspect that Raymond may know a thing or two about efficient dict operations. :) It would be interesting to see timeit results for the different algorithms on this page, with various sized inputs and proportions of duplicated keys. And of course it'd be good to see what speed variations there are between Python 3 and 2. – PM 2Ring Mar 07 '17 at 15:43
  • FWIW, your code is the clear winner when the proportion of shared keys is <= 0.5; see my `timeit` code community wiki answer for details. – PM 2Ring Mar 08 '17 at 14:37
4

The dict.keys() View object is set-like, and it supports the ^ symmetric_difference operator.

From the docs:

Keys views are set-like since their entries are unique and hashable. [...] For set-like views, all of the operations defined for the abstract base class collections.abc.Set are available (for example, ==, <, or ^).

To handle the problem with false-ish values that arises by using the or expression in Inbar Rose's original solution we can just use an in test; if the key isn't in a it must be in b, so we only need 1 in test.

def dict_symmetric_difference(a, b):
    return {k: a[k] if k in a else b[k] for k in a.keys() ^ b.keys()}

a = {'a': 1, 'b':2, 'd': ''}
b = {'b': 2, 'c':3}
print(dict_symmetric_difference(a, b))   

output

{'d': '', 'c': 3, 'a': 1}

Python 2 doesn't have dictionary view objects, so in Python 2 you need to wrap the .keys() calls with set(). Versions of Python prior to 2.7 don't support the dictionary comprehension, but you can pass a generator expression to the dict() constructor, unless you're running a really ancient version of Python.

Here's a version that will run correctly on Python 2.4+:

def dict_symmetric_difference(a, b):
    return dict((k, a[k] if k in a else b[k]) for k in set(a.keys()) ^ set(b.keys()))

We can avoid making two calls to set by using the symmetric_difference method instead of the ^ operator, since the non-operator versions of the various set operations will accept any iterable as an argument. So we can do

set(a.keys()).symmetric_difference(b.keys())

instead of

set(a.keys()) ^ set(b.keys())

As Martijn Pieters has pointed out in the comments, dictionary view objects have been backported to Python 2.7. The syntax is slightly different than in Python 3 to avoid breaking code which uses the .keys, .values, and .items methods. To get the keys view object use the .viewkeys method. mydict.viewkeys() is much more efficient than set(mydict.keys()). Dictionary view objects also have the benefit that they're dynamic, i.e., they reflect any changes made to the dictionary, whereas set(mydict.keys()) has to be called again if any changes are made to mydict. That's not an issue for this code, but it's a great feature when you do need it.

PM 2Ring
  • 50,023
  • 5
  • 64
  • 150
1

This is how I would do it:

A = {'a': 1, 'b': 2}
B = {'b': 2, 'c': 3}


def dict_symmetric_difference(dict_A, dict_B):
    res = {k:v for k, v in dict_A.items() if k not in dict_B}
    res.update({k:v for k, v in dict_B.items() if k not in dict_A})
    return res

print(dict_symmetric_difference(A, B))  # {'a': 1, 'c': 3}
Ma0
  • 14,004
  • 2
  • 29
  • 59
0

Very short

A = {'a': 1, 'b': 2}
B = {'b': 2, 'c': 3}

print dict((k, v) for k, v in A.items() + B.items() if k in set(A) ^ set(B))

In case you feel uncomfortable about speed and suspect Python to do set(A) ^ set(B) at each iteration you can just use this code:

symm_diff = set(A) ^ set(B)

print dict((k, v) for k, v in A.items() + B.items() if k in symm_diff) 
Elmex80s
  • 3,258
  • 1
  • 13
  • 22
  • 2
    There is a problem with the use of `or`. See the comment form @Rawing in the answer from OP. – Ma0 Mar 07 '17 at 14:00
  • 1
    Aren't you calculating `if k in set(A) ^ set(B))` for each iteration? – Inbar Rose Mar 08 '17 at 07:59
  • @InbarRose To be honest: I am not sure. List comprehension is a strong feature of Python so I could imagine they optimized for this. However I will update my answer mentioning it. – Elmex80s Mar 08 '17 at 08:42
  • 2
    No, `if k in set(A) ^ set(B))` doesn't get optimized out, the Python compiler doesn't perform that sort of optimization because it has no way of knowing that `A` and `B` won't change in the loop. – PM 2Ring Mar 08 '17 at 14:31
0

If you are using Python 3, This is a more compact solution than the others:

{k: v for k, v in a.items() ^ b.items()}

Please note that it does not work if there are unhashable types inside those dictionaries.

Andrea
  • 3,322
  • 3
  • 28
  • 45