I need to perform case-insensitive string comparisons in Python, in sets and in dictionary keys. Creating set and dict subclasses that are case insensitive proves surprisingly tricky (see Case insensitive dictionary for ideas; note they all use lower, and there's even a rejected PEP, albeit with a somewhat broader scope). So I went with creating a case-insensitive string class instead (leveraging this answer by @AlexMartelli):
class CIstr(unicode):
    """Case insensitive with respect to hashes and comparisons string class"""

    #--Hash/Compare
    def __hash__(self):
        return hash(self.lower())
    def __eq__(self, other):
        if isinstance(other, basestring):
            return self.lower() == other.lower()
        return NotImplemented
    def __ne__(self, other): return not (self == other)
    def __lt__(self, other):
        if isinstance(other, basestring):
            return self.lower() < other.lower()
        return NotImplemented
    def __ge__(self, other): return not (self < other)
    def __gt__(self, other):
        if isinstance(other, basestring):
            return self.lower() > other.lower()
        return NotImplemented
    def __le__(self, other): return not (self > other)
I am fully aware that lower is not really enough to cover all cases of string comparison in Unicode, but I am refactoring existing code that used a much clunkier class (memory- and speed-wise) for string comparisons, which used lower() anyway, so I can amend this at a later stage. Plus, I am on Python 2 (as seen by unicode). My questions are:
Did I get the operators right?
Is this class enough for my purposes, given that I take care to construct dict keys and set elements as CIstr instances? My purposes are checking equality, containment, set differences and similar operations in a case-insensitive way. Or am I missing something?
Is it worth caching the lower-case version of the string (as seen, for instance, in this ancient Python recipe: Case Insensitive Strings)? This comment suggests not, and I want construction to be as fast as possible and size as small as possible, but people seem to include this.
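To make the caching question concrete, here is a minimal sketch of what a cached variant could look like. It is written in Python 3 syntax for brevity, and `CachedCIstr` and `_lower` are hypothetical names of mine, not taken from the recipe; only hashing and equality are shown.

```python
class CachedCIstr(str):
    """Hypothetical variant that computes the lowercased text once."""

    def __new__(cls, value=""):
        self = super().__new__(cls, value)
        # Assumption: trade memory for speed by storing the lowered copy.
        # This keeps one extra string (and an instance __dict__) alive.
        self._lower = str.lower(self)
        return self

    def __hash__(self):
        # Reuse the cached form instead of calling lower() on every hash.
        return hash(self._lower)

    def __eq__(self, other):
        if isinstance(other, CachedCIstr):
            return self._lower == other._lower
        if isinstance(other, str):
            return self._lower == other.lower()
        return NotImplemented

    def __ne__(self, other):
        # Defined explicitly so the inherited str.__ne__ is not used.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result
```

The trade-off is visible in `__new__`: every instance pays for one extra `lower()` call and one extra string at construction time, in exchange for cheaper hashing and comparisons later.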
Python 3 compatibility tips are appreciated!
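On the lower caveat above: as far as I know, the Python 3 tool for caseless matching is `str.casefold()`, which Python 2's `unicode` does not have. A small sketch of where the two differ (the helper names are hypothetical, made up for this illustration):

```python
def equal_lower(a, b):
    """Compare two strings after lower(), as CIstr does."""
    return a.lower() == b.lower()


def equal_casefold(a, b):
    """Compare two strings after casefold() (Python 3 only)."""
    return a.casefold() == b.casefold()


# "\u00df" is the German sharp s; lower() leaves it alone,
# while casefold() expands it to "ss".
german = "Stra\u00dfe"

print(equal_lower(german, "STRASSE"))     # False
print(equal_casefold(german, "STRASSE"))  # True
print(equal_lower("Hello", "hELLO"))      # True
```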
Tiny demo:
d = {CIstr('A'): 1, CIstr('B'): 2}
print 'a' in d # True
s = set(d)
print {'a'} - s # set([])
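For comparison, a rough Python 3 port of the class and demo, as a sketch only: it assumes that swapping `unicode`/`basestring` for `str` and using `print()` is all that is needed, and it spells out every comparison explicitly rather than deriving one from another.

```python
class CIstr(str):
    """Python 3 sketch: str stands in for unicode and basestring."""

    def __hash__(self):
        return hash(self.lower())

    def __eq__(self, other):
        if isinstance(other, str):
            return self.lower() == other.lower()
        return NotImplemented

    def __ne__(self, other):
        if isinstance(other, str):
            return self.lower() != other.lower()
        return NotImplemented

    def __lt__(self, other):
        if isinstance(other, str):
            return self.lower() < other.lower()
        return NotImplemented

    def __le__(self, other):
        if isinstance(other, str):
            return self.lower() <= other.lower()
        return NotImplemented

    def __gt__(self, other):
        if isinstance(other, str):
            return self.lower() > other.lower()
        return NotImplemented

    def __ge__(self, other):
        if isinstance(other, str):
            return self.lower() >= other.lower()
        return NotImplemented


d = {CIstr('A'): 1, CIstr('B'): 2}
print('a' in d)   # True
s = set(d)
print({'a'} - s)  # set()
```

Spelling out `__ne__` and the ordering methods matters here: `str` already defines all six comparisons, so any operator not overridden would fall back to the case-sensitive `str` version.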