3

I have a list of dicts with keys ['name', 'content', 'summary', ...]. All the values are strings, but some of them are None. I need to remove all the newlines in content, summary and some other keys, so I do this:

...
...
for item in item_list:
    name = item['name']
    content = item['content']
    if content is not None: content = content.replace('\n','')
    summary = item['summary']
    if summary is not None: summary = summary.replace('\n','')
    ...
    ...
...
...

I feel that the if x is not None: x = x.replace('\n','') idiom is not very intelligent or clean. Is there a more "pythonic" or better way to do it?

Thanks.

Jeff Tratner
bdhar
  • Absence of a key in a dictionary encodes "nothing here" as well or better than None. – msw Jun 11 '12 at 00:51

8 Answers

7

The code feels unwieldy to you partly because you are repeating yourself. This is better:

def remove_newlines(text):
    if text is not None:
        return text.replace('\n', '')

for item in item_list:
    name = item['name']
    content = remove_newlines(item['content'])
    summary = remove_newlines(item['summary'])
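
If more keys need the same treatment, the helper can also be driven by a list of key names. This is only a sketch: `KEYS_TO_CLEAN` and the sample `item_list` are made up for illustration, and unlike the loop above it writes the cleaned values back into each dict instead of into local variables.

```python
def remove_newlines(text):
    """Return text without newlines; None is passed through unchanged."""
    if text is None:
        return None
    return text.replace('\n', '')

# Hypothetical key list and sample data, for illustration only.
KEYS_TO_CLEAN = ('content', 'summary')
item_list = [
    {'name': 'a', 'content': 'line1\nline2', 'summary': None},
    {'name': 'b', 'content': None, 'summary': 'x\ny'},
]

for item in item_list:
    for key in KEYS_TO_CLEAN:
        # Overwrites the value in place; None stays None.
        item[key] = remove_newlines(item[key])
```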
Michael Hoffman
6

If you are going to use sentinel values (None) then you will be burdened with checking for them.

There are a lot of different answers to your question, but they seem to be missing this point: don't use sentinel values in a dictionary when the absence of an entry encodes the same information.

For example:

bibliography = [
    { 'name': 'bdhar', 'summary': 'questioner' },
    { 'name': 'msw', 'content': 'an answer' },
]

then you can

for article in bibliography:
    for key in article:
        ...

and then your loop is nicely ignorant of what keys, if any, are contained in a given article.

Reading your comments, you say that you are getting the dict from somewhere else. So clean it of junk values first. It is much clearer to have a cleaning step than it is to carry their misunderstanding through your code.
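
One way such a cleaning step might look, as a rough sketch: the helper name `drop_none_values` and the sample `raw_items` are assumptions made for illustration, not part of the original code.

```python
def drop_none_values(record):
    """Return a copy of record without the keys whose value is None."""
    return {key: value for key, value in record.items() if value is not None}

# Hypothetical input, as it might arrive from the external source.
raw_items = [
    {'name': 'bdhar', 'content': None, 'summary': 'a\nquestion'},
    {'name': 'msw', 'content': 'an\nanswer', 'summary': None},
]

# Cleaning step: after this, a missing key means "nothing here".
cleaned = [drop_none_values(item) for item in raw_items]

for article in cleaned:
    for key in ('content', 'summary'):
        if key in article:
            article[key] = article[key].replace('\n', '')
```

After the cleaning step, the rest of the code only ever sees strings, so no None check is needed where the newlines are removed.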

msw
5

Python has a ternary operator, so one option is to do this in a more natural word order:

content = content.replace('\n', '') if content is not None else None

Note that if "" and None are equivalent in your case (which appears to be so), you can shorten it to just if content, as non-empty strings evaluate to True.

content = content.replace('\n', '') if content else None

This also follows the Python idiom of explicit is better than implicit. This shows someone following the code that the value can be None very clearly.

It's worth noting that if you repeat this operation a lot, it might be worth encapsulating it as a function.

Another Python idiom is to ask for forgiveness, not permission. So you could simply use try and catch the AttributeError that follows. However, this is a lot more verbose in this case, so it's probably not worth it, especially as the cost of the check is so small.

try:
    content = content.replace('\n', '')
except AttributeError:
    content = None
    # pass  # Also an option, but as mentioned above, explicit is generally clearer than implicit.
Gareth Latty
  • While you are on the point of exception handling, it is noteworthy that `try` is cheap, but `except` is relatively expensive. – Matthew Schinckel Jun 11 '12 at 01:13
  • @MatthewSchinckel The presumption is that where you are catching an exception it is a rare case. If that isn't so, then yes, the cost of the `except` can be an issue. – Gareth Latty Jun 11 '12 at 01:24
  • 2
    It is worth noting that many consider the ternary operator unPythonic despite being in the language http://stackoverflow.com/questions/394809/python-ternary-operator Even http://www.python.org/dev/peps/pep-0308/ has an air of resignation about it. – msw Jun 11 '12 at 01:33
  • @Lattyware yes, that's a nice way to put it. – Matthew Schinckel Jun 11 '12 at 04:05
  • @msw I must say I really don't agree - I think that it's much nicer than trying to cram an `if` statement onto one line. – Gareth Latty Jun 11 '12 at 09:36
  • There are certainly reasonable arguments for both sides as was captured in that SO link. I agree that the one-line `if` makes for confusing reading. (my comment was just a comment, if I'd meant "-1" I would have said so) – msw Jun 11 '12 at 09:42
  • @msw Of course, I didn't take offence - it's a perfectly valid comment, and I was unaware of the disagreement surrounding the ternary operator, I was just saying I like the way it writes/reads. – Gareth Latty Jun 11 '12 at 09:47
2

One possibility is to use the empty string instead of None. This is not a fully general solution, but in many cases if your data is all of a single type, there will be a sensible "null" value other than None (empty string, empty list, zero, etc.). In this case it looks like you could use the empty string.
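
If the data arrives with None already in it, one way to switch to the empty string is to normalise at the point of access. A minimal sketch, assuming a made-up `item`; note that `value or ''` also turns any other falsy value into the empty string, which is harmless only when every value is a string or None.

```python
# Hypothetical item, for illustration only.
item = {'name': 'example', 'content': None, 'summary': 'two\nlines'}

# `value or ''` maps None (and any other falsy value) to the empty string,
# so .replace() can always be called without a check.
content = (item.get('content') or '').replace('\n', '')
summary = (item.get('summary') or '').replace('\n', '')
```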

BrenBarn
2

Try:

if content: content = content.replace('\n','')

--

if content will (almost [1]) always be True as long as content contains anything except for 0, False, or None.


[1] As Lattyware correctly points out in the comments, this is not strictly true. There are other things that evaluate to False in an if statement, for example an empty list. See the link provided in the comment below.
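
A short sketch of what the truthiness check treats as false; the list `falsy_values` and the wrapper `strip_newlines` are illustrative names only. For data that is always either a string or None, the check skips exactly None and the empty string, and there is nothing to replace in an empty string anyway.

```python
# A few built-in values that are false in a boolean context.
falsy_values = [None, False, 0, 0.0, '', [], (), {}, set()]
results = [bool(value) for value in falsy_values]  # every entry is False

def strip_newlines(content):
    """Remove newlines unless content is falsy (None or '' for string data)."""
    if content:
        content = content.replace('\n', '')
    return content
```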

  • Your final line isn't technically true. Empty containers also evaluate to `False`, as can anything defining `__nonzero__()` in 2.x or `__bool__()` in 3.x, and some other things. [The docs explain in full](http://docs.python.org/library/stdtypes.html#truth-value-testing). – Gareth Latty Jun 11 '12 at 00:37
2

The empty string evaluates to False in Python, so the Pythonic way is if content:.

In [2]: bool("")
Out[2]: False

In [3]: bool("hello")
Out[3]: True

As a side note, you can make your code a little clearer:

name, content = item["name"], item["content"]

And:

content = content.replace('\n','') if content else None
Skurmedel
2

You might also consider abstracting some of your if clauses into a separate function:

def remove_newlines(mystr):
    if mystr:
        mystr = mystr.replace('\n', '')
    return mystr

(edited to remove the over-complicated solution with dictionaries, etc)

Jeff Tratner
  • 1
    I feel this is a bit of an over-engineered solution. Unless you do this a lot throughout your code, I would just use the `remove_newlines()` function and call it there. – Gareth Latty Jun 11 '12 at 00:33
  • @Lattyware that's a reasonable critique. What do you think about the second version (just updated it)? – Jeff Tratner Jun 11 '12 at 00:35
  • The `identity` function caused me to have to read it twice to see that it really was a NOP. I second "over-engineered". – msw Jun 11 '12 at 00:55
  • I changed the post to just focus on the remove_newlines function and leave everything else out. Thanks for the commentary @msw and Lattyware – Jeff Tratner Jun 11 '12 at 00:58
1

I think that the "pythonic" thing is to use the fact that None will evaluate to False in an if statement. So you can just say:

if content: content = content.replace('\n','')
ChipJust