54

I have been given the task of removing all non-numeric characters, including spaces, from either a text file or a string, and then printing the new result next to the original text. For example:

Before:

sd67637 8

After:

676378

As I am a beginner, I do not know where to start with this task. Please help.

Luke Singham
Obcure
  • 1
    Possible duplicate of [Remove characters except digits from string using Python?](https://stackoverflow.com/q/1450897/608639) – jww Jun 30 '19 at 13:13

4 Answers

92

The easiest way is with a regular expression:

import re
a = 'lkdfhisoe78347834 (())&/&745  '
# Replace every character that is not 0-9 with the empty string
result = re.sub(r'[^0-9]', '', a)

print(result)
# 78347834745
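
To tie this back to the question's "before and after" output, here is a minimal sketch that wraps the same substitution in a function and prints both strings. The function name `digits_only` is hypothetical, chosen only for illustration.

```python
import re

def digits_only(text):
    """Return text with every character outside 0-9 removed."""
    return re.sub(r'[^0-9]', '', text)

before = 'sd67637 8'           # sample input from the question
after = digits_only(before)
print(before, '->', after)     # sd67637 8 -> 676378
```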
mar mar
24

Loop over your string character by character and keep only the digits:

new_string = ''.join(ch for ch in your_string if ch.isdigit())

Or use a regex on your string (if at some point you wanted to treat non-contiguous groups separately)...

import re
s = 'sd67637 8' 
new_string = ''.join(re.findall(r'\d+', s))
# 676378

Then just print the old and new strings side by side:

print(s, '=', new_string)
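
The question also mentions a text file. A rough sketch of that case follows, assuming one value per line; `io.StringIO` stands in for a real open file so the example is self-contained, and `strip_non_digits` is a hypothetical helper name.

```python
import io

def strip_non_digits(line):
    """Keep only the characters for which str.isdigit() is true."""
    return ''.join(ch for ch in line if ch.isdigit())

# io.StringIO stands in for open('some_file.txt'); the contents are made up
fake_file = io.StringIO('sd67637 8\nabc 12-34\n')

results = []
for raw in fake_file:
    old = raw.rstrip('\n')                      # drop the trailing newline
    results.append((old, strip_non_digits(old)))

for old, new in results:
    print(old, '=', new)
```

Note that `str.isdigit()` also accepts some non-ASCII digit characters (for example superscript digits), so the regex `[^0-9]` may be the safer choice if only plain ASCII digits should survive.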
jamylak
Jon Clements
10

There is a built-in for this (Python 2 only):

string.translate(s, table[, deletechars])

Delete all characters from s that are in deletechars (if present), and then translate the characters using table, which must be a 256-character string giving the translation for each character value, indexed by its ordinal. If table is None, then only the character deletion step is performed.

>>> import string
>>> non_numeric_chars = ''.join(set(string.printable) - set(string.digits))
>>> non_numeric_chars = string.printable[10:]  # more efficient alternative (choose one)
>>> 'sd67637 8'.translate(None, non_numeric_chars)
'676378'

Or you could do it with no imports (but there is no reason for this):

>>> chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
>>> 'sd67637 8'.translate(None, chars)
'676378'
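
The `translate(None, ...)` form above is the Python 2 call. In Python 3, `str.translate` takes a single table argument, so a roughly equivalent sketch builds a deletion table with `str.maketrans`. Like the original, it only removes printable ASCII characters.

```python
import string

# Characters to delete: every printable ASCII character that is not a digit
non_numeric_chars = ''.join(set(string.printable) - set(string.digits))

# The third argument of str.maketrans lists characters to map to None (delete)
table = str.maketrans('', '', non_numeric_chars)

result = 'sd67637 8'.translate(table)
print(result)  # 676378
```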
Inbar Rose
  • This should be the top answer. – akhan Jan 11 '17 at 07:44
  • Not really `>>> 's.,d67637 8'.translate(None, 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ')` yields `'.,676378'` – Darth Kotik Jan 11 '17 at 08:47
  • @DarthKotik good point, OP didn't mention anything about special characters, but that is easy to solve. Check my edit. – Inbar Rose Jan 11 '17 at 08:57
  • @InbarRose it'll work, but as soon as you want to use some Cyrillic symbol or something Chinese it'll fail. This solution is good only as long as you know exactly what set of symbols might be present in your field, which is not really good. – Darth Kotik Jan 11 '17 at 09:05
  • 3
    @DarthKotik OP made no mention of special characters or encoding. Regardless, string.translate can solve all of those problems with the correct input. Much like every problem, it should be solved one step at a time. And in Agile development there is no need for premature optimization. The question was simple, the answer is simple. If you want to get into minutiae we will be here all day. – Inbar Rose Jan 11 '17 at 09:39
  • Not Python 3 compatible. Very outdated answer. – Elia Iliashenko Aug 29 '18 at 11:34
  • @InbarRose Please update answer for python 3 (https://stackoverflow.com/a/41708804/828885) – akhan Mar 04 '19 at 17:55
1

You can use string.ascii_letters to identify your non-digits:

from string import ascii_letters

a = 'sd67637 8'
a = a.replace(' ', '')

for i in ascii_letters:
    a = a.replace(i, '')

If you also need to remove an apostrophe ('), wrap it in double quotes (") rather than single quotes.
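
The loop above only strips ASCII letters and spaces. As a hedged extension of the same idea, the sketch below also loops over `string.punctuation` and `string.whitespace`; it still assumes the input contains only ASCII characters.

```python
from string import ascii_letters, punctuation, whitespace

a = 's.,d67637 8'

# Remove every ASCII letter, punctuation mark, and whitespace character in turn
for ch in ascii_letters + punctuation + whitespace:
    a = a.replace(ch, '')

print(a)  # 676378
```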

Saullo G. P. Castro