Best way to strip punctuation from a string

Question

It seems like there should be a simpler way than:

import string
s = "string. With. Punctuation?" # Sample string 
out = s.translate(string.maketrans("",""), string.punctuation)

Is there?

Seems pretty straightforward to me. Why do you want to change it? If you want it easier just wrap what you just wrote in a function. — Hannes Ovrén, Nov 05 '08 at 17:38
Well, it just seemed kind of hackish to be using kind of a side effect of str.translate to do the work. I was thinking I might have missed something more like str.strip(chars) that works on the entire string instead of just the boundaries. — Lawrence Johnston, Nov 05 '08 at 18:00
Depends on the data too. Using this on data where there are server names with underscores as part of the name (pretty common some places) could be bad. Just be sure that you know the data and what it contains or you could end up with a subset of the clbuttic problem. — EBGreen, Nov 05 '08 at 18:10
Depends also on what you call punctuation. "`The temperature in the O'Reilly & Arbuthnot-Smythe server's main rack is 40.5 degrees.`" contains exactly ONE punctuation character, the second "." — John Machin, Mar 08 '10 at 21:49
I'm surprised no one mentioned that `string.punctuation` doesn't include non-English punctuation at all. I'm thinking about 。，！？：×“”〟, and so on. — Clément, Jan 03 '13 at 15:40
@JohnMachin you're forgetting that [`' '` is punctuation](https://en.wikipedia.org/wiki/Space_(punctuation)). — Wayne Werner, May 01 '17 at 16:42
As of python 3.1 (to at least 3.8.3), you'll want: `str.maketrans("","", string.punctuation)` per [this documentation](https://docs.python.org/3.3/library/stdtypes.html?highlight=maketrans#str.maketrans) with the change [documented in 3.1](https://docs.python.org/3/whatsnew/3.1.html) — Brownbat, May 02 '20 at 16:16
Most of the discussion here is Python 2, [this question is similar](https://stackoverflow.com/questions/11066400/remove-punctuation-from-unicode-formatted-strings/21635971#21635971) but has superior Python 3 answers. — David Jones, Nov 10 '20 at 11:13

score 1084 · Accepted Answer · edited Mar 05 '19 at 21:07


From an efficiency perspective, you're not going to beat this (Python 2 only):

s.translate(None, string.punctuation)

In Python 3, str.translate() takes a single table argument, so use:

s.translate(str.maketrans('', '', string.punctuation))

It's performing raw string operations in C with a lookup table - there's not much that will beat that but writing your own C code.

If speed isn't a worry, another option though is:

exclude = set(string.punctuation)
s = ''.join(ch for ch in s if ch not in exclude)

This is faster than s.replace with each char, but won't perform as well as non-pure python approaches such as regexes or string.translate, as you can see from the below timings. For this type of problem, doing it at as low a level as possible pays off.

Timing code (Python 2):

import re, string, timeit

s = "string. With. Punctuation"
exclude = set(string.punctuation)
table = string.maketrans("","")
regex = re.compile('[%s]' % re.escape(string.punctuation))

def test_set(s):
    return ''.join(ch for ch in s if ch not in exclude)

def test_re(s):  # From Vinko's solution, with fix.
    return regex.sub('', s)

def test_trans(s):
    return s.translate(table, string.punctuation)

def test_repl(s):  # From S.Lott's solution
    for c in string.punctuation:
        s=s.replace(c,"")
    return s

print "sets      :",timeit.Timer('f(s)', 'from __main__ import s,test_set as f').timeit(1000000)
print "regex     :",timeit.Timer('f(s)', 'from __main__ import s,test_re as f').timeit(1000000)
print "translate :",timeit.Timer('f(s)', 'from __main__ import s,test_trans as f').timeit(1000000)
print "replace   :",timeit.Timer('f(s)', 'from __main__ import s,test_repl as f').timeit(1000000)

This gives the following results:

sets      : 19.8566138744
regex     : 6.86155414581
translate : 2.12455511093
replace   : 28.4436721802

edited Mar 05 '19 at 21:07

Ashish Cherian


answered Nov 05 '08 at 18:36

Brian


33

Thanks for the timing info, I was thinking about doing something like that myself, but yours is better written than anything I would have done and now I can use it as a template for any future timing code I want to write:). – Lawrence Johnston Nov 05 '08 at 19:57
30

Great answer. You can simplify it by removing the table. The docs say: "set the table argument to None for translations that only delete characters" (http://docs.python.org/library/stdtypes.html#str.translate) – Alexandros Marinos Jul 01 '11 at 21:24
1

Using a list comprehension for the `''.join()` would make it a little faster, but not fast enough to beat the `regex` or `translate`. See [list comprehension without \[ \], Python](http://stackoverflow.com/a/9061024) for why that is so. – Martijn Pieters Nov 08 '13 at 12:13
3

worth noting too that translate() behaves differently for str and unicode objects, so you need to be sure you're always working with the same datatype, but the approach in this answer works equally well for both, which is handy. – Richard J Jan 16 '15 at 09:35
38

In Python3, `table = string.maketrans("","")` should be replaced with `table = str.maketrans({key: None for key in string.punctuation})`? – SparkAndShine May 13 '16 at 23:36
What's the purpose of doing `set(string.punctuation)`? It only has unique values to begin with. – mlissner Aug 08 '16 at 15:03
2

@mlissner - efficiency. If it's a list/string, you need to do a linear scan to find out whether the letter is in the string. With a set or dictionary though, it'll generally be faster (except for really small strings) since it doesn't have to check every value. – Brian Sep 27 '16 at 14:13
1

@sparkandshine Yes, except you have to map the ordinals of each key to the replacement character, so in Python 3 it would be `s.translate({ord(c): None for c in string.punctuation})`. – Galen Long Jan 26 '17 at 16:49
21

To update the discussion, as of Python 3.6, `regex` is now the most efficient method! It is almost 2x faster than translate. Also, sets and replace are no longer so bad! They are both improved by over a factor of 4 :) – Ryan Soklaski Jul 24 '17 at 22:35
2

In Python 3, the translation table can also be created by `table = str.maketrans('', '', string.punctuation)` https://docs.python.org/3/library/stdtypes.html#str.maketrans – RyanLeiTaiwan Mar 18 '18 at 16:11
Thanks a ton for the time complexity for each and every approach. – Sundeep Pidugu Jul 22 '19 at 08:49
`>>> s.translate(None, string.punctuation)` fails in Python 3 with `TypeError: str.translate() takes exactly one argument (2 given)`; I think this answer needs updating. – John D Jan 08 '21 at 23:17
Thanks for the ton of information. It's helpful. – Gaurav Koradiya Mar 25 '21 at 14:18

Eratosthenes · Answer 2 · 2019-09-16T17:36:50.027

172

Regular expressions are simple enough, if you know them.

import re
s = "string. With. Punctuation?"
s = re.sub(r'[^\w\s]','',s)
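One caveat with this pattern is that `\w` also matches the underscore, so underscores survive. A small sketch of a variant that removes them as well (the function name is hypothetical, and whether underscores should go depends on your data):

```python
import re


def strip_punct_and_underscore(text):
    # [^\w\s] matches anything that is not a word character or whitespace;
    # the extra "|_" alternative also removes underscores, which \w keeps.
    return re.sub(r'[^\w\s]|_', '', text)


print(strip_punct_and_underscore("snake_case, with. Punctuation?"))
```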

edited Sep 16 '19 at 17:36

answered May 28 '13 at 18:47

Eratosthenes


4

@Outlier Explanation: it replaces everything that is not (^) a word character or whitespace with the empty string. Be careful though: \w usually matches the underscore too, for example. – Matthias Feb 03 '16 at 15:28
5

@SIslam I think it will work with unicode with the unicode flag set, i.e. `s = re.sub(r'[^\w\s]', '', s, flags=re.UNICODE)` (the flag has to be passed by keyword, because the fourth positional argument of re.sub is `count`). Testing it with Python 3 on Linux, it works even without the flag using Tamil letters, தமிழ். – Matthias Feb 03 '16 at 15:31
@Matthias I tried the code with Python 3.6.5 on Mac, the Tamil letters output looks a bit different, input தமிழ் becomes தமழ. I have no knowledge about Tamil, not sure if that's expected. – shiouming May 28 '19 at 02:45
@Matthias It gets confused with word boundaries while working with UNICODE Bengali text and gives wrong words no matter UNICODE flag is used or not. – hafiz031 Sep 09 '20 at 13:04

SparkAndShine · Answer 3 · 2019-10-09T00:54:11.213

For convenience, here is a summary of how to strip punctuation from a string in both Python 2 and Python 3. Please refer to the other answers for a detailed description.

Python 2

import string

s = "string. With. Punctuation?"
table = string.maketrans("","")
new_s = s.translate(table, string.punctuation)      # Output: string without punctuation

Python 3

import string

s = "string. With. Punctuation?"
table = str.maketrans(dict.fromkeys(string.punctuation))  # OR {key: None for key in string.punctuation}
new_s = s.translate(table)                          # Output: string without punctuation

score 52 · Answer 4 · answered Mar 08 '10 at 15:19


myString.translate(None, string.punctuation)

answered Mar 08 '10 at 15:19

pyrou


4

ah, I tried this but it doesn't work in all cases. myString.translate(string.maketrans("",""), string.punctuation) works fine. – Aidan Kane Aug 12 '10 at 12:30
12

Note that for `str` in Python 3, and `unicode` in Python 2, the `deletechars` argument is not supported. – agf Apr 14 '12 at 00:36
2

@agf: you still can [use `.translate()` to remove punctuation even in Unicode and py3k cases](http://stackoverflow.com/a/11066687/4279) using dictionary argument. – jfs Aug 28 '12 at 07:53
4

myString.translate(string.maketrans("",""), string.punctuation) will NOT work with unicode strings (found out the hard way) – Marc Maxmeister Jul 25 '14 at 19:25
2

@MarcMaxson: `myString.translate(str.maketrans("", "", string.punctuation))` does work for Unicode strings on Python 3. Though `string.punctuation` includes only ascii punctuation there. Click [the link in my previous comment](http://stackoverflow.com/a/11066687/4279). It shows how to remove all punctuation (including Unicode one). – jfs Oct 01 '14 at 08:56
55

`TypeError: translate() takes exactly one argument (2 given)` :( – Brian Tingle Apr 23 '15 at 18:35
3

@BrianTingle: look at the Python 3 code in my comment (it passes one argument). [Follow the link, to see Python 2 code that works with unicode](http://stackoverflow.com/a/11066687/4279) and [its Python 3 adaptation](http://stackoverflow.com/a/21635971/4279) – jfs Jul 15 '15 at 17:21

score 32 · Answer 5 · answered Nov 05 '08 at 17:41


I usually use something like this:

>>> s = "string. With. Punctuation?" # Sample string
>>> import string
>>> for c in string.punctuation:
...     s= s.replace(c,"")
...
>>> s
'string With Punctuation'

answered Nov 05 '08 at 17:41

S.Lott


2

An uglified one-liner: `reduce(lambda s,c: s.replace(c, ''), string.punctuation, s)`. – jfs Aug 11 '12 at 12:03
1

Great, however it doesn't remove some punctuation, like the longer hyphen. – Vladimir Stazhilov Jan 17 '15 at 15:57

score 29 · Answer 6 · edited Apr 02 '20 at 03:54


string.punctuation is ASCII only! A more correct (but also much slower) way is to use the unicodedata module:

# -*- coding: utf-8 -*-
from unicodedata import category
s = u'String — with -  «punctation »...'
s = ''.join(ch for ch in s if category(ch)[0] != 'P')
print 'stripped', s

You can generalize and strip other types of characters as well:

''.join(ch for ch in s if category(ch)[0] not in 'SP')

It will also strip characters like ~*+§$ which may or may not be "punctuation" depending on one's point of view.
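For Python 3, the same idea can be sketched as a small function. The name `strip_by_category` and its `prefixes` parameter are assumptions made for this example; the sample strings use escapes so the code points are unambiguous:

```python
from unicodedata import category


def strip_by_category(text, prefixes=('P',)):
    """Drop characters whose Unicode general category starts with one of
    the given letters, e.g. 'P' for punctuation or 'S' for symbols."""
    return ''.join(ch for ch in text if category(ch)[0] not in prefixes)


# \u2014 is an em dash, \u00ab and \u00bb are guillemets.
print(strip_by_category('String \u2014 with \u00abpunctuation\u00bb...'))
# '~' and '+' are symbols (category Sm), so they only go when 'S' is listed.
print(strip_by_category('a~b*c+d'))
print(strip_by_category('a~b*c+d', prefixes=('P', 'S')))
```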

edited Apr 02 '20 at 03:54

wim


answered Sep 01 '11 at 09:29

Björn Lindqvist


3

You could: [`regex.sub(ur"\p{P}+", "", text)`](http://stackoverflow.com/a/11066687/4279) – jfs Aug 11 '12 at 12:07
Unfortunately, things like `~` are not part of the punctuation category. You need to also test for the Symbols category as well. – C.J. Jackson Oct 06 '19 at 04:08

Vinko Vrsalovic · Answer 7 · 2008-11-05T23:20:04.560

27

Not necessarily simpler, but a different way, if you are more familiar with the re family.

import re, string
s = "string. With. Punctuation?" # Sample string 
out = re.sub('[%s]' % re.escape(string.punctuation), '', s)
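To illustrate why the re.escape() call matters once you use your own subset of punctuation, here is a small sketch. The subset "+-/" is an arbitrary example, chosen so that the unescaped "-" forms a character range:

```python
import re

subset = "+-/"
text = "a+b,c-d.e/f"

# Unescaped, "[+-/]" is the range from '+' to '/', which also covers ',' and '.'.
unescaped = re.sub('[%s]' % subset, '', text)

# Escaped, only the three listed characters are removed.
escaped = re.sub('[%s]' % re.escape(subset), '', text)

print(unescaped)
print(escaped)
```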

edited Nov 05 '08 at 23:20

answered Nov 05 '08 at 17:39

Vinko Vrsalovic


1

Works because string.punctuation has the sequence ,-. in proper, ascending, no-gaps, ASCII order. While Python has this right, when you try to use a subset of string.punctuation, it can be a show-stopper because of the surprise "-". – S.Lott Nov 05 '08 at 17:49
2

Actually, it's still wrong. The sequence "\\]" gets treated as an escape (coincidentally not closing the ] so bypassing another failure), but leaves \ unescaped. You should use re.escape(string.punctuation) to prevent this. – Brian Nov 05 '08 at 18:15
1

Yes, I omitted it because it worked for the example to keep things simple, but you are right that it should be incorporated. – Vinko Vrsalovic Nov 05 '08 at 23:21

score 14 · Answer 8 · edited May 23 '17 at 12:02


For Python 3 str or Python 2 unicode values, str.translate() only takes a dictionary; codepoints (integers) are looked up in that mapping and anything mapped to None is removed.

To remove (some?) punctuation then, use:

import string

remove_punct_map = dict.fromkeys(map(ord, string.punctuation))
s.translate(remove_punct_map)

The dict.fromkeys() class method makes it trivial to create the mapping, setting all values to None based on the sequence of keys.

To remove all punctuation, not just ASCII punctuation, your table needs to be a little bigger; see J.F. Sebastian's answer (Python 3 version):

import unicodedata
import sys

remove_punct_map = dict.fromkeys(i for i in range(sys.maxunicode)
                                 if unicodedata.category(chr(i)).startswith('P'))
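A usage sketch for the larger table, assuming Python 3. It scans up to and including sys.maxunicode (a small deviation from the snippet above, labeled here as my own choice) and should be built once, since walking every code point is not free:

```python
import sys
import unicodedata

# Map every code point in a punctuation category (P*) to None, i.e. delete it.
remove_punct_map = dict.fromkeys(
    i for i in range(sys.maxunicode + 1)
    if unicodedata.category(chr(i)).startswith('P')
)

# \u3002 is the ideographic full stop, \uff01 the fullwidth exclamation mark.
s = 'Hello, world\u3002 fine\uff01'
print(s.translate(remove_punct_map))
```

Symbols such as `~` are in category S rather than P, so this table leaves them alone.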

edited May 23 '17 at 12:02

Community


answered Sep 02 '13 at 09:57

Martijn Pieters


To support Unicode, `string.punctuation` is not enough. See [my answer](http://stackoverflow.com/a/11066687/4279) – jfs Sep 30 '14 at 20:57
@J.F.Sebastian: indeed, my answer was just using the same characters as the top-voted one. Added a Python 3 version of your table. – Martijn Pieters Sep 30 '14 at 21:08
the top-voted answer works only for ascii strings. Your answer claims explicitly the Unicode support. – jfs Oct 01 '14 at 08:53
1

@J.F.Sebastian: it works for Unicode strings. It strips ASCII punctuation. I never claimed it strips *all* punctuation. :-) The point was to provide the correct technique for `unicode` objects vs. Python 2 `str` objects. – Martijn Pieters Oct 01 '14 at 09:02

score 12 · Answer 9 · edited Jul 15 '18 at 08:17

string.punctuation misses loads of punctuation marks that are commonly used in the real world. How about a solution that works for non-ASCII punctuation?

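import regex  # third-party module, not the standard-library re
s = u"string. With. Some・Really Weird、Non？ASCII。 「（Punctuation）」?"
# In Python 3 the ur'' prefix is a SyntaxError, so use r''; '|' is not needed inside a character class.
remove = regex.compile(r'[\p{C}\p{M}\p{P}\p{S}\p{Z}]+', regex.UNICODE)
remove.sub(u" ", s).strip()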

Personally, I believe this is the best way to remove punctuation from a string in Python because:

It removes all Unicode punctuation
It's easily modifiable, e.g. you can remove the \p{S} if you want to remove punctuation, but keep symbols like $.
You can get really specific about what you want to keep and what you want to remove, for example \p{Pd} will only remove dashes.
This regex also normalizes whitespace. It maps tabs, carriage returns, and other oddities to nice, single spaces.

This uses Unicode character properties, which you can read more about on Wikipedia.

This line actually does not work: `remove = regex.compile(ur'[\p{C}|\p{M}|\p{P}|\p{S}|\p{Z}]+', regex.UNICODE)` — John Stud, Jun 29 '20 at 00:09

score 9 · Answer 10 · edited Jul 15 '18 at 08:13


Here's a one-liner for Python 3.5:

import string
"l*ots! o(f. p@u)n[c}t]u[a'ti\"on#$^?/".translate(str.maketrans({a:None for a in string.punctuation}))

edited Jul 15 '18 at 08:13

Peter Mortensen


answered Mar 21 '16 at 02:46

Tim P


score 9 · Answer 11 · edited Jul 15 '18 at 08:15


I haven't seen this answer yet. Just use a regex; it removes every character that is not a word character (\w), a digit (\d), or a whitespace character (\s):

import re
s = "string. With. Punctuation?" # Sample string 
out = re.sub(r'[^\w\d\s]+', '', s)  # r'' instead of ur'', which is a SyntaxError in Python 3

edited Jul 15 '18 at 08:15

Peter Mortensen


answered Jun 18 '16 at 06:38

Blairg23


2

`\d` is redundant since it is a subset of `\w`. – blhsing Jan 10 '19 at 18:37
Number characters are considered a subset of Word characters? I thought a Word character was any character that could construct a real word, e.g. a-zA-Z? – Blairg23 Jan 10 '19 at 20:55
Yes, a "word" in regex includes alphabets, numbers and underscore. Please see the description for `\w` in the documentation: https://docs.python.org/3/library/re.html – blhsing Jan 10 '19 at 21:10

score 8 · Answer 12 · edited Sep 02 '13 at 10:28


This might not be the best solution, but this is how I did it.

import string
f = lambda x: ''.join([i for i in x if i not in string.punctuation])

edited Sep 02 '13 at 10:28

Ashwini Chaudhary


answered Jul 05 '11 at 04:30

David Vuong


score 6 · Answer 13 · edited Jul 15 '18 at 08:12


Here is a function I wrote. It's not very efficient, but it is simple and you can add or remove any punctuation that you desire:

def stripPunc(wordList):
    """Strips punctuation from list of words"""
    puncList = [".",";",":","!","?","/","\\",",","#","@","$","&",")","(","\""]
    for punc in puncList:
        # One pass over the list per punctuation mark is enough.
        wordList = [word.replace(punc, '') for word in wordList]
    return wordList
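If the loop-per-character cost matters, a possible variant (a sketch only; `strip_punc` and its `punc` parameter are names I made up) keeps the customizable punctuation as a plain string and deletes it with one translate call per word in Python 3:

```python
def strip_punc(word_list, punc='.;:!?/\\,#@$&)("'):
    """Remove every character in punc from each word in word_list."""
    # Map each punctuation character to None so translate deletes it.
    table = str.maketrans('', '', punc)
    return [word.translate(table) for word in word_list]


print(strip_punc(['hi!', '(ok)', 'a,b']))
```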

edited Jul 15 '18 at 08:12

Peter Mortensen


answered Sep 22 '15 at 14:30

Dr.Tautology


The commas between the punctuation marks are not necessary; `puncList` can just be a string. – Nick Apr 27 '21 at 12:29

score 5 · Answer 14 · edited Mar 02 '17 at 14:58


import re
s = "string. With. Punctuation?" # Sample string 
out = re.sub(r'[^a-zA-Z0-9\s]', '', s)
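A quick sketch of what this pattern does to non-ASCII letters in Python 3, compared with a `\w`-based class (the sample text is an arbitrary example; \u00e9 and \u00ef are precomposed accented letters):

```python
import re

text = "Caf\u00e9, na\u00efve!"

# The explicit a-z/A-Z/0-9 class treats accented letters as removable.
ascii_only = re.sub(r'[^a-zA-Z0-9\s]', '', text)

# For str patterns in Python 3, \w is Unicode-aware and keeps them.
unicode_aware = re.sub(r'[^\w\s]', '', text)

print(ascii_only)
print(unicode_aware)
```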

edited Mar 02 '17 at 14:58

m81


answered Feb 02 '17 at 21:48

Haythem HADHAB


Seems like that would only work for ASCII characters. – SearchTools-Avi Oct 14 '19 at 19:15

krinker · Answer 15 · 2018-05-07T14:25:26.600

Just as an update, I rewrote the @Brian example in Python 3 and made changes to it to move regex compile step inside of the function. My thought here was to time every single step needed to make the function work. Perhaps you are using distributed computing and can't have regex object shared between your workers and need to have re.compile step at each worker. Also, I was curious to time two different implementations of maketrans for Python 3

table = str.maketrans({key: None for key in string.punctuation})

vs

table = str.maketrans('', '', string.punctuation)

Plus I added another method to use set, where I take advantage of intersection function to reduce number of iterations.

This is the complete code:

import re, string, timeit

s = "string. With. Punctuation"


def test_set(s):
    exclude = set(string.punctuation)
    return ''.join(ch for ch in s if ch not in exclude)


def test_set2(s):
    _punctuation = set(string.punctuation)
    for punct in set(s).intersection(_punctuation):
        s = s.replace(punct, ' ')
    return ' '.join(s.split())


def test_re(s):  # From Vinko's solution, with fix.
    regex = re.compile('[%s]' % re.escape(string.punctuation))
    return regex.sub('', s)


def test_trans(s):
    table = str.maketrans({key: None for key in string.punctuation})
    return s.translate(table)


def test_trans2(s):
    table = str.maketrans('', '', string.punctuation)
    return(s.translate(table))


def test_repl(s):  # From S.Lott's solution
    for c in string.punctuation:
        s=s.replace(c,"")
    return s


print("sets      :",timeit.Timer('f(s)', 'from __main__ import s,test_set as f').timeit(1000000))
print("sets2      :",timeit.Timer('f(s)', 'from __main__ import s,test_set2 as f').timeit(1000000))
print("regex     :",timeit.Timer('f(s)', 'from __main__ import s,test_re as f').timeit(1000000))
print("translate :",timeit.Timer('f(s)', 'from __main__ import s,test_trans as f').timeit(1000000))
print("translate2 :",timeit.Timer('f(s)', 'from __main__ import s,test_trans2 as f').timeit(1000000))
print("replace   :",timeit.Timer('f(s)', 'from __main__ import s,test_repl as f').timeit(1000000))

These are my results:

sets      : 3.1830138750374317
sets2      : 2.189873124472797
regex     : 7.142953420989215
translate : 4.243278483860195
translate2 : 2.427158243022859
replace   : 4.579746678471565
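For comparison, here is a sketch of the opposite setup, where the table and the compiled regex are built once outside the timed functions, as in the original timing code. The names `TABLE`, `REGEX` and the two function names are mine, and the loop count is kept small; the absolute numbers will depend on the machine and Python version, so none are quoted here:

```python
import re
import string
import timeit

s = "string. With. Punctuation"

# Built once, outside the timed functions.
TABLE = str.maketrans('', '', string.punctuation)
REGEX = re.compile('[%s]' % re.escape(string.punctuation))


def trans_precomputed(text):
    return text.translate(TABLE)


def regex_precomputed(text):
    return REGEX.sub('', text)


for name, func in [("translate", trans_precomputed), ("regex", regex_precomputed)]:
    elapsed = timeit.timeit(lambda: func(s), number=100000)
    print(f"{name:10}: {elapsed:.3f}s")
```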

score 4 · Answer 16 · edited Jul 15 '18 at 08:13


A one-liner might be helpful in not very strict cases:

''.join([c for c in s if c.isalnum() or c.isspace()])

edited Jul 15 '18 at 08:13

Peter Mortensen


answered Oct 17 '15 at 23:03

Dom Grey


score 4 · Answer 17 · answered Aug 24 '16 at 05:43


>>> import re
>>> s = "string. With. Punctuation?"
>>> s = re.sub(r'[^\w\s]','',s)
>>> re.split(r'\s+', s)
['string', 'With', 'Punctuation']

answered Aug 24 '16 at 05:43

Pablo Rodriguez Bertorello


2

Please edit with more information. Code-only and "try this" answers are discouraged, because they contain no searchable content, and don't explain why someone should "try this". – Paritosh Aug 24 '16 at 07:29

ngub05 · Answer 18 · 2017-04-02T20:03:45.503

Here's a solution without regex.

import string

input_text = "!where??and!!or$$then:)"
punctuation_replacer = string.maketrans(string.punctuation, ' '*len(string.punctuation))    
print ' '.join(input_text.translate(punctuation_replacer).split()).strip()

Output>> where and or then

Replaces the punctuations with spaces
Replace multiple spaces in between words with a single space
Remove the trailing spaces, if any with strip()
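The code above is Python 2 (string.maketrans and the print statement). A Python 3 adaptation of the same three steps might look like this; the `cleaned` variable is introduced here for illustration:

```python
import string

input_text = "!where??and!!or$$then:)"

# Map each punctuation character to a space.
punctuation_replacer = str.maketrans(string.punctuation, ' ' * len(string.punctuation))

# split() with no arguments drops leading, trailing and repeated whitespace,
# so joining with single spaces already normalizes the result.
cleaned = ' '.join(input_text.translate(punctuation_replacer).split())
print(cleaned)  # where and or then
```

Because split() already discards leading and trailing whitespace, the final strip() is not needed in this version.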

score 3 · Answer 19 · answered Jul 29 '19 at 08:18


Why does none of you use this?

 ''.join(filter(str.isalnum, s))

Too slow?
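Since str.isalnum alone also drops whitespace, a variation that keeps it could look like this (a sketch in Python 3; the lambda is my own addition, not part of the answer):

```python
s = "string. With. Punctuation?"

# Keep a character if it is alphanumeric or whitespace.
kept = ''.join(filter(lambda ch: ch.isalnum() or ch.isspace(), s))
print(kept)  # string With Punctuation
```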

answered Jul 29 '19 at 08:18

Dehua Li


1

Note that this will also remove spaces. – Georgy Jul 29 '19 at 12:36

score 2 · Answer 20 · answered Jan 02 '17 at 08:56

#FIRST METHOD
#Storing all punctuations in a variable    
punctuation='!?,.:;"\')(_-'
newstring='' #Creating empty string
word=raw_input("Enter string: ")
for i in word:
    if i not in punctuation:
        newstring += i
print "The string without punctuation is",newstring

#SECOND METHOD
word=raw_input("Enter string: ")
punctuation='!?,.:;"\')(_-'
newstring=word.translate(None,punctuation)
print "The string without punctuation is",newstring


#Output for both methods
Enter string: hello! welcome -to_python(programming.language)??,
The string without punctuation is: hello welcome topythonprogramminglanguage

score 2 · Answer 21 · edited Apr 02 '17 at 20:10


with open('one.txt', 'r') as myFile:
    str1 = myFile.read()
    print(str1)

punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]

# Replace each punctuation mark with a space
for i in punctuation:
    str1 = str1.replace(i, " ")
print(str1)

# Split on single spaces and print one word per line
myList = str1.split(" ")
for i in myList:
    print(i, end='\n')
    print("____________")

edited Apr 02 '17 at 20:10

twasbrillig


answered Jan 04 '17 at 11:09

Isayas Wakgari Kelbessa


Zain Sarwar · Answer 22 · 2020-08-20T15:07:01.450

2

Here's one other easy way to do it using RegEx

import re

punct = re.compile(r'(\w+)')

sentence = 'This ! is : a # sample $ sentence.' # Text with punctuation
tokenized = [m.group() for m in punct.finditer(sentence)]
sentence = ' '.join(tokenized)
print(sentence)  # This is a sample sentence

edited Aug 20 '20 at 15:07

answered Aug 20 '20 at 08:05

Zain Sarwar


score 2 · Answer 23 · answered Sep 02 '20 at 07:51


Try that one :)

import regex  # third-party module, not the standard-library re
regex.sub(r'\p{P}','', s)

answered Sep 02 '20 at 07:51

Vivian


score 1 · Answer 24 · answered Mar 26 '21 at 14:09


I was looking for a really simple solution. Here's what I got:

import re 

s = "string. With. Punctuation?" 
s = re.sub(r'[\W\s]', ' ', s)

print(s)
'string  With  Punctuation '

answered Mar 26 '21 at 14:09

aloha


Rajan saha Raju · Answer 25 · 2020-06-04T05:14:11.220

0

Considering unicode. Code checked in python3.

from unicodedata import category
text = 'hi, how are you?'
text_without_punc = ''.join(ch for ch in text if not category(ch).startswith('P'))

edited Jun 04 '20 at 05:14

answered Jun 04 '20 at 05:08

Rajan saha Raju


score 0 · Answer 26 · answered Apr 27 '21 at 11:48


You can also do this:

import string
' '.join(word.strip(string.punctuation) for word in 'text'.split())
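Note that str.strip only trims the ends of each word, so punctuation inside a word (an apostrophe, say) stays. A small sketch wrapping the one-liner in a function (the name `strip_word_edges` is hypothetical):

```python
import string


def strip_word_edges(text):
    """Trim ASCII punctuation from both ends of each whitespace-separated word."""
    return ' '.join(word.strip(string.punctuation) for word in text.split())


print(strip_word_edges("Hello, world! It's a 'test'."))  # Hello world It's a test
```

A token made up entirely of punctuation becomes an empty string, which leaves a double space in the joined result.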

answered Apr 27 '21 at 11:48

mohannatd


score -1 · Answer 27 · edited Jan 05 '17 at 08:47

Remove stop words from the text file using Python

print('====THIS IS HOW TO REMOVE STOP WORDS====')

with open('one.txt', 'r') as myFile:
    str1 = myFile.read()
    stop_words ="not", "is", "it", "By","between","This","By","A","when","And","up","Then","was","by","It","If","can","an","he","This","or","And","a","i","it","am","at","on","in","of","to","is","so","too","my","the","and","but","are","very","here","even","from","them","then","than","this","that","though","be","But","these"
    myList = []
    myList.extend(str1.split(" "))
    for i in myList:
        if i not in stop_words:
            print("____________")
            print(i, end='\n')

score -2 · Answer 28 · answered Apr 06 '13 at 17:28


I like to use a function like this:

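import string

def scrub(abc):
    # Trim punctuation from the end, then from the start. The emptiness
    # checks avoid an IndexError when the whole string is punctuation.
    while abc and abc[-1] in string.punctuation:
        abc = abc[:-1]
    while abc and abc[0] in string.punctuation:
        abc = abc[1:]
    return abc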

answered Apr 06 '13 at 17:28

Disk Giant


1

This is stripping characters from the start and end; use `abc.strip(string.punctuation)` instead for that. It won't remove such characters *in the middle*. – Martijn Pieters Feb 08 '16 at 21:13
