Regular Expression in Python 3

Question

I am new here and have just started using regular expressions in my Python code. I have a string that contains 6 commas. One of the commas falls between two quotation marks. I want to get rid of the quotation marks and the last comma (the one inside the quotes).

The input:

string = 'Fruits,Pear,Cherry,Apple,Orange,"Cherry,"'

I want this output:

string = 'Fruits,Pear,Cherry,Apple,Orange,Cherry'

The output of my code:

string = 'Fruits,Pear,**CherryApple**,Orange,Cherry'

Here is my code in Python:

import re

if re.search('"', string):
    matches = re.findall(r'\"(.+?)\"', string)
    matches1 = re.sub(",", "", matches[0])
    string = re.sub(matches[0], matches1, string)
    string = re.sub('"', '', string)

My problem is that I want the code to affect only the last part ("Cherry,"), but it also changes the fields in the middle (Cherry,Apple) because they contain the same text as the part between the quotation marks. This merges two fields (Cherry,Apple) and reduces the number of commas from 6 to 4, whereas I want to be left with 5 commas.

fullString = '2000-04-24 12:32:00.000,22186CBD0FDEAB049C60513341BA721B,0DDEB5,COMP,Cherry Corp.,DE,100,0.57,100,31213C678CC483768E1282A9D8CB524C,365.00000,business,acquisitions-mergers,acquisition-bid,interest,acquiree,fact,,,,,,,,,,,,,acquisition-interest-acquiree,Cherry Corp. Gets Buyout Offer From Chairman President,FULL-ARTICLE,B5569E,Dow Jones Newswires,0.04,-0.18,0,0,1,0,0,0,0,1,1,5,RPA,DJ,DN20000424000597,"Cherry Corp. Gets Buyout Offer From Chairman President,"\n'

Many Thanks in advance

You still need to put your desired output after processing `fullString` — dawg, Apr 14 '17 at 14:08

score 2 · Answer 1 · answered Apr 12 '17 at 16:38

For your task you don't need regular expressions, just use replace:

string = 'Fruits,Pear,Cherry,Apple,Orange,"Cherry,"'
new_string = string.replace('"', '').strip(',')

answered Apr 12 '17 at 16:38

Daniel


Doesn't `replace()` need two arguments? Also, `strip()` might not even be necessary (e.g. `.replace(',"', ',')[:-1]`), depending on the string perhaps... – l'L'l Apr 12 '17 at 16:48
@Daniel: What if "Cherry," is not the last element of the string ? – Zryan Apr 12 '17 at 17:23
@Zryan "Cherry" in this case is irrelevant. `strip(',')` will remove any `,` characters from the beginning or end of the string, so something like: `",,,this,is,a,test,,,"` would become `"this,is,a,test"` – Aaron Apr 12 '17 at 17:37
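
A small sketch of the behaviour discussed in these comments may help. The second input is a hypothetical variation of the asker's string, with the quoted field moved to the middle, to show where the `replace` plus `strip` approach stops being enough:

```python
# strip(',') only trims commas at the two ends of the string.
padded = ',,,this,is,a,test,,,'
trimmed = padded.strip(',')
print(trimmed)  # this,is,a,test

# Hypothetical input with the quoted field in the middle: after the quotes
# are removed, the inner comma stays and leaves an extra empty field.
middle = 'Fruits,"Cherry,",Apple'
cleaned = middle.replace('"', '').strip(',')
print(cleaned)  # Fruits,Cherry,,Apple
```

So this approach works for the sample string only because the quoted field happens to be the last one.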

Jan · Accepted Answer · 2017-04-12T16:48:33.567

1

The best way would be to use the newer regex module where (*SKIP)(*FAIL) is supported:

import regex as re

string = 'Fruits,Pear,Cherry,Apple,Orange,"Cherry,"'

# parts
rx = re.compile(r'"[^"]+"(*SKIP)(*FAIL)|,')

def cleanse(match):
    rxi = re.compile(r'[",]+')
    return rxi.sub('', match)

parts = [cleanse(match) for match in rx.split(string)]
print(parts)
# ['Fruits', 'Pear', 'Cherry', 'Apple', 'Orange', 'Cherry']

Here you match anything between double quotes and throw it away afterwards, thus only commas outside quotes are used for the split operation. The rest is a list comprehension with a cleaning function.
See a demo on regex101.com.

edited Apr 12 '17 at 16:48

answered Apr 12 '17 at 16:38

Jan


Thanks Jan, but I got this error message: ImportError: No module named 'regex' – Zryan Apr 12 '17 at 16:55
@Zryan: You need to install it first: `pip install regex` on the console. – Jan Apr 12 '17 at 16:56
I don't have permission to install regex on my uni PC :( Is there another way that uses only the re package? – Zryan Apr 12 '17 at 17:20
@Zryan try `pip install regex --user` to install it under your user profile directory, which requires no special privileges – Brian M. Sheldon Oct 03 '18 at 13:24
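
For readers in the same situation, here is a minimal sketch that uses only the standard `re` module. It is not the approach from the answer above: it does not split the string, it passes a function to `re.sub` so that only the text inside each pair of double quotes is edited. The helper name `drop_commas_in_quotes` is made up for this illustration, and the pattern assumes the quoted fields contain no escaped quotes.

```python
import re

def drop_commas_in_quotes(line):
    """Remove the quotes and any commas inside each double-quoted field."""
    # The callable receives each quoted match separately, so identical text
    # outside the quotes (e.g. the unquoted 'Cherry,') is never touched.
    return re.sub(r'"([^"]*)"', lambda m: m.group(1).replace(',', ''), line)

string = 'Fruits,Pear,Cherry,Apple,Orange,"Cherry,"'
result = drop_commas_in_quotes(string)
print(result)  # Fruits,Pear,Cherry,Apple,Orange,Cherry
```

Because the replacement is computed per match, the quoted field can be anywhere in the string and the number of fields is preserved. For real CSV data with escaped quotes, the `csv` module is likely the more robust choice.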

score 0 · Answer 3 · edited May 23 '17 at 11:47

Why not simply use this:

>>> ans_string = string.replace('"', '')[0:-1]

Output

>>> ans_string
'Fruits,Pear,Cherry,Apple,Orange,Cherry'

This keeps the code simple and avoids the overhead of a regular expression.

edited May 23 '17 at 11:47

Community


answered Apr 12 '17 at 16:48

ABcDexter


What if `"Cherry,"` is not the last element of the string ? – Apr 12 '17 at 16:54
@ABcDexter: yes, exactly, what if it's not at the end? Is there any other way? – Zryan Apr 12 '17 at 17:22

dawg · Answer 4 · 2017-04-12T17:46:52.237

0

You might consider using the csv module to do this.

Example:

>>> import csv
>>> s = 'Fruits,Pear,Cherry,Apple,Orange,"Cherry,"'
>>> ','.join([e.replace(',', '') for row in csv.reader([s]) for e in row])
'Fruits,Pear,Cherry,Apple,Orange,Cherry'

The csv module will strip the quotes but keep the commas on each quoted field. Then you can just remove that comma that was kept.

This handles any desired modification (removing `,`, for example) on a field-by-field basis. The quoted fields that contain commas can be anywhere in the string.
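
As a rough illustration of that last point, the sketch below uses a made-up sample line (not the asker's data) with quoted fields both in the middle and at the end, and checks that the field count is unchanged after cleaning:

```python
import csv

# Hypothetical sample line: quoted fields with commas in the middle and at the end.
line = 'Fruits,"Pear,",Cherry,Apple,"Orange, Inc."'

# csv.reader accepts any iterable of lines, so wrap the single string in a list.
row = next(csv.reader([line]))
print(row)  # ['Fruits', 'Pear,', 'Cherry', 'Apple', 'Orange, Inc.']

# Clean each field separately; the number of fields stays the same.
cleaned = [field.replace(',', '') for field in row]
print(','.join(cleaned))  # Fruits,Pear,Cherry,Apple,Orange Inc.
```

This only works if the quotes are still present in the input; once they have been stripped, the reader can no longer tell an embedded comma from a field separator.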

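If your content is in a csv file, you would do something like this (a sketch; the file name and `modify` are placeholders):

import csv

def modify(field):
    # stands for what you want to do to each field, e.g. drop embedded commas
    return field.replace(',', '')

# In Python 3, open the file in text mode with newline='' (not 'rb')
with open('data.csv', newline='') as csv_fo:
    for row in csv.reader(csv_fo):
        new_row = [modify(field) for field in row]
        # now do what you need with that row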

edited Apr 12 '17 at 17:46

answered Apr 12 '17 at 17:15

dawg


Actually, I have to read a csv file and one of the lines looks like my simple question, I can send you the full line if you don't mind? – Zryan Apr 12 '17 at 17:30
If you are actually processing a file vs just one string -- definitely use the csv module. Just post full line examples of several lines of the file and I will modify my answer. – dawg Apr 12 '17 at 17:34
Thanks for your help, but your code is decreasing the number of fields, which leads to a bug. – Zryan Apr 12 '17 at 18:03
fullString = '2000-04-24 12:32:00.000,22186CBD0FDEAB049C60513341BA721B,0DDEB5,COMP,Cherry Corp.,DE,100,0.57,100,31213C678CC483768E1282A9D8CB524C,365.00000,business,acquisitions-mergers,acquisition-bid,interest,acquiree,fact,,,,,,,,,,,,,acquisition-interest-acquiree,Cherry Corp. Gets Buyout Offer From Chairman President,FULL-ARTICLE,B5569E,Dow Jones Newswires,0.04,-0.18,0,0,1,0,0,0,0,1,1,5,RPA,DJ,DN20000424000597,Cherry Corp. Gets Buyout Offer From Chairman President,\n' – Zryan Apr 12 '17 at 18:10
Please put that in your question and include the desired output. – dawg Apr 12 '17 at 18:45
