2

I am new here and have just started using regular expressions in my Python code. I have a string containing 6 commas, one of which falls between two quotation marks. I want to get rid of the quotation marks and that last comma.

The input:

string = 'Fruits,Pear,Cherry,Apple,Orange,"Cherry,"'

I want this output:

string = 'Fruits,Pear,Cherry,Apple,Orange,Cherry'         

The output of my code:

string = 'Fruits,Pear,**CherryApple**,Orange,Cherry'

Here is my code in Python:

if (re.search('"', string)):
    matches  = re.findall(r'\"(.+?)\"',string);
    matches1 = re.sub(",", "", matches[0]);
    string   = re.sub(matches[0],matches1,string);
    string   = re.sub('"','',string);

My problem is that I want the code to affect only the last field ("Cherry,"), but it also changes the fields in the middle (Cherry,Apple), which contain the same text as the one between the quotation marks. That merges two fields (Cherry,Apple) and reduces the number of commas from 6 to 4, whereas I want to be left with 5 commas.

fullString = '2000-04-24 12:32:00.000,22186CBD0FDEAB049C60513341BA721B,0DDEB5,COMP,Cherry Corp.,DE,100,0.57,100,31213C678CC483768E1282A9D8CB524C,365.00000,business,acquisitions-mergers,acquisition-bid,interest,acquiree,fact,,,,,,,,,,,,,acquisition-interest-acquiree,Cherry Corp. Gets Buyout Offer From Chairman President,FULL-ARTICLE,B5569E,Dow Jones Newswires,0.04,-0.18,0,0,1,0,0,0,0,1,1,5,RPA,DJ,DN20000424000597,"Cherry Corp. Gets Buyout Offer From Chairman President,"\n'

Many thanks in advance.

iBug
Zryan

4 Answers

2

For your task you don't need regular expressions, just use replace:

string = 'Fruits,Pear,Cherry,Apple,Orange,"Cherry,"'
new_string = string.replace('"', '').strip(',')
Daniel
  • 1
    Doesn't `replace()` need two arguments? Also, `strip()` might not even be necessary (eg.`'.replace(',"',',')[:-1]`) — depends on the string perhaps... – l'L'l Apr 12 '17 at 16:48
  • @Daniel: What if "Cherry," is not the last element of the string? – Zryan Apr 12 '17 at 17:23
  • 1
    @Zryan "Cherry" in this case is irrelevant. `strip(',')` will remove any `,` characters from the beginning or end of the string, so something like: `",,,this,is,a,test,,,"` would become `"this,is,a,test"` – Aaron Apr 12 '17 at 17:37
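
To make the comment about `strip(',')` concrete, here is a small sketch. The sample string is the one from the comment above; the field counts are added purely for illustration. It suggests that stripping commas from both ends also discards any empty fields at the edges of a line, which may matter if the number of fields has to stay fixed.

```python
line = ',,,this,is,a,test,,,'

# strip(',') removes every leading and trailing comma, not just one.
stripped = line.strip(',')

# Count the comma-separated fields before and after stripping.
fields_before = len(line.split(','))
fields_after = len(stripped.split(','))

print(stripped)
print(fields_before, fields_after)
```

If only a single trailing comma should go, slicing or a targeted replacement is probably the safer choice.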
1

The best way would be to use the newer regex module where (*SKIP)(*FAIL) is supported:

import regex as re

string = 'Fruits,Pear,Cherry,Apple,Orange,"Cherry,"'

# parts
rx = re.compile(r'"[^"]+"(*SKIP)(*FAIL)|,')

def cleanse(match):
    rxi = re.compile(r'[",]+')
    return rxi.sub('', match)

parts = [cleanse(match) for match in rx.split(string)]
print(parts)
# ['Fruits', 'Pear', 'Cherry', 'Apple', 'Orange', 'Cherry']

Here you match anything between double quotes and throw it away afterwards, thus only commas outside quotes are used for the split operation. The rest is a list comprehension with a cleaning function.
See a demo on regex101.com.
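
If installing the third-party regex module is not an option, a similar effect can probably be had with the standard `re` module by passing a function to `re.sub`. This is a different technique from the (*SKIP)(*FAIL) split above: instead of splitting, it rewrites only the quoted fields in place. The helper name `strip_commas` is made up for this sketch, and it assumes quoted fields never contain escaped quotes.

```python
import re

string = 'Fruits,Pear,Cherry,Apple,Orange,"Cherry,"'

def strip_commas(match):
    # match.group(1) is the text between the quotes; drop its commas.
    return match.group(1).replace(',', '')

# Replace each quoted field with its content, minus quotes and commas.
# Text outside quotes is never touched, so no other field is merged.
result = re.sub(r'"([^"]*)"', strip_commas, string)
print(result)
```

Because the substitution is tied to the position of the quotes rather than to the text "Cherry,", the unquoted Cherry,Apple fields in the middle stay separate.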

Jan
0

Why not simply use this:

>>> ans_string = string.replace('"', '')[0:-1]

Output

>>> ans_string
'Fruits,Pear,Cherry,Apple,Orange,Cherry'

This keeps the code simple and needs only a single linear pass over the string.

Community
ABcDexter
0

You might consider using the csv module to do this.

Example:

>>> import csv
>>> s = 'Fruits,Pear,Cherry,Apple,Orange,"Cherry,"'
>>> ','.join([e.replace(',', '') for row in csv.reader([s]) for e in row])
'Fruits,Pear,Cherry,Apple,Orange,Cherry'

The csv module strips the quotes but keeps the commas inside each quoted field. You can then remove the comma that was kept.

This lets you apply any modification you want (removing `,`, for example) on a field-by-field basis. The quoted fields containing commas can be anywhere in the string, not just at the end.
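
As a rough check of the field-by-field claim, the sketch below parses a line whose quoted field sits in the middle and which also has an empty field. The sample line is invented for illustration and is not from the question. The field count should be the same before and after cleaning.

```python
import csv

# Hypothetical sample: a quoted field in the middle, plus an empty field.
line = 'Fruits,"Pear, green",Cherry,,Apple,"Cherry,"'

# csv.reader accepts any iterable of lines, so wrap the string in a list.
row = next(csv.reader([line]))

# Remove commas inside each field; the number of fields is unchanged.
cleaned = [field.replace(',', '') for field in row]

print(len(row), len(cleaned))
print(','.join(cleaned))
```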


If your content is in a csv file, you would do something like this (in pseudo code)

with open(file, newline='') as csv_fo:  # Python 3 text mode; Python 2 used 'rb'
    # modify(string) stands for what you want to do to each field...
    for row in csv.reader(csv_fo):
        new_row = [modify(field) for field in row]
        # now do what you need with that row
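
A self-contained variant of the loop above might look like the following. It uses `io.StringIO` objects in place of real files, and both the sample data and the body of `modify` are assumptions made for this sketch, not part of the original answer.

```python
import csv
import io

# In-memory stand-ins for real files (hypothetical data).
source = io.StringIO('a,"b,",c\nd,e,"f,"\n')
target = io.StringIO()

def modify(field):
    # Example transformation: drop commas inside a field.
    return field.replace(',', '')

# Read each row, clean every field, and write the row back out.
writer = csv.writer(target, lineterminator='\n')
for row in csv.reader(source):
    writer.writerow([modify(field) for field in row])

output = target.getvalue()
print(output)
```

With real files, the same loop would run over handles opened with `newline=''`, as the csv documentation recommends.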
dawg
  • Actually, I have to read a csv file and one of the lines looks like my simple question, I can send you the full line if you don't mind? – Zryan Apr 12 '17 at 17:30
  • If you are actually processing a file vs just one string -- definitely use the csv module. Just post full line examples of several lines of the file and I will modify my answer. – dawg Apr 12 '17 at 17:34
  • Thanks for your help, but your code is decreasing the number of fields, which leads to a bug. – Zryan Apr 12 '17 at 18:03
  • fullString = '2000-04-24 12:32:00.000,22186CBD0FDEAB049C60513341BA721B,0DDEB5,COMP,Cherry Corp.,DE,100,0.57,100,31213C678CC483768E1282A9D8CB524C,365.00000,business,acquisitions-mergers,acquisition-bid,interest,acquiree,fact,,,,,,,,,,,,,acquisition-interest-acquiree,Cherry Corp. Gets Buyout Offer From Chairman President,FULL-ARTICLE,B5569E,Dow Jones Newswires,0.04,-0.18,0,0,1,0,0,0,0,1,1,5,RPA,DJ,DN20000424000597,Cherry Corp. Gets Buyout Offer From Chairman President,\n' – Zryan Apr 12 '17 at 18:10
  • 1
    Please put that in your question and include the desired output. – dawg Apr 12 '17 at 18:45