
I'm essentially trying to "tidy" a lot of data in a CSV. I don't need any of the information that's in "quotes".

I tried sed 's/".*"/""/', but it also removes the commas when there is more than one quoted section on a line.

I would like to get from this:

1,2,"a",4,"b","c",5

To this:

1,2,,4,,,5

Is there a sed wizard who can help? :)

Cyrus
materangai
  • See https://stackoverflow.com/questions/5319840/greedy-vs-reluctant-vs-possessive-quantifiers for why your attempt failed – Sundeep Apr 16 '20 at 14:59
  • `sed 's/"."//g' file` – Cyrus Apr 16 '20 at 15:30
  • All 3 solutions work - thank you all. Just found a curveball that in some lines, there are line breaks - just need to remove those and I'm sorted :) – materangai Apr 22 '20 at 08:13

3 Answers


You may use

sed 's/"[^"]*"//g' file > newfile

See online sed demo:

s='1,2,"a",4,"b","c",5'
sed 's/"[^"]*"//g' <<< "$s"
# => 1,2,,4,,,5

Details

The "[^"]*" pattern matches a ", then zero or more characters other than ", and then a closing ". Each match is deleted because the replacement (RHS) is empty. The g flag applies the substitution to every match on each line rather than only the first.
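
To see why the original attempt lost the commas, it can help to run both patterns side by side. This is only a small sketch; the variable names are illustrative and not part of the question.

```shell
# Sample line from the question.
line='1,2,"a",4,"b","c",5'

# Greedy: ".*" runs from the first quote to the last quote on the line.
greedy=$(printf '%s\n' "$line" | sed 's/".*"//')

# Negated class: [^"]* cannot cross a quote, so each field matches separately.
per_field=$(printf '%s\n' "$line" | sed 's/"[^"]*"//g')

printf '%s\n' "$greedy"     # 1,2,,5
printf '%s\n' "$per_field"  # 1,2,,4,,,5
```

The greedy version swallows everything between the first and last quote, including the unquoted 4 and the commas around it.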

Wiktor Stribiżew

Could you please try the following?

awk -v s1="\"" 'BEGIN{FS=OFS=","} {for(i=1;i<=NF;i++){if($i~s1){$i=""}}} 1' Input_file

The multi-line form of the solution is:

awk -v s1="\"" '
BEGIN{
  FS=OFS=","
}
{
  for(i=1;i<=NF;i++){
    if($i~s1){
      $i=""
    }
  }
}
1
'  Input_file

Detailed explanation:

awk -v s1="\"" '         ## Start the awk program and set variable s1 to a double quote (").
BEGIN{                   ## The BEGIN block runs once, before any input is read.
  FS=OFS=","             ## Use a comma as both the input and output field separator.
}
{
  for(i=1;i<=NF;i++){    ## Loop over every field of the current line.
    if($i~s1){           ## If the current field contains a double quote...
      $i=""              ## ...replace the field with an empty string.
    }
  }
}
1                        ## A true condition with no action prints the (possibly edited) line.
'  Input_file            ## Name of the input file.
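
One caveat with splitting on commas: a quoted field that itself contains a comma is seen as two fields. The sketch below uses a hypothetical input line (not from the question) to compare the field-based approach with the regex one; the variable names are illustrative.

```shell
# Hypothetical sample: the quoted field itself contains a comma.
line='1,"a,b",3'

# Field-based awk splits inside the quotes, so two fields are blanked.
awk_out=$(printf '%s\n' "$line" |
  awk 'BEGIN{FS=OFS=","} {for(i=1;i<=NF;i++) if($i ~ /"/) $i=""} 1')

# The regex approach removes the quoted text as one unit.
sed_out=$(printf '%s\n' "$line" | sed 's/"[^"]*"//g')

printf '%s\n' "$awk_out"  # 1,,,3
printf '%s\n' "$sed_out"  # 1,,3
```

If the data never has commas inside quotes, as in the question's example, both approaches give the same result.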
RavinderSingh13

With Perl:

perl -p -e 's/".*?"//g' file

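The ? after * makes the quantifier non-greedy, so each match ends at the nearest closing quote instead of the last one on the line.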

Output:

1,2,,4,,,5
Cyrus