
I have the following string and I want to replace (remove) the second word that appears in the string (SECONDWORD). The following line doesn't change anything and just prints the same string. However, when I remove the '123' part of the string the line seems to work.

    echo "WORD 123 SECONDWORD THIRDWORD" | sed 's/ *[A-Z]* *//2'

I don't see the problem.

pinacolada
  • See: [The Stack Overflow Regular Expressions FAQ](http://stackoverflow.com/a/22944075/3776858) – Cyrus Jun 19 '17 at 17:48

2 Answers


Try this: sed -r 's/\b[A-Z]+ *//2'

The problem is the *, which also matches zero occurrences, so the whole pattern can match an empty string.

You can see this by varying the trailing occurrence number. When the input contains a space-separated token that [A-Z] cannot match, such as 123, the pattern produces zero-length matches at positions within that token, and each of those counts as an occurrence.

    [~/tmp] > echo 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD' | sed 's/ *[A-Z]* *//1'
    123 SECONDWORD THIRDWORD FOURTHWORD
    [~/tmp] > echo 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD' | sed 's/ *[A-Z]* *//2'
    WORD 123 SECONDWORD THIRDWORD FOURTHWORD
    [~/tmp] > echo 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD' | sed 's/ *[A-Z]* *//3'
    WORD 123 SECONDWORD THIRDWORD FOURTHWORD
    [~/tmp] > echo 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD' | sed 's/ *[A-Z]* *//4'
    WORD 123THIRDWORD FOURTHWORD
    [~/tmp] > echo 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD' | sed 's/ *[A-Z]* *//5'
    WORD 123 SECONDWORD FOURTHWORD

So with 123 in the input, the match you actually want removed is the 4th. Matches 1-5 are 'WORD ', empty, empty, ' SECONDWORD ', 'THIRDWORD '. The two empty matches come from the 123 token, which [A-Z]* can only match as a zero-length string.
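
One way to see these matches directly is to wrap every match in brackets instead of deleting it. This is a minimal sketch assuming GNU sed; show_matches is a hypothetical helper name, not something from the question.

```shell
# Hypothetical helper: mark every match of the original pattern with [ ].
# & in the replacement stands for the matched text; g applies it to all matches.
show_matches() {
    printf '%s\n' "$1" | sed 's/ *[A-Z]* */[&]/g'
}

show_matches 'WORD 123 SECONDWORD THIRDWORD'
# With GNU sed this prints:
# [WORD ]1[]2[]3[ SECONDWORD ][THIRDWORD]
```

Each [] is a zero-length match. Counting the brackets from the left shows why ' SECONDWORD ' is occurrence 4 rather than 2.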

You can fix this by using + and not * with sed -r:

    [~/tmp] > echo 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD' | sed -r 's/ *[A-Z]+ *//2'
    WORD 123THIRDWORD FOURTHWORD

Or use the uglier \{1,\} syntax without -r:

    [~/tmp] > echo 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD' | sed 's/ *[A-Z]\{1,\} *//2'
    WORD 123THIRDWORD FOURTHWORD

But that also ate the space before SECONDWORD, which you want to keep, so use the \b word-boundary marker in place of the leading " *":

    [~/tmp] > echo 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD' | sed -r 's/\b[A-Z]+ *//2'
    WORD 123 THIRDWORD FOURTHWORD
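
Applied to the string from the question, the final command can be wrapped in a small function. This is only a sketch and assumes GNU sed, since both -r and \b are GNU extensions; remove_nth_upper_word is a hypothetical name chosen for illustration.

```shell
# Hypothetical helper: delete the Nth run of uppercase letters that starts
# at a word boundary, together with any spaces that follow it.
# $1 = occurrence number, $2 = input line
remove_nth_upper_word() {
    printf '%s\n' "$2" | sed -r "s/\\b[A-Z]+ *//$1"
}

remove_nth_upper_word 2 'WORD 123 SECONDWORD THIRDWORD'
# prints: WORD 123 THIRDWORD
```

Because [A-Z]+ needs at least one letter, 123 no longer contributes any matches, and occurrence 2 is SECONDWORD as intended.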
stevesliva

This might work for you (GNU sed):

    sed 's/\S\+\s*//2' file

Remove the second occurrence of one or more non-spaces followed by zero or more spaces.

May also be written:

    sed 's/\S\S*\s*//2' file
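
Note that \S\+ counts every whitespace-delimited token, including 123, so on the string from the question occurrence 2 is 123 and SECONDWORD is occurrence 3. A quick check, assuming GNU sed (\S, \s and \+ are GNU extensions); the variable names are only for illustration:

```shell
line='WORD 123 SECONDWORD THIRDWORD'

# Occurrence 2 is the token "123"
second=$(printf '%s\n' "$line" | sed 's/\S\+\s*//2')
# Occurrence 3 is the token "SECONDWORD"
third=$(printf '%s\n' "$line" | sed 's/\S\+\s*//3')

printf '%s\n' "$second"   # WORD SECONDWORD THIRDWORD
printf '%s\n' "$third"    # WORD 123 THIRDWORD
```

So for this particular input the trailing number would need to be 3 to drop SECONDWORD.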
potong