The sed command in shell won't replace the second matching of a pattern

Question

I have the following string and I want to replace (remove) the second word that appears in the string (SECONDWORD). The following line doesn't change anything and just prints the same string. However, when I remove the '123' part of the string the line seems to work.

    echo "WORD 123 SECONDWORD THIRDWORD" | sed 's/ *[A-Z]* *//2'

I don't see the problem.

See: [The Stack Overflow Regular Expressions FAQ](http://stackoverflow.com/a/22944075/3776858) — Cyrus, Jun 19 '17 at 17:48

stevesliva · Accepted Answer · 2017-06-21T14:11:54.083

Try this: sed -r 's/\b[A-Z]+ *//2'

The * quantifier, which can match zero occurrences, is the issue: every part of the pattern is optional, so the whole pattern can match an empty string.

You can see this by playing with the trailing number. When you have space-separated strings that don't match [A-Z], you get matches on empty strings that delimit the unmatched strings.

    [~/tmp] > echo 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD' | sed 's/ *[A-Z]* *//1'
    123 SECONDWORD THIRDWORD FOURTHWORD
    [~/tmp] > echo 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD' | sed 's/ *[A-Z]* *//2'
    WORD 123 SECONDWORD THIRDWORD FOURTHWORD
    [~/tmp] > echo 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD' | sed 's/ *[A-Z]* *//3'
    WORD 123 SECONDWORD THIRDWORD FOURTHWORD
    [~/tmp] > echo 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD' | sed 's/ *[A-Z]* *//4'
    WORD 123THIRDWORD FOURTHWORD
    [~/tmp] > echo 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD' | sed 's/ *[A-Z]* *//5'
    WORD 123 SECONDWORD FOURTHWORD

... so when you have 123 in there, SECONDWORD is actually the 4th match, not the 2nd. Matches 1-5 are 'WORD ', null, null, ' SECONDWORD ', 'THIRDWORD '. The two nulls are empty-string matches that sed finds while stepping past 123.
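One way to make the matches visible is to wrap each one in brackets with the g flag instead of deleting it. This is only a sketch, assuming GNU sed; the variable names are made up for illustration. Any `[]` in the first result is an empty match, and the second result uses `\{1,\}` for comparison.

```shell
input='WORD 123 SECONDWORD THIRDWORD FOURTHWORD'

# Original pattern: every part is optional, so empty matches are possible.
marked=$(printf '%s\n' "$input" | sed 's/ *[A-Z]* */[&]/g')

# Same pattern, but requiring at least one uppercase letter.
marked_plus=$(printf '%s\n' "$input" | sed 's/ *[A-Z]\{1,\} */[&]/g')

printf '%s\n' "$marked"
printf '%s\n' "$marked_plus"
```

Counting the bracketed groups from the left gives the number to put after the final slash.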

You can fix this by using + and not * with sed -r:

    [~/tmp] > echo 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD' | sed -r 's/ *[A-Z]+ *//2'
    WORD 123THIRDWORD FOURTHWORD

Or use the uglier \{1,\} syntax without -r:

    [~/tmp] > echo 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD' | sed 's/ *[A-Z]\{1,\} *//2'
    WORD 123THIRDWORD FOURTHWORD

But that ate a space you didn't want eaten, so use the \b word boundary marker in place of the leading ' *':

    [~/tmp] > echo 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD' | sed -r 's/\b[A-Z]+ *//2'
    WORD 123 THIRDWORD FOURTHWORD
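If you need this for different positions, the command can be wrapped in a small function. This is a rough sketch, assuming GNU sed (both -r and \b are GNU extensions); the name remove_nth_upper_word is hypothetical, not something sed provides.

```shell
# Hypothetical helper: delete the Nth run of uppercase letters
# (plus any spaces after it) from a string.
#   $1 = which match to delete (1-based)
#   $2 = input string
remove_nth_upper_word() {
    printf '%s\n' "$2" | sed -r "s/\\b[A-Z]+ *//$1"
}

# The string from the question; with GNU sed this prints: WORD 123 THIRDWORD
remove_nth_upper_word 2 'WORD 123 SECONDWORD THIRDWORD'
```

Because [A-Z]+ needs at least one letter, 123 is never counted, so 2 really means the second uppercase word.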

score -1 · Answer 2 · answered Jun 20 '17 at 07:46

This might work for you (GNU sed):

    sed 's/\S\+\s*//2' file

Remove the second occurrence of one or more non-spaces followed by zero or more spaces.

May also be written:

    sed 's/\S\S*\s*//2' file
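Note that \S counts every whitespace-separated token, digits included. A quick check against the string from the question (a sketch assuming GNU sed, since \S and \s are GNU extensions; the variable names are illustrative):

```shell
input='WORD 123 SECONDWORD THIRDWORD'

# Token 2 is "123", so that is what gets removed here.
by_token=$(printf '%s\n' "$input" | sed 's/\S\+\s*//2')
printf '%s\n' "$by_token"   # with GNU sed: WORD SECONDWORD THIRDWORD
```

So with this pattern SECONDWORD would be occurrence 3 for that input, not 2.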

potong
