Try this: sed -r 's/\b[A-Z]+ *//2'
The * quantifier, which also matches zero occurrences, is the issue.
You can see this by playing with the trailing number. When the input has space-separated strings that don't match [A-Z], you get matches on the empty strings that delimit the unmatched strings.
[~/tmp] > echo 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD' | sed 's/ *[A-Z]* *//1'
123 SECONDWORD THIRDWORD FOURTHWORD
[~/tmp] > echo 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD' | sed 's/ *[A-Z]* *//2'
WORD 123 SECONDWORD THIRDWORD FOURTHWORD
[~/tmp] > echo 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD' | sed 's/ *[A-Z]* *//3'
WORD 123 SECONDWORD THIRDWORD FOURTHWORD
[~/tmp] > echo 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD' | sed 's/ *[A-Z]* *//4'
WORD 123THIRDWORD FOURTHWORD
[~/tmp] > echo 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD' | sed 's/ *[A-Z]* *//5'
WORD 123 SECONDWORD FOURTHWORD
... so when you have 123 in there, you actually want the 4th match removed: 'WORD ', empty, empty, ' SECONDWORD ' and 'THIRDWORD ' are matches 1-5. The pattern is matching two empty strings around the boundaries of 123.
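If you want to see the empty matches rather than infer them from the trailing number, here is a rough sketch that wraps every match in <...> using the g flag, so an empty match shows up as a bare <>. The show_matches helper is a made-up name for illustration, and it assumes the pattern contains no "/" character.

```shell
# Hypothetical helper (not part of sed): wrap each match of a basic
# regular expression in <...> so that empty matches appear as "<>".
show_matches() {
    # $1 = input line, $2 = basic regular expression (must not contain "/")
    printf '%s\n' "$1" | sed "s/$2/<&>/g"
}

line='WORD 123 SECONDWORD THIRDWORD FOURTHWORD'

# With "*" the pattern can match nothing at all, so some <> pairs
# should appear in the output next to the digits.
show_matches "$line" ' *[A-Z]* *'
```

Every <...> pair in the output is one match that the trailing number counts, including the empty ones.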
You can fix this by using + instead of * with sed -r:
[~/tmp] > echo 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD' | sed -r 's/ *[A-Z]+ *//2'
WORD 123THIRDWORD FOURTHWORD
Or use the uglier \{1,\} syntax without -r:
[~/tmp] > echo 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD' | sed 's/ *[A-Z]\{1,\} *//2'
WORD 123THIRDWORD FOURTHWORD
But that ate a space you didn't want eaten, so use the \b word boundary marker:
[~/tmp] > echo 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD' | sed -r 's/\b[A-Z]+ *//2'
WORD 123 THIRDWORD FOURTHWORD
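If you need this for more than one position, the final command can be wrapped in a small function. This is only a sketch: remove_nth_upper_word is a hypothetical name, and it assumes GNU sed, since it relies on the same -r and \b used above.

```shell
# Hypothetical helper: delete the Nth all-uppercase word and any
# spaces that directly follow it.
remove_nth_upper_word() {
    # $1 = which match to delete (1-based), $2 = input line
    # Assumes GNU sed: -r and \b are the same features used in the answer.
    printf '%s\n' "$2" | sed -r "s/\b[A-Z]+ *//$1"
}

remove_nth_upper_word 2 'WORD 123 SECONDWORD THIRDWORD FOURTHWORD'
# prints: WORD 123 THIRDWORD FOURTHWORD
```

Because + requires at least one capital letter, 123 never counts as a match, so the number you pass lines up with the uppercase words you can see.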