Combining awk and sed to match line and replace characters

Question

I'm trying to match lines whose first letter is "B" (in this example) and whose second column is "2". When a match is found, I want to replace characters 38-41 with spaces.

Here is the data I'm trying to modify:

A1234A123 1 2 12345.12345 1234.1234.112341234

B1234A123 2 2 12345.12345 1234.1234.112341234

A1234A123 2 2 12345.12345 1234.1234.112341234

I can match the conditions with awk using:

awk '/^B/ && $2=="2" {print}'

and I can modify the lines with sed using:

sed -r 's/^(.{37})(.{4})/\1    /'

I want to find the lines in the file that meet both conditions and modify those characters, while still printing the lines that don't match unchanged. Can the two commands be combined to introduce some sort of if/then statement?

I've tried to combine the commands, but it edited all of the lines:

awk '/^B/ && $2=="2" {print}' | sed -r 's/^(.{37})(.{4})/\1    /' data

Resulting data should look like this:

A1234A123 1 2 12345.12345 1234.1234.112341234

B1234A123 2 2 12345.12345 1234.1234.1    1234

A1234A123 2 2 12345.12345 1234.1234.112341234

Thanks in advance.

You never need to combine sed and awk (or grep and awk). sed is an excellent tool for simple substitutions on a single line, for any other text manipulation just use awk. — Ed Morton, Dec 29 '13 at 12:37
OK @Ed, thanks for the advice and the corrections to others' posts. I was thinking the solution was more difficult than it ended up being. The more I read on AWK, the more I realize its potential. I'll keep studying! Thanks again. — fryman84, Dec 29 '13 at 18:46
http://stackoverflow.com/questions/1632113/what-is-the-difference-between-sed-and-awk — fryman84, Dec 29 '13 at 19:09
The discussion on that link is all fine, but THE important thing to know about the 2 tools is that sed was invented before awk. Once awk was invented in the mid-1970s, most of sed's language constructs became obsolete, so today the only useful sed constructs are s, g, and p (with the -n option). Any time you're using hold space or pattern space or whatever other "space" sed supports, you are using the wrong tool. sed is an excellent tool for simple substitutions on a single line - that's it. — Ed Morton, Dec 29 '13 at 19:17

score 3 · Accepted Answer · answered Dec 29 '13 at 07:20

You can use a single awk command to do both the matching and the replacement:

awk '/^B/ && $2=="2"{$0=substr($0, 1, 37) "    " substr($0, 38, 4)} 1' file
A1234A123 1 2 12345.12345 1234.1234.112341234
B1234A123 2 2 12345.12345 1234.1234.1    1234
A1234A123 2 2 12345.12345 1234.1234.112341234

anubhava

Got it! Thanks! I actually just changed the start position of the last substr so that it would retain the last four characters: `awk '/^B/ && $2=="2"{$0=substr($0, 1, 37) "    " substr($0, 42, 4)} 1' file` – fryman84 Dec 29 '13 at 07:36
You're welcome, yes `substr($0, 42, 4)` will also return `1234` in output. – anubhava Dec 29 '13 at 07:48
You don't need the `, 4` arg in the last substr(). – Ed Morton Dec 29 '13 at 12:29
Yes, when taking the rightmost part of a string, the length argument to substr isn't needed. – anubhava Dec 29 '13 at 12:34
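
Putting the comments above together, here is a minimal sketch of the one-liner with the tail taken from position 42 onward and no length argument. The sample lines are hypothetical: characters 38-41 (`WXYZ`) were chosen to differ from the characters that follow (`6789`), so the output shows that the tail is kept and characters 38-41 are blanked rather than copied.

```shell
# Hypothetical sample: characters 38-41 are "WXYZ", characters 42-45 are "6789".
input='A1234A123 1 2 12345.12345 1234.1234.1WXYZ6789
B1234A123 2 2 12345.12345 1234.1234.1WXYZ6789
B1234A123 20 2 12345.12345 1234.1234.1WXYZ678'

result=$(printf '%s\n' "$input" | awk '
  # Condition: line starts with "B" and the second field is exactly "2".
  /^B/ && $2 == "2" {
    # Keep characters 1-37, blank out 38-41, keep everything from 42 on.
    $0 = substr($0, 1, 37) "    " substr($0, 42)
  }
  # The bare 1 is an always-true pattern, so every line is printed.
  1
')

printf '%s\n' "$result"
```

Only the second line is changed; the `B` line whose second field is `20` passes through untouched because `$2 == "2"` is an exact string comparison.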

score 2 · Answer 2 · answered Dec 29 '13 at 07:16

You can tell sed to apply the substitution only to the matching lines by prefixing the s command with a regex address (/^B\S*\s2\s/):

sed -r '/^B\S*\s2\s/s/^(.{37}).{4}/\1    /' data

edited Dec 29 '13 at 13:32

answered Dec 29 '13 at 07:16

Tomas

Watch out: `2` could match `20`. No need to group the bit you are going to throw away. – potong Dec 29 '13 at 10:47
@potong you are absolutely correct. I just copied the OP's regex. – Tomas Dec 29 '13 at 10:48
Always dangerous to copy the code of the one person in the thread who you KNOW doesn't know how to solve the problem :-). – Ed Morton Dec 29 '13 at 13:29
@EdMorton not really dangerous in this case ;-) – Tomas Dec 29 '13 at 13:30
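
To illustrate the `2` versus `20` point, here is a small sketch that runs an addressed substitution over three lines. The input lines are hypothetical, and two things are assumptions rather than part of the answer: the address is written with plain spaces (`[^ ]* 2 `) instead of `\s` so it does not rely on GNU extensions, and `-E` is used in place of `-r` (assuming a sed that accepts `-E`, as current GNU and BSD seds do).

```shell
# Hypothetical input: only the second line has "2" as its second column.
input='A1234A123 2 2 12345.12345 1234.1234.112341234
B1234A123 2 2 12345.12345 1234.1234.112341234
B1234A123 20 2 12345.12345 1234.1234.11234123'

# The address /^B[^ ]* 2 / selects lines whose first field starts with "B"
# and whose second field is exactly "2"; the trailing space rejects "20".
# The s command then runs only on those lines and blanks characters 38-41.
result=$(printf '%s\n' "$input" | sed -E '/^B[^ ]* 2 /s/^(.{37}).{4}/\1    /')

printf '%s\n' "$result"
```

Lines that do not match the address are printed unchanged, which gives the if/then behaviour asked for in the question without a separate awk step.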

score 1 · Answer 3 · answered Dec 29 '13 at 12:31

With GNU awk:

gawk '/^B/ && $2=="2" {$0 = gensub(/^(.{37}).{4}/, "\\1    ", 1)} 1' data

Ed Morton

score 0 · Answer 4 · answered Dec 29 '13 at 07:38

With GNU Awk version 4, you could try fixed-width fields:

gawk 'BEGIN { FIELDWIDTHS = "1 9 1 26 4 20"; OFS="" }
$1=="B" && $3=="2" {
    $5="    "
} 1' file

with output:

A1234A123 1 2 12345.12345 1234.1234.112341234
B1234A123 2 2 12345.12345 1234.1234.1    1234
A1234A123 2 2 12345.12345 1234.1234.112341234
