How to remove n characters from a specific column using sed/awk/perl

Question

I have the following tab delimited data:

chr1    3119713 3119728 MA05911Bach1Mafk    839 +
chr1    3119716 3119731 MA05011MAFNFE2  860 +
chr1    3120036 3120051 MA01502Nfe2l2   866 +

What I want to do is remove the first 7 characters from the 4th column, resulting in:

chr1    3119713 3119728 Bach1Mafk   839 +
chr1    3119716 3119731 MAFNFE2 860 +
chr1    3120036 3120051 Nfe2l2  866 +

How can I do that? Note that the output also needs to be tab-separated.

I'm stuck with the following code, which removes the first 7 characters of the whole line (i.e. starting from the first column) instead of from the 4th column:

sed 's/^.\{7\}//' myfile.txt

score 6 · Answer 1 · answered Oct 17 '17 at 03:27


 awk '{ $4 = substr($4, 8); print }' myfile.txt


Michael Rourke


Note that the output also needs to be tab-separated. How can I modify it? – scamander Oct 17 '17 at 03:37

Sorry, I missed that requirement: `awk 'BEGIN { OFS="\t" } { $4 = substr($4, 8); print }'` – Michael Rourke Oct 17 '17 at 03:40
or even more concise: `awk 'BEGIN{OFS="\t"}{$4 = substr($4,8)}1'` – Marc Lambrichs Oct 17 '17 at 05:57

I believe clarity is more important than conciseness. – Michael Rourke Oct 17 '17 at 09:22
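A minimal sketch combining the answer with the comments, assuming the input is strictly tab-delimited (the file names `sample.tsv` and `result.tsv` are hypothetical). Assigning to `$4` makes awk rebuild the whole record with `OFS` between fields, which is why the original one-liner printed space-separated output. Setting the input separator with `-F'\t'` as well means a field that happens to contain a space is still treated as a single field.

```shell
# Hypothetical sample file holding the question's data, tab-separated.
printf 'chr1\t3119713\t3119728\tMA05911Bach1Mafk\t839\t+\n' > sample.tsv
printf 'chr1\t3119716\t3119731\tMA05011MAFNFE2\t860\t+\n' >> sample.tsv
printf 'chr1\t3120036\t3120051\tMA01502Nfe2l2\t866\t+\n' >> sample.tsv

# -F'\t'       : split input fields on tabs only
# -v OFS='\t'  : join fields with tabs when the record is rebuilt
# substr($4, 8): keep the 4th field from its 8th character on,
#                i.e. drop the first 7; the trailing 1 prints each line
awk -F'\t' -v OFS='\t' '{ $4 = substr($4, 8) } 1' sample.tsv > result.tsv
cat result.tsv
```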

score 5 · Answer 2 · Accepted Answer

perl -anE'$F[3] =~ s/.{7}//; say join "\t", @F' data.txt

or

perl -anE'substr $F[3],0,7,""; say join "\t", @F' data.txt

edited Oct 20 '17 at 18:02

answered Oct 17 '17 at 03:33

zdim


Note that the output also needs to be tab-separated. How can I modify it? – scamander Oct 17 '17 at 03:37
@yaffle Corrected. I also removed the unneeded `^` anchor, since the match starts at the beginning of the field anyway. – zdim Oct 17 '17 at 04:06
alternatively: `perl -anE'$" = "\t"; $F[3] =~ s/^.{7}//; say "@F"' data.txt` – sborsky Oct 17 '17 at 04:11

score 0 · Answer 3 · answered Oct 17 '17 at 04:45

With sed

$ sed -E 's/^(([^\t]+\t){3}).{7}/\1/' myfile.txt
chr1    3119713 3119728 Bach1Mafk   839 +
chr1    3119716 3119731 MAFNFE2 860 +
chr1    3120036 3120051 Nfe2l2  866 +

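-E: use extended regular expressions, so that (, ), { and } don't need to be escaped with \. Some sed versions may need -r instead of -E
\t inside the brackets is understood by GNU sed; other implementations such as BSD/macOS sed may need a literal tab instead
^(([^\t]+\t){3}): capture the first three columns; change the 3 to target a different column
.{7}: the 7 characters to delete from the 4th column (. also matches a tab, so this assumes the 4th column is at least 7 characters long)
\1: write the captured columns back
Use the -i option for in-place editing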
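Since the title asks about removing n characters from a specific column, the regex above can be generated from two numbers. This is a sketch with a hypothetical shell function `strip_col`, assuming GNU sed. One deliberate change from the answer: `.{7}` becomes `[^\t]{0,N}`, so the deletion never crosses into the next column and a column shorter than N is simply emptied.

```shell
# strip_col is a hypothetical helper, not part of sed.
# Usage: strip_col COL N [FILE...]
# Deletes up to N leading characters of tab-separated column COL.
# Assumes GNU sed, which accepts \t inside a bracket expression.
strip_col() {
    col=$1 n=$2
    shift 2
    # The (COL-1) leading columns are captured in \1 and written back;
    # [^\t]{0,N} cannot run past the tab into the next column.
    sed -E "s/^(([^\t]+\t){$((col - 1))})[^\t]{0,$n}/\1/" "$@"
}

printf 'chr1\t3119713\t3119728\tMA05911Bach1Mafk\t839\t+\n' > sample.tsv
strip_col 4 7 sample.tsv                      # same edit as the answer's command
strip_col 2 3 sample.tsv                      # drop the first 3 characters of column 2
printf 'a\tb\tc\tabc\tx\n' | strip_col 4 7    # a short 4th column is emptied
```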

With Perl you can use \K, which works like a variable-length positive lookbehind: the part of the match before \K is kept, and only the part after it is replaced.

perl -pe 's/^([^\t]+\t){3}\K.{7}//' myfile.txt
