
I have a file with many lines, and in each line there are many columns (fields) separated by blanks (" "). The number of columns differs from line to line. How do I remove the first two columns?

wenzi

9 Answers


You can do it with cut:

cut -d " " -f 3- input_filename > output_filename

Explanation:

  • cut: invoke the cut command
  • -d " ": use a single space as the delimiter (cut uses TAB by default)
  • -f: specify fields to keep
  • 3-: all the fields starting with field 3
  • input_filename: use this file as the input
  • > output_filename: write the output to this file.
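To see it in action, here is a quick sanity check on sample input with differing column counts per line, as in the question; `cut` reads stdin when no filename is given:

```shell
# Sample lines with different numbers of columns.
printf 'a b c d\ne f g h i\n' | cut -d " " -f 3-
# c d
# g h i
```

Note that `cut` treats every single space as a delimiter, so runs of multiple spaces between fields produce empty fields.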

Alternatively, you can do it with awk:

awk '{$1=""; $2=""; sub("  ", " "); print}' input_filename > output_filename

Explanation:

  • awk: invoke the awk command
  • $1=""; $2="";: set fields 1 and 2 to the empty string
  • sub("  ", " ");: collapse the doubled space left behind, since the blanked fields 1 and 2 still contribute their delimiters
  • print: print the modified line
  • input_filename > output_filename: same as above.
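As the comments point out, blanking `$1` and `$2` leaves their separators behind, so this version still emits one leading space; a minimal check, plus the tightened variant from the comments:

```shell
# Original: sub("  ", " ") collapses the two leftover spaces to one,
# but that one leading space survives.
printf 'a b c d\n' | awk '{$1=""; $2=""; sub("  ", " "); print}'
# prints " c d" (note the leading space)

# Variant from the comments: strip all leading spaces instead.
printf 'a b c d\n' | awk '{$1=$2=""; sub(/^ +/, "")}1'
# c d
```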
sampson-chen
  • @wenzi oops, forgot that `cut` uses tab as delimiter by default. See updated answer - just tested and it works. all else being equal, I would recommend using `cut` over `awk`. – sampson-chen Nov 19 '12 at 01:02
  • You could do it in awk with just `awk '{sub(/([^ ]+ ){2}/, "")}1'`. I agree cut is the better choice anyway if you have a single-char field separator though. – Ed Morton Nov 19 '12 at 14:00
  • there are still some whitespaces left, use `awk '{$1=""; $2=""; sub(/^ +/, ""); print}'` instead or shorter `awk '{$1=$2=""; sub(/^ +/, "")}1'` – jirislav Jan 09 '18 at 11:40

Here's one way to do it with Awk that's relatively easy to understand:

awk '{print substr($0, index($0, $3))}'

This is a simple awk command with no pattern, so the action inside {} is run for every input line.

The action simply prints the substring starting at the position of the 3rd field.

  • $0: the whole input line
  • $3: 3rd field
  • index(in, find): returns the position of find in string in
  • substr(string, start): return a substring starting at index start
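A quick demonstration; note that index finds the *first* occurrence of `$3`'s text, so if the same text also appears in field 1 or 2, the substring starts too early:

```shell
printf 'one two three four\n' | awk '{print substr($0, index($0, $3))}'
# three four

# Caveat: here $3 is "a", whose first occurrence is at position 1,
# so the whole line is printed.
printf 'a a a b\n' | awk '{print substr($0, index($0, $3))}'
# a a a b
```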

If you want to use a different delimiter, such as comma, you can specify it with the -F option:

awk -F"," '{print substr($0, index($0, $3))}'

You can also operate this on a subset of the input lines by specifying a pattern before the action in {}. Only lines matching the pattern will have the action run.

awk 'pattern{print substr($0, index($0, $3))}'

Where pattern can be something such as:

  • /abcdef/: use regular expression, operates on $0 by default.
  • $1 ~ /abcdef/: operate on a specific field.
  • $1 == "blabla": use string comparison (note the quotes; a bare blabla would be read as an uninitialized variable)
  • NR > 1: use record/line number
  • NF > 0: use field/column number
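For example, restricting the action to lines whose first field matches a regular expression (`keep` and `skip` here are just sample data):

```shell
printf 'keep a b c\nskip d e f\n' | awk '$1 ~ /^keep$/ {print substr($0, index($0, $3))}'
# b c
```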
raychi

Thanks for posting the question. I'd also like to add the script that helped me.

awk '{ $1=""; print $0 }' file
Felipe Alvarez

You can use sed:

sed 's/^[^ ][^ ]* [^ ][^ ]* //'

This looks for lines starting with one-or-more non-blanks, a blank, another set of one-or-more non-blanks and another blank, and deletes the matched material, aka the first two fields. The [^ ][^ ]* is marginally shorter than the equivalent but more explicit [^ ]\{1,\} notation, and the second might run into issues with GNU sed (though if you use --posix as an option, even GNU sed can't screw it up). OTOH, if the character class to be repeated was more complex, the numbered notation wins for brevity. It is easy to extend this to handle 'blank or tab' as separator, or 'multiple blanks' or 'multiple blanks or tabs'. It could also be modified to handle optional leading blanks (or tabs) before the first field, etc.
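A quick check of the sed command (stdin used here instead of a file):

```shell
printf 'f1 f2 rest of the line\n' | sed 's/^[^ ][^ ]* [^ ][^ ]* //'
# rest of the line
```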

For awk and cut, see Sampson-Chen's answer. There are other ways to write the awk script, but they're not materially better than the answer given. Note that you might need to set the field separator explicitly (-F" ") in awk if you do not want tabs treated as separators, or you might have multiple blanks between fields. The POSIX standard cut does not support multiple separators between fields; GNU cut has the useful but non-standard -i option to allow for multiple separators between fields.

You can also do it in pure shell:

while read junk1 junk2 residue
do echo "$residue"
done < in-file > out-file
Jonathan Leffler
  • If `residue` can contain a backslash, the above read will interpret it and not reproduce it in the output. Always use `while IFS= read -r ...`. – Ed Morton Nov 19 '12 at 03:58
  • If `bash` interprets the contents with a plain `read`, then `bash` is broken (again). The read command in original shells didn't do such nonsense; I don't believe it is required by POSIX shell. It would irritate the blazes out of me to find that `bash` does what you say it does — I already have a love/hate relation with the program since it does a lot of things well, but there are some things that it does badly, and changing legacy behaviour is one of the worst, and requiring an option to enable the old standard behaviour is ... very irritating. It seems you're right; `bash` is borked! – Jonathan Leffler Nov 19 '12 at 04:01
  • That behavior is POSIX, see http://pubs.opengroup.org/onlinepubs/9699919799/utilities/read.html. – Ed Morton Nov 19 '12 at 13:48
  • I see I didn't say it explicitly but the reason you need IFS= is that if the first field in the input was empty, then default field splitting would strip leading blanks so `residue` would start at field 4 (or later) instead of field 3. – Ed Morton Nov 19 '12 at 14:04
  • Damn...OK; POSIX is borked, but `bash` is following POSIX 2008. I've never wanted that functionality in more than a quarter century of shell programming, but I guess I must be in a minority. – Jonathan Leffler Nov 19 '12 at 15:05
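The backslash issue from the comments is easy to reproduce; without -r, read treats `\` as an escape character and eats it:

```shell
# Input line is: a b c\d e
printf 'a b c\\d e\n' | { read junk1 junk2 residue; echo "$residue"; }
# cd e        (backslash consumed, \d collapsed to d)

printf 'a b c\\d e\n' | { read -r junk1 junk2 residue; echo "$residue"; }
# c\d e       (backslash preserved)
```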

It's pretty straightforward to do with only the shell:

while read A B C; do
  echo "$C"
done < oldfile > newfile
technosaurus
  • This is a great answer, however you will want to use `read -r` instead of `read`. – robert Dec 13 '17 at 21:10
  • `read -r` will preserve backslashes. `read` will not. For example: `echo "foo ba\r"` will produce an output of `foo ba\r`. However, `echo "foo ba\r" | (while read first_column second_column; do echo "$second_column"; done)` will produce just `bar` as the output (with the backslash removed. Adding the `-r` flag produces the correct output of `ba\r` – robert Dec 13 '17 at 21:11

perl:

perl -lane 'print join(" ", @F[2..$#F])' File

awk:

awk '{$1=$2=""}1' File
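The awk version is terse but, like the earlier awk answer, leaves the separator spaces at the front of each line:

```shell
printf 'a b c d\n' | awk '{$1=$2=""}1'
# prints "  c d" (two leading spaces remain)
```

If that matters, add `sub(/^ +/, "")` as shown in the comments on the accepted answer.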
Vijay

Using awk, and building on some of the other answers, a for loop makes this a bit more flexible. Sometimes I may want to delete the first 9 columns (after an "ls -lrt", for example), so I just change the 2 to a 9 and that's it:

awk '{ for(i=0;i++<2;){$i=""}; print $0 }' your_file.txt
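For instance, blanking the first two fields of sample input; as with the other field-blanking answers, the leading separators are left behind:

```shell
printf 'a b c d\n' | awk '{ for(i=0;i++<2;){$i=""}; print $0 }'
# prints "  c d" (two leading spaces)

# Same loop with 3 instead of 2:
printf 'a b c d\n' | awk '{ for(i=0;i++<3;){$i=""}; print $0 }'
# prints "   d" (three leading spaces)
```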

Carlos

This might work for you (GNU sed):

sed -r 's/^([^ ]+ ){2}//' file

or for columns separated by one or more white spaces:

sed -r 's/^(\S+\s+){2}//' file
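A quick check of both forms; the second uses GNU sed's `\S`/`\s` character classes, so it also handles tabs and runs of blanks between fields:

```shell
printf 'a b c d\n' | sed -r 's/^([^ ]+ ){2}//'
# c d

# Tab and multiple spaces between fields:
printf 'a   b\tc d\n' | sed -r 's/^(\S+\s+){2}//'
# c d
```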
potong

Use kscript

kscript 'lines.split().select(-1,-2).print()' file
Holger Brandl