grep: keep lines by number in specific column

Question

I know how to do it with awk. For example, to keep lines that contain the number 3 in the second column: $ awk '$2 == 3'

But how can I do the same with only grep? And what about the first column?

If you really don't want to use awk and have GNU grep, you can do something like this (that is, use a wildcard for the first column): `grep -P '^[^\t]*\t3\t' your_file`. (Here I assume bash and tab as the delimiter, hence the `-P` option. Adapt to your situation.) – Lars Fischer, Apr 23 '16 at 15:31
@EdMorton The tab and bash would require some strange stunts, as GNU grep (without `-P`) does not know `\t`. – Lars Fischer, Apr 23 '16 at 15:34
@LarsFischer thank you, it works! Maybe you know where I can find what these commands mean? – Marta Koprivnik, Apr 23 '16 at 15:40
@EdMorton it's OK, it works for me now. Maybe you know where I can look up what ^ * [] \t and the other symbols mean? – Marta Koprivnik, Apr 23 '16 at 15:48

Mort · Answer 1 · 2016-04-23T16:45:17.253

3

Grep is not great for this; awk is better. But assuming your columns are separated by spaces, you want:

grep -E '^[^ ]+ +3( |$)'

Explanation: find something that has a start of line, followed by one or more non-space characters (first column), then one or more space characters (column separator), then the number 3, then either a space (because there's another column) or end of line (if there's no other column).
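As a quick check of the pattern above, here is a minimal sketch run against a few made-up lines. The sample data and the variable names `data` and `result` are assumptions for illustration only; they are not from the question.

```shell
# Hypothetical space-separated sample data.
data='a 3 x
b 13 y
c 3
d 4 3
3 5 z'

# Keep lines whose second column is exactly 3:
#   ^[^ ]+   first column (one or more non-space characters)
#    +       one or more spaces as separator
#   3( |$)   the number 3, followed by a space or the end of the line
result=$(printf '%s\n' "$data" | grep -E '^[^ ]+ +3( |$)')
printf '%s\n' "$result"
```

Only `a 3 x` and `c 3` should survive: `b 13 y` has 13 in the second column, `d 4 3` has its 3 in the third column, and `3 5 z` has its 3 in the first.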

(Updated to fix syntax after testing.)

edited Apr 23 '16 at 16:45

answered Apr 23 '16 at 15:55

Mort


idk, if I just copy-paste your code it doesn't work for me, maybe I'm just too stupid :P but this works for me: grep -P '^[^\s]*\s3\s' – Marta Koprivnik Apr 23 '16 at 16:05
Edited to fix syntax after testing. – Mort Apr 23 '16 at 16:38

Note that `\S` should be the equivalent to `[^\s]` in perl-regex mode. You also may want `+` not `*` depending on if a column can be empty. The former is "1 or more" and the latter is "zero or more". – Mort Apr 23 '16 at 16:43
@MartaKoprivnik If an answer worked for you and you're happy with the results, please accept the answer. You might also even want to hit the vote-up button to support the author :-) – Rany Albeg Wein Apr 23 '16 at 18:30

score 2 · Accepted Answer · edited May 23 '17 at 11:59

Here is the longer explanation for my mysterious command grep -P '^[^\t]*\t3\t' your_file from the comments:

I assumed that the column delimiter is a tab. grep without -P would require some strange tricks to match a tab directly (see e.g. here). The -P option makes it possible to just write \t without any problems. If, for example, your delimiter is ; then you could replace the \t with ; and you don't need the -P option.

Having said that, let's explain the idea behind the regular expression. You said you want to match a 3 in the second column:
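To make the delimiter point concrete, here is a small sketch with invented sample data. The first part uses `;` as the delimiter with plain grep. The second part shows one possible way to get a literal tab into the pattern without -P, by storing it in a shell variable (here called `tab`); this is an assumption about what a workaround can look like, not necessarily the one the linked page describes.

```shell
# Hypothetical semicolon-separated data: no -P needed, ; is an ordinary character.
csv='a;3;x
b;13;y
3;5;z'
semi_result=$(printf '%s\n' "$csv" | grep '^[^;]*;3;')
printf '%s\n' "$semi_result"

# Hypothetical tab-separated data, matched without -P by putting a
# literal tab character into the pattern via a shell variable.
tab=$(printf '\t')
tsv=$(printf 'a\t3\tx\nb\t13\ty\n')
tab_result=$(printf '%s\n' "$tsv" | grep "^[^${tab}]*${tab}3${tab}")
printf '%s\n' "$tab_result"
```

In both cases only the line starting with `a` is kept.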

^ means: at the beginning of the line
[^\t]* means: zero or more (*) occurrences of something that is not a tab ([^\t]; here the ^ means "not")
followed by tab
followed by 3
followed by tab

Now we have effectively expressed the idea that we need a 3 as the content of the second column (\t3\t) and we are not interested in the precise content of the first column. The ^[^\t]*\t is only necessary to express the idea "what follows is in the second column".
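The following sketch runs the command on a few invented tab-separated lines (it assumes GNU grep with -P support; the data and variable names are hypothetical). It also compares it with a slightly relaxed variant, `3(\t|$)`, which is my addition and not part of the original command: because the original pattern ends in `\t`, a line where the second column is also the last column is not matched.

```shell
# Hypothetical tab-separated sample data.
sample=$(printf 'a\t3\tx\nb\t13\ty\n3\t5\tz\nc\t3\n')

# Original command: 3 in the second column, followed by another column.
result_strict=$(printf '%s\n' "$sample" | grep -P '^[^\t]*\t3\t')

# Variant (an assumption, not from the answer): also accept end of line after the 3.
result_loose=$(printf '%s\n' "$sample" | grep -P '^[^\t]*\t3(\t|$)')

printf 'strict:\n%s\n' "$result_strict"
printf 'loose:\n%s\n' "$result_loose"
```

The strict form keeps only the `a` line; the relaxed form also keeps the two-column `c` line.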

If you want to match something in the fourth column, you could use this to "skip" the first three columns and match a 4 in the fourth column: ^([^\t]*\t){3}4. (Note the parentheses and the {3}.) As written, this only requires the fourth column to start with 4; append (\t|$) if the column must be exactly 4.

As you can see, there are many details, and awk is much more elegant and easier.
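A short sketch of the column-skipping idea on made-up five-column data (GNU grep with -P assumed; sample rows and variable names are hypothetical). It contrasts the pattern as given, which accepts any fourth column beginning with 4, with a variant anchored by `(\t|$)`, which is an addition for illustration.

```shell
# Hypothetical tab-separated rows.
rows=$(printf 'a\tb\tc\t4\te\na\tb\tc\t42\te\n4\tb\tc\td\te\na\tb\tc\t4\n')

# Skip three columns, then require the fourth to start with 4.
prefix=$(printf '%s\n' "$rows" | grep -P '^([^\t]*\t){3}4')

# Same, but require the fourth column to be exactly 4.
exact=$(printf '%s\n' "$rows" | grep -P '^([^\t]*\t){3}4(\t|$)')

printf 'prefix match:\n%s\n' "$prefix"
printf 'exact match:\n%s\n' "$exact"
```

The prefix form also keeps the row with 42 in the fourth column, while the anchored form drops it. The row with 4 in the first column is rejected by both.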

You can read up on this in the grep documentation, and then you will need to study regular expressions, e.g. start here.
