
I have a fairly complex text file, file1.txt, that hasn't been munged properly. The file is tab-delimited, however; i.e., each field is separated by \t.

I would like to write a script or use a Unix command that searches this entire file for a certain string, STRING1:, and prints the text after the colon, stopping at the next \t.

The text file looks like this:

...kjdafhldkhlfak\tSTRING1:Iwanttokeepthis\tfadfasdafldafh\tSTRING1:andthis\tafsdkfasldh....

So the grep-like command would output:

Iwanttokeepthis
andthis

In Perl, I know how to print a line if it contains the string, with

perl -wln -e 'print if /\bSTRING1\b/' file1.txt

How would one revise this to print the text between STRING1: and \t?

ShanZhengYang

2 Answers


With Perl:

$ echo $'kjdafhldkhlfak\tSTRING1:Iwanttokeepthis\tfadfasdafldafh\tSTRING1:andthis\tafsdkfasldh' > /tmp/file
$ perl -lne 'while (/STRING1:([^\t]+)\t/g) {print $1}' /tmp/file
Iwanttokeepthis
andthis

Or, as stated in comments:

$ perl -nle'print for /STRING1:([^\t]*)\t/g' /tmp/file
Iwanttokeepthis
andthis
dawg
    Simpler: `perl -nle'print for /STRING1:([^\t]*)\t/g'`, or just `perl -nle'print for /STRING1:([^\t]*)/g'` – ikegami Jan 09 '17 at 03:26

With GNU grep:

grep -Po 'STRING1:\K.*?(?=\t)' file

Output:

Iwanttokeepthis
andthis

See: The Stack Overflow Regular Expressions FAQ

Cyrus
    `.*?` is a fragile construct. There are no problems with it in your particular pattern, but I like to avoid it whenever possible. You could also use `grep -Po 'STRING1:\K[^\t]*(?=\t)' file` or just `grep -Po 'STRING1:\K[^\t]*' file` – ikegami Jan 09 '17 at 12:23
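
As a quick check of the suggestion above, this sketch runs the lazy-quantifier pattern and the negated-character-class pattern on the sample line from the question. It assumes GNU grep built with PCRE support (the -P option); the variable names are illustrative only.

```shell
# The sample line from the question, written with real tab characters.
sample=$(printf 'kjdafhldkhlfak\tSTRING1:Iwanttokeepthis\tfadfasdafldafh\tSTRING1:andthis\tafsdkfasldh')

# Lazy quantifier plus a lookahead for the tab, as in the answer.
lazy=$(printf '%s\n' "$sample" | grep -Po 'STRING1:\K.*?(?=\t)')

# Negated character class: it cannot run past a tab, so no lookahead is needed.
negated=$(printf '%s\n' "$sample" | grep -Po 'STRING1:\K[^\t]*')

printf '%s\n' "$lazy"
printf '%s\n' "$negated"
```

On this input both patterns should print the same two values. The negated class makes the stopping condition explicit rather than relying on backtracking.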