0

I have this file (pattern1 and pattern2 are fixed, but the numbers are random):

aaaa patern1[1234] bbbb cccc pattern2[5678]



jjjj patern1[9999] hhhhhhhh

and I want to extract the following patterns with a bash script:

pattern1[1234] pattern2[5678]

pattern1[9999]

I tried grep -Eo 'pattern1\[[0-9]{1,4}'. It works for one pattern but not for two.

user000001
dt128
  • Please take care with your formatting. Have you made any attempt to solve this problem yourself? If so, [edit] your question to show us. – Tom Fenech Sep 12 '16 at 09:06
  • Well, what did you try? – Inian Sep 12 '16 at 09:06
  • `man grep` is a good start. `grep -o RE` is all you need. – James Brown Sep 12 '16 at 09:08
  • can you clarify these points: 1) Is `patern1` a typo in your example input (`t` vs `tt` in `pattern2`) 2) Do you want output in separate lines or retain matched text in their own line as shown in your expected output? – Sundeep Sep 12 '16 at 09:59
  • @ap asic If the patterns are on separate lines, they must be shown on separate lines; if they are on the same line, they must be shown on the same line. – dt128 Sep 12 '16 at 11:03
  • @dt128 thanks, that clarifies second point.. could you clarify if your input file contains `patern1` or `pattern1`? – Sundeep Sep 12 '16 at 11:51
  • overview: https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean alternatives ('|'): https://stackoverflow.com/questions/22187880/what-does-the-means-in-this-regex/22187948#22187948 – Micha Wiedenmann Sep 12 '16 at 13:38
  • @sp asic pattern 1 or pattern 2(?) Simply must be shown – dt128 Sep 12 '16 at 14:30

3 Answers

2
$ cat ip.txt 
aaaa pattern1[1234] bbbb cccc pattern2[5678]
jjjj pattern1[9999] hhhhhhhh

$ perl -lne 'print join " ", /pattern[12]\[\d+\]/g' ip.txt
pattern1[1234] pattern2[5678]
pattern1[9999]
  • pattern[12]\[\d+\] is the pattern to extract; with the g flag in list context it returns every match on the line
  • print join " ", prints those matches separated by a space

If lines not containing the desired pattern are to be omitted:

perl -lne 'print join " ", //g if /pattern[12]\[\d+\]/' ip.txt
Sundeep
1

You can use the pipe character | to allow for multiple patterns:

grep -oP '(patern1|pattern2)\[[0-9]{1,4}\]' file
patern1[1234]
pattern2[5678]
patern1[9999]

Since the patterns are similar, you can simplify like this:

grep -oP 'patt?ern[12]\[[0-9]{1,4}\]' file
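grep -o prints every match on its own line, so matches from the same input line are split apart. One possible way to keep them together, sketched here under the assumption that bash, grep -E and paste are available, is to run grep on each input line separately and join its output. The function name `extract_per_line` is hypothetical, and the brackets are written as `[[]` and `[]]` so that the expression works without -P.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the matches found on each input line,
# space-separated, skipping lines that contain no match.
extract_per_line() {
    local line matches
    while IFS= read -r line || [ -n "$line" ]; do
        # grep -o emits one match per line; paste -s joins them with spaces.
        matches=$(printf '%s\n' "$line" |
            grep -oE 'patt?ern[12][[][0-9]{1,4}[]]' |
            paste -sd' ' -)
        if [ -n "$matches" ]; then
            printf '%s\n' "$matches"
        fi
    done
}

# Sample input as given in the question (including the "patern1" spelling).
printf '%s\n' \
    'aaaa patern1[1234] bbbb cccc pattern2[5678]' \
    '' \
    'jjjj patern1[9999] hhhhhhhh' | extract_per_line
```

Running grep once per line is slow on large files; for those, a single awk or perl process is likely the better choice.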
user000001
  • The output of this script is not as expected: pattern1[1234] and pattern2[5678] must be on one line. – dt128 Sep 12 '16 at 10:57
1
$ awk '{ c=0; while ( match($0,/(patern1|pattern2)[[][^][]+[]]/) ) { printf "%s%s", (c++?OFS:""), substr($0,RSTART,RLENGTH); $0=substr($0,RSTART+RLENGTH) } if (c) print "" }' file
patern1[1234] pattern2[5678]
patern1[9999]
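For readers who find the one-liner dense, here is the same loop spread over several lines with a comment per step. This is only a reformatted sketch of the command above; the wrapper name `extract_matches` and the variable name `count` (replacing `c`) are assumptions added for readability.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: reads the named files, or stdin when none are given.
extract_matches() {
    awk '
    {
        count = 0    # matches printed so far on this input line
        # match() sets RSTART and RLENGTH for the leftmost match in $0
        while (match($0, /(patern1|pattern2)[[][^][]+[]]/)) {
            # print a separator before every match except the first
            printf "%s%s", (count++ ? OFS : ""), substr($0, RSTART, RLENGTH)
            # discard everything up to the end of this match, then search again
            $0 = substr($0, RSTART + RLENGTH)
        }
        # terminate the output line only if something matched
        if (count) print ""
    }' "$@"
}

printf '%s\n' \
    'aaaa patern1[1234] bbbb cccc pattern2[5678]' \
    '' \
    'jjjj patern1[9999] hhhhhhhh' | extract_matches
```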

If you prefer brevity over clarity then consider this, using GNU awk for multi-char RS and RT and run against the same input file as shown in https://stackoverflow.com/a/39453928/1745001:

$ awk -v RS='pattern[12][[][0-9]+[]]|\n' '{$0=RT;ORS=(/\n/?x:FS)} 1' file
pattern1[1234] pattern2[5678]
pattern1[9999]
Community
Ed Morton
  • +1 Just tested the command and it seems to work with the sample data. It's amazing, though, that such a complex awk command is needed for this relatively simple task. – user000001 Sep 13 '16 at 19:46
  • Don't confuse lengthy with complex. The awk command is long but it's a trivial loop printing each string that matches a regexp. I could almost certainly come up with one that's briefer but IS complex if you prefer. – Ed Morton Sep 13 '16 at 19:48
  • I added a briefer script to the answer for you. I personally wouldn't use it as I find it hard to understand but if you're planning to use a perl solution... – Ed Morton Sep 13 '16 at 20:02
  • That's a nice one too. I don't understand, though, what the `x` is doing in `/\n/?x:FS`. EDIT: got it, it's just an empty variable, shorter than `""`. – user000001 Sep 13 '16 at 20:09
  • Here are the differences between perl and awk for manipulating text, as best I can tell: you can write clear code in either tool, and you can write brief code in either tool. For some reason people using perl tend to favor brevity over clarity (to the point where that is idiomatic), while people using awk favor clarity over brevity. Perl has created constructs to do the most obscure things as briefly as possible, while awk does not introduce constructs unless it's to do a common task that is difficult to do otherwise. Both contribute to perl's reputation for obfuscated code (http://www.zoitz.com/archives/13). – Ed Morton Sep 13 '16 at 20:34