
Let's consider the file 'file.txt' (its contents are not reproduced here).

I want to substitute the third-to-last occurrence of "p" on each line.

sed -E 's/(.*)p((.*p){2})/\1@\2/' file.txt

Here, that "p" is replaced with "@". I want to know how it works. Can anyone explain it to me?
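Since the original file.txt is not available, one way to try the command is to feed it a few made-up lines on standard input. This is a minimal sketch assuming a sed that accepts -E (GNU or BSD). The sample lines and the function name `replace_third_to_last` are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: runs the exact sed command from the question on stdin.
replace_third_to_last() {
  sed -E 's/(.*)p((.*p){2})/\1@\2/'
}

# Made-up sample input (not the original file.txt).
printf '%s\n' 'apple pop pup' 'pop' | replace_third_to_last
# apple po@ pup   <- six p's; the 4th one (third-to-last) becomes @
# pop             <- only two p's: the regex cannot match, line unchanged
```

A line needs at least three p's for the substitution to happen at all.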

mbx20
  • Look into https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean for the basics. Although `sed` doesn't have greedy quantifiers in exactly the same sense, for this regex the behaviour is similar. `()` is a capture group and `\N` is a backreference. – Sundeep Jan 28 '20 at 11:58
  • Short answer: `/.*p/` matches up to the last `p` in the line. In `/(.*p)(.*p)/`, the first group matches up to the second-to-last `p` (assuming the input has at least two `p`), and the second group then holds the rest of the content up to the last `p`. The same reasoning extends to more groups. – Sundeep Jan 28 '20 at 12:01
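The longest-match behaviour described in these comments can be observed by wrapping each group in brackets in the replacement. This is a sketch assuming a sed that accepts -E. The sample line, the variable `line`, and the bracket markers are made up for illustration.

```shell
line='apple pop pup!'

# .*p alone matches up to the LAST p; the trailing "!" is outside the match.
printf '%s\n' "$line" | sed -E 's/.*p/X/'
# X!

# Two groups: group 1 runs to the second-to-last p, group 2 on to the last p.
printf '%s\n' "$line" | sed -E 's/(.*p)(.*p)/[\1][\2]/'
# [apple pop p][up]!
```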

1 Answer

  • sed
  • -E - use extended regular expressions, so ( ) and {2} work without backslashes. Supported by both GNU and BSD sed.
  • ' - quote the argument.
  • s - substitute
    • / - separator
    • ( - start first group
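    • .* - match any character, zero or more times. The match is as long as possible while still letting the rest of the pattern succeed.
    • ) - end first group.
    • p - match a literal p. The rest of the pattern requires two more p after this one, and the first group takes as much as it can, so this p is the third-to-last p on the line. The first group contains everything before it.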
    • ( - start second group.
    • ( - start third group, nested inside the second. Notice the order: groups are numbered by the position of their opening parenthesis.
    • .*p) - match anything up to a p, then close the third group ...
    • {2} - ... two times. This guarantees there are at least two more p after the p matched above.
    • ) - close second group. So the second group contains something, the second-to-last p, something, and the last p.
    • / - separator. Next comes the replacement.
    • \1 - backreference to the first group. It expands to all characters from the beginning of the line up to, but not including, the third-to-last p.
    • @ - a literal @, which takes the place of that p.
    • \2 - backreference to the second group. It expands to everything after the third-to-last p through the last p: something, the second-to-last p, something, and the last p. Text after the last p is not part of the match, so sed leaves it unchanged.
    • / - separator.
  • ' - end single quote.
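To see exactly what \1 and \2 hold, a variant of the same regex can print them between brackets instead of reassembling the line. This is a sketch assuming a sed that accepts -E. The function name `show_groups`, the brackets, and the sample line are hypothetical, added only for this illustration.

```shell
# Same regex as the question; the replacement wraps each group in brackets.
show_groups() {
  sed -E 's/(.*)p((.*p){2})/[\1]@[\2]/'
}

printf '%s\n' 'apple pop pup!' | show_groups
# [apple po]@[ pup]!
```

Here \1 stops just before the third-to-last p, \2 runs from after it through the last p, and the trailing "!" lies outside the match, so sed copies it unchanged.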

The (.*p){2} means the same as .*p.*p
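As a quick check of that equivalence, the command can be rewritten with the repetition spelled out. The only difference is that the inner third group disappears, which is harmless here because the replacement never uses \3. This sketch assumes a sed that accepts -E; the function names and sample lines are hypothetical.

```shell
# Hypothetical helpers for the two spellings of the same regex.
with_repeat() { sed -E 's/(.*)p((.*p){2})/\1@\2/'; }
spelled_out() { sed -E 's/(.*)p(.*p.*p)/\1@\2/'; }

sample='apple pop pup
a p b p c p d'

printf '%s\n' "$sample" | with_repeat
printf '%s\n' "$sample" | spelled_out
# Both print:
# apple po@ pup
# a @ b p c p d
```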

KamilCuk
  • Great downvotes! – KamilCuk Jan 28 '20 at 12:02
  • Thanks for the answer, but I am still confused. – mbx20 Jan 28 '20 at 12:10
  • Let's consider line 4. – mbx20 Jan 28 '20 at 12:17
  • According to you, the first group "(.*)" matches all the characters from the start of the line, followed by a "p", and that is where the first group ends. I understand that much, but I can't follow what comes after. – mbx20 Jan 28 '20 at 12:20
  • I'm confused about the ((.*p){2}). – mbx20 Jan 28 '20 at 12:22
  • `( .. )` is used to "remember" everything inside. `(.*p){2}` is equal to `.*p.*p`. The `.` matches any character. The `*` matches zero or more occurrences of the preceding pattern. The `.*` matches any character zero or more times. You can find many, many, many, many resources online that try to explain regular expressions. I recommend crossword puzzles; they are amazing and fun. – KamilCuk Jan 28 '20 at 13:29
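One consequence of `*` matching zero occurrences is that `.*` can match an empty string, so the p's need nothing between them. A hypothetical line of three adjacent p's shows this: every `.*` matches nothing and the first p is the one replaced. This sketch assumes a sed that accepts -E; the brackets in the second command are only markers.

```shell
# Each .* in (.*)p((.*p){2}) matches zero characters on this line.
printf '%s\n' 'ppp' | sed -E 's/(.*)p((.*p){2})/\1@\2/'
# @pp

# Bracketed view of the groups: \1 is empty, \2 is "pp".
printf '%s\n' 'ppp' | sed -E 's/(.*)p((.*p){2})/[\1]@[\2]/'
# []@[pp]
```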