Get the character that precedes each occurrence of a given character/pattern in a string

Question

I'm trying to get the character that precedes each occurrence of a given character/pattern in a string, using standard bash tools such as grep, awk/gawk, sed ...

Step I: get the character that precedes each occurrence of the character :

Example:

String 1 => :hd:fg:kl:

String 2 => :df:lkjh:

String 3 => :glki:l:s:d:

Expected results

Result 1 => dgl

Result 2 => fh

Result 3 => ilsd

I tried many times with awk, but without success.

Step II: Insert a given character between each character of the resulting string

Example with /

Result 1 => d/g/l

Result 2 => f/h

Result 3 => i/l/s/d

I have an awk expression for this step: awk -F '' -v OFS="/" '{$1=$1;print}'

I don't know whether Step I can be done with awk or sed, or whether Step I and Step II can be done in a single command.

Kind Regards

You might find this question helpful https://stackoverflow.com/questions/2777579/how-to-output-only-captured-groups-with-sed — Leonard, Jul 04 '18 at 01:01
@moocan, please try to select any of the answer as correct answer and close the question completely see this link for more details too https://stackoverflow.com/help/someone-answers — RavinderSingh13, Jul 04 '18 at 02:19
You should include a case of back-to-back colons in your sample input/output (e.g. foo::bar) if it can occur as that could be hard to handle depending on your requirements for doing so. Is the output `o` or `o:` or something else? If it cannot happen then add a statement to your question saying so. — Ed Morton, Jul 04 '18 at 11:34
@RavinderSingh13 I apologize for my late answer because I was offline — moocan, Jul 07 '18 at 01:12
@Ed Morton, that can not happen in my case ... but it's a very good advice — moocan, Jul 07 '18 at 01:14

Lacobus · Accepted Answer · 2018-07-04T02:10:34.503

1

What about:

awk 'BEGIN{FS=":"}{for(i=1;i<NF;i++){if(i>2)printf "/";printf "%s",substr($i,length($i))}print ""}' input.txt

input.txt:

:hd:fg:kl:
:df:lkjh:
:glki:l:s:d:

Output:

d/g/l
f/h
i/l/s/d

edited Jul 04 '18 at 02:10

answered Jul 04 '18 at 02:06

Lacobus


RavinderSingh13 · Answer 2 · 2018-07-04T02:30:34.460

1

Solution 1: Could you please try the following and let me know if it helps.

awk -F":" '
{
  for(i=1;i<=NF;i++){
    if($i){ val=(val?val:"")substr($i,length($i)) }
  }
  print val;
  val=""
}' Input_file

Output will be as follows.

dgl
fh
ilsd

Solution 2: With a / between the output characters.

awk '
BEGIN{
  OFS="/";
  FS=":"
}
{
  for(i=1;i<=NF;i++){
    if($i){
      val=(val?val OFS:"")substr($i,length($i))
    }}
  print val;
  val=""
}' Input_file

Output will be as follows.

d/g/l
f/h
i/l/s/d

Solution 3: Using awk's match function.

awk '
{
  while(match($0,/[a-zA-Z]:/)){
    val=(val?val:"")substr($0,RSTART,RLENGTH-1)
    $0=substr($0,RSTART+RLENGTH)
   }
  print val
  val=""
}'  Input_file

edited Jul 04 '18 at 02:30

answered Jul 04 '18 at 02:07

RavinderSingh13


In my question, I always ended my examples with ":", which was my mistake, because the string can also end with a letter. For a pattern such as ":hfd:l:jh:m", the output is "dlhm" for your first solution and "d/l/h/m" for the second. Your third solution works well because the output is "dlh". – moocan Jul 07 '18 at 01:48
@moocan, sure, thanks for letting me know. Please keep the samples in the question in line with your actual requirements, because solutions will be written against those samples. Cheers and happy learning. – RavinderSingh13 Jul 07 '18 at 02:19
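
For input that may end in a letter rather than ":", as described in the comment above, one possible adjustment to the first two solutions is to stop the loop one field early, so the text after the last colon is never inspected. This is only a minimal sketch, assuming colons are never back-to-back; the function name chars_before_colon and the awk variable sep are made up for illustration.

```shell
# Hypothetical helper: prints the character before each ":" on every input line,
# joined by the separator given as $1 (empty by default).
chars_before_colon() {
  awk -F':' -v sep="$1" '
  {
    val = ""
    # Fields 1..NF-1 are each followed by a colon; field NF is not.
    for (i = 1; i < NF; i++) {
      if ($i != "") {
        val = (val == "" ? "" : val sep) substr($i, length($i))
      }
    }
    print val
  }'
}

printf '%s\n' ':hfd:l:jh:m' ':hd:fg:kl:' | chars_before_colon /
```

Comparing the field against the empty string, instead of testing it as a condition, also keeps a field such as "0" from being skipped.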

score 0 · Answer 3 · answered Jul 04 '18 at 01:43

This might work for you (GNU sed):

sed -r 's/[^:]*([^:]):+|:+/\1/g;s/\B/\//g' file

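The first command replaces, globally throughout the line, either zero or more non-colon characters followed by a single non-colon character and one or more colons, or a run of colons on its own, with that single character (nothing in the second case). The second command then inserts a / between each pair of adjacent characters.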

potong


In the case of a pattern such as ":hfd:l:jh:m", the output is "d/l/ h/m". In my question, I always ended my examples with ":", which was my mistake, because the string can also end with a letter. – moocan Jul 07 '18 at 01:32
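
If the input may end in a letter, as the comment above describes, one option is to delete the text after the last colon before doing anything else. The following is a sketch rather than the answerer's method: it also swaps the GNU-specific \B for a plain "add / after every character, then drop the last /" step, and the function name sed_before_colon is hypothetical.

```shell
# Hypothetical helper. The four sed commands, in order:
#   1. drop any trailing run of non-colon characters (text after the last ":")
#   2. keep only the character before each run of colons
#   3. append "/" to every remaining character
#   4. remove the final "/"
sed_before_colon() {
  sed -E \
    -e 's/[^:]*$//' \
    -e 's/[^:]*([^:]):+|:+/\1/g' \
    -e 's/./&\//g' \
    -e 's/\/$//'
}

printf '%s\n' ':hfd:l:jh:m' | sed_before_colon
```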

score 0 · Answer 4 · answered Jul 04 '18 at 03:34

Perl and negative lookahead:

$ perl -p -e 's/.(?!:)//g' file
dgl
fh
ilsd

James Brown


Sundeep · Answer 5 · 2018-07-04T03:40:44.500

0

This is easier to do with perl:

$ cat ip.txt
:hd:fg:kl:
:df:lkjh:
:glki:l:s:d:

$ perl -lne 'print join "/", /.(?=:)/g' ip.txt
d/g/l
f/h
i/l/s/d

/.(?=:)/g gets all characters preceding :
- (?=:) is a lookahead construct
The resulting matches are then printed using / as the delimiter string.

edited Jul 04 '18 at 03:40

answered Jul 04 '18 at 03:35

Sundeep


Works very well with all my test patterns, even when the pattern ends with a letter rather than ":". Thanks – moocan Jul 07 '18 at 02:01

score 0 · Answer 6 · answered Jul 04 '18 at 08:56

Using only sed, with ERE:

sed -E 's#[^:]*(.):#\1/#g;s/^.|.$//g' infile
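
To see what each of the two commands contributes, they can be run separately. This is just an illustrative sketch; the function names step1 and both_steps are made up.

```shell
# Step 1 alone: keep the last character before each ":" and turn that ":" into "/".
# The leading ":" is left in place and a trailing "/" is produced.
step1() {
  sed -E 's#[^:]*(.):#\1/#g'
}

# Both steps: additionally delete the first and the last character of the line.
both_steps() {
  sed -E 's#[^:]*(.):#\1/#g;s/^.|.$//g'
}

printf '%s\n' ':hd:fg:kl:' | step1
printf '%s\n' ':hd:fg:kl:' | both_steps
```

Note that the second command removes the first and last characters unconditionally, so it relies on every line starting and ending with a colon, as in the sample input.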

ctac_


score 0 · Answer 7 · answered Jul 04 '18 at 23:29

Using GNU sed:

sed -E 's/[^:]*([^:]):/\1/g; s/([^:])/\/\1/g; s/^:\///'

The first command, s/[^:]*([^:]):/\1/g, strips out the extra characters and the colons (except the first one), yielding:

:dgl
:fh
:ilsd

The second command, s/([^:])/\/\1/g, inserts a / before each character, yielding:

:/d/g/l
:/f/h
:/i/l/s/d

The last command, s/^:\///, simply removes the :/ from the beginning of each line:

d/g/l
f/h
i/l/s/d

score 0 · Answer 8 · answered Jun 29 '19 at 04:25

You could iterate across each line, starting at the second character, with awk. Every time the iterator hits a colon, print the previous character.

$ awk '{ for (i = 2; i <= length($0); i++) {
           if (substr($0, i, 1) == ":") printf "%s", substr($0, i-1, 1)
         }
         printf "\n" }' file.txt
dgl
fh
ilsd
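
The same scan can likely be extended to cover Step II by collecting the characters and joining them with a separator. A minimal sketch, where the function name scan_before_colon and the awk variable sep are made up for illustration:

```shell
# Hypothetical helper: scans each line and collects the character before every ":",
# joining the collected characters with the separator passed as $1.
scan_before_colon() {
  awk -v sep="$1" '
  {
    out = ""
    n = 0
    for (i = 2; i <= length($0); i++) {
      if (substr($0, i, 1) == ":") {
        # Only add the separator between hits, not before the first one.
        out = out (n++ ? sep : "") substr($0, i - 1, 1)
      }
    }
    print out
  }'
}

printf '%s\n' ':glki:l:s:d:' | scan_before_colon /
```

Since the loop only reacts to colons, any text after the last colon is ignored, so input ending in a letter needs no special handling.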
