I'm trying to get the character that precedes each occurrence of a given character/pattern in a string, using standard shell tools such as grep, awk/gawk, sed ...

Step I: Get the character that precedes each occurrence of the character ":"

Example:

String 1 => :hd:fg:kl:

String 2 => :df:lkjh:

String 3 => :glki:l:s:d:

Expected results

Result 1 => dgl

Result 2 => fh

Result 3 => ilsd

I tried many times with awk, but without success.

Step II: Insert a given character between each pair of adjacent characters in the resulting string

Example with /

Result 1 => d/g/l

Result 2 => f/h

Result 3 => i/l/s/d

I have an awk expression for this step: awk -F '' -v OFS="/" '{$1=$1;print}'
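
A note on portability: as far as I know, splitting a record into single characters with an empty FS is an extension found in gawk and some other awks, and POSIX leaves the behavior unspecified. A minimal sketch of the same Step II using only basic sed follows; the function name join_chars is made up for illustration, and the separator is assumed to be "/".

```shell
# Hypothetical helper: put "/" between every pair of adjacent characters.
# Appends "/" after each character, then drops the trailing one.
join_chars() {
  sed 's/./&\//g; s/\/$//'
}

printf '%s\n' dgl fh ilsd | join_chars
```

A one-character line comes out unchanged, because the only "/" added is the trailing one that the second substitution removes.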

I don't know whether Step I is possible with awk or sed, or whether Step I and Step II can be done in a single pass.

Kind Regards

moocan
  • You might find this question helpful https://stackoverflow.com/questions/2777579/how-to-output-only-captured-groups-with-sed – Leonard Jul 04 '18 at 01:01
  • @moocan, please try to select one of the answers as the correct answer to close out the question; see this link for more details: https://stackoverflow.com/help/someone-answers – RavinderSingh13 Jul 04 '18 at 02:19
  • You should include a case of back-to-back colons in your sample input/output (e.g. foo::bar) if it can occur as that could be hard to handle depending on your requirements for doing so. Is the output `o` or `o:` or something else? If it cannot happen then add a statement to your question saying so. – Ed Morton Jul 04 '18 at 11:34
  • @RavinderSingh13 I apologize for my late answer because I was offline – moocan Jul 07 '18 at 01:12
  • @Ed Morton, that cannot happen in my case ... but it's very good advice – moocan Jul 07 '18 at 01:14

8 Answers

What about:

awk 'BEGIN{FS=":"}{for(i=1;i<NF;i++){if(i>2)printf "/";printf "%s",substr($i,length($i))}print ""}' input.txt

input.txt:

:hd:fg:kl:
:df:lkjh:
:glki:l:s:d:

Output:

d/g/l
f/h
i/l/s/d
Lacobus

Solution 1: Could you please try the following and let me know if it helps.

awk -F":" '
{
  for(i=1;i<=NF;i++){
    if($i!=""){ val=val substr($i,length($i)) }  # $i!="" also keeps a field such as "0"
  }
  print val;
  val=""
}' Input_file

Output will be as follows.

dgl
fh
ilsd

Solution 2: With a / between the output characters.

awk '
BEGIN{
  OFS="/";
  FS=":"
}
{
  for(i=1;i<=NF;i++){
    if($i!=""){
      val=(val?val OFS:"")substr($i,length($i))
    }}
  print val;
  val=""
}' Input_file

Output will be as follows.

d/g/l
f/h
i/l/s/d

Solution 3: With awk's match() function.

awk '
{
  while(match($0,/[a-zA-Z]:/)){
    val=(val?val:"")substr($0,RSTART,RLENGTH-1)
    $0=substr($0,RSTART+RLENGTH)
   }
  print val
  val=""
}'  Input_file
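
Solution 3 covers Step I only. A minimal sketch of how the same match() loop could also handle Step II is below. The function name last_before_colon and the sep variable are made up for illustration, and [^:] is used instead of [a-zA-Z] on the assumption that any non-colon character should count.

```shell
# Hypothetical helper: print each character that directly precedes a ":",
# joined by sep. Works on a copy of the line rather than rewriting $0.
last_before_colon() {
  awk -v sep="/" '
  {
    out = ""; rest = $0
    # match() sets RSTART/RLENGTH for the first "non-colon then colon" pair
    while (match(rest, /[^:]:/)) {
      out = out (out == "" ? "" : sep) substr(rest, RSTART, 1)
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out
  }'
}

printf '%s\n' ':hd:fg:kl:' ':hfd:l:jh:m' | last_before_colon
```

For the two sample lines this prints d/g/l and d/l/h; a trailing letter with no colon after it is ignored.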
RavinderSingh13
  • In my question, I always ended my examples with ":", which was my mistake, because the string can also end with any letter. For a pattern such as ":hfd:l:jh:m", the output is "dlhm" for your first solution and "d/l/h/m" for the second. Your third solution works well because the output is "dlh". – moocan Jul 07 '18 at 01:48
  • @moocan, sure, thanks for letting me know. Please keep the question's samples in line with your actual requirements, because solutions will be written against your samples. Cheers and happy learning. – RavinderSingh13 Jul 07 '18 at 02:19

This might work for you (GNU sed):

sed -r 's/[^:]*([^:]):+|:+/\1/g;s/\B/\//g' file

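The first command replaces, globally throughout the line, either zero or more non-colons followed by a single non-colon character and one or more colons, or a lone run of colons, with that single character (nothing, in the lone-colon case). The second command then inserts a / at every \B position, i.e. between each pair of adjacent characters.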

potong
  • In the case of a pattern such as ":hfd:l:jh:m", the output is "d/l/ h/m". In my question, I always ended my examples with ":", which was my mistake, because the string can also end with any letter. – moocan Jul 07 '18 at 01:32

Perl and negative lookahead:

$ perl -p -e 's/.(?!:)//g' file
dgl
fh
ilsd
James Brown

This is easier to do with perl:

$ cat ip.txt
:hd:fg:kl:
:df:lkjh:
:glki:l:s:d:

$ perl -lne 'print join "/", /.(?=:)/g' ip.txt
d/g/l
f/h
i/l/s/d
  • /.(?=:)/g gets all characters that are immediately followed by :
  • the resulting matches are then joined and printed using / as the delimiter string
Sundeep
  • Works very well with all my test patterns, even when the pattern ends with a letter rather than ":". Thanks – moocan Jul 07 '18 at 02:01

All in sed, using ERE:

sed -E 's#[^:]*(.):#\1/#g;s/^.|.$//g' infile
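
To see what each of the two substitutions contributes, here is a small sketch that runs them one at a time on the third sample line. The variable names are made up for illustration, and it assumes, as in the question's samples, that every line starts and ends with a colon; the second substitution deletes the first and last character unconditionally.

```shell
line=':glki:l:s:d:'

# Stage 1: each "run of non-colons, one character, colon" becomes "character/".
# The leading colon cannot match and is left in place.
stage1=$(printf '%s\n' "$line" | sed -E 's#[^:]*(.):#\1/#g')

# Stage 2: delete the first character (the leading colon)
# and the last character (the trailing slash).
final=$(printf '%s\n' "$stage1" | sed -E 's/^.|.$//g')

printf '%s\n' "$stage1" "$final"
```

This prints :i/l/s/d/ and then i/l/s/d.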
ctac_

Using GNU sed:

sed -E 's/[^:]*([^:]):/\1/g; s/([^:])/\/\1/g; s/^:\///'

The first command, s/[^:]*([^:]):/\1/g, strips out the extra characters and the colons (except the first one), yielding:

:dgl
:fh
:ilsd

The second command s/([^:])/\/\1/g inserts a / before each character, yielding:

:/d/g/l
:/f/h
:/i/l/s/d

The last command s/^:\/// simply removes the :/ from the beginning of each line:

d/g/l
f/h
i/l/s/d
0

You could iterate across each line with awk, starting at the second character. Every time the iterator hits a colon, print the previous character.

$ awk '{for(i=2;i<=length($0);i++) {
          # use an explicit "%s" so the data is never treated as a printf format
          if (substr($0,i,1)==":") printf "%s", substr($0,i-1,1)}
        printf "\n"}' file.txt
dgl
fh
ilsd
dr-who