
Suppose I have a string:

string = "VNYTQAKENGSD"

And I need to find the positions where this expression holds.

N{P}[ST]{P} (meaning 4 letters: [N, ¬P, S or T, ¬P])

The output would be

2 9

because at position 2 you have NYTQ and at 9 NGSD

How do I write this as a regular expression?

Consider regex to be the regular expression:

 vector = c()
 for(i in 1:nchar(string)){

 # If regex matches at the start of the remaining string, record the index.
   if(grepl(paste0("^", regex), string)){
   vector = c(vector,i)
   } 

 # Drop the first character of the string
 string = substring(string,2)
 }  

Please help

Saul Garcia
  • Could you explain your requirement? Find `N`, then any char but `P`, then `S` or `T`, and any char but `P`? Something like [`N[^P][ST][^P]`](https://regex101.com/r/06KuuS/1)? Or do you mean to only match *letters*? Like [`N[A-OQ-Z][ST][A-OQ-Z]`](https://regex101.com/r/06KuuS/2)? – Wiktor Stribiżew Oct 06 '16 at 08:43
  • Yeah, it's like the second option. Thanks for the link, I'll take a look. – Saul Garcia Oct 06 '16 at 11:25
  • Without any details, only you could answer the question correctly and thus I linked to the answer that helped you find more details on character classes. Now, your question is clear, and I will reopen. – Wiktor Stribiżew Oct 06 '16 at 11:26
  • I edited it, thank you. – Saul Garcia Oct 06 '16 at 11:38

1 Answer


After clarification, it is clear you need a regex like

N[A-OQ-Z][ST][A-OQ-Z]

See the regex demo

Details:

  • N - matches 1 occurrence of N
  • [A-OQ-Z] - a character class that matches 1 ASCII uppercase letter from A to O or from Q to Z, that is, any uppercase letter except P
  • [ST] - a character class that matches either S or T
  • [A-OQ-Z] - same as the second item
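
To make the breakdown above concrete, here is a small sketch that checks a few four-letter candidates against the pattern with `grepl`. Only NYTQ and NGSD come from the question; the other candidates are made up for illustration, and the `^`/`$` anchors are added here only so that each candidate is tested as a whole.

```r
pattern <- "^N[A-OQ-Z][ST][A-OQ-Z]$"

# NYTQ and NGSD come from the question; the rest are hypothetical examples.
candidates <- c("NYTQ", "NGSD", "NPTQ", "NYAQ", "NYTP")

# TRUE where the candidate satisfies all four positions of the pattern
hits <- grepl(pattern, candidates)
print(setNames(hits, candidates))
```

NPTQ and NYTP should be rejected because of the P in the second and fourth position, and NYAQ because the third letter is neither S nor T.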

See more information on character classes at regular-expressions.info.

In R (see online demo):

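string <- "VNYTQAKENGSD"
# gregexpr returns a list with one element per input string
z <- gregexpr("N[A-OQ-Z][ST][A-OQ-Z]", string)
# Keep only the start positions (drops the match.length attributes)
as.integer(z[[1]])
## => [1] 2 9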
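
Two edge cases may be worth keeping in mind with this approach: `gregexpr` returns -1 when nothing matches, and it reports only non-overlapping matches. The sketch below is one possible way to handle both, assuming overlapping hits should be reported too. The helper name `find_motif` and the second and third test strings are made up for illustration. It wraps the pattern in a lookahead, which requires `perl = TRUE`.

```r
# Hypothetical helper: start positions of every match, overlapping ones included.
find_motif <- function(x, pattern = "N[A-OQ-Z][ST][A-OQ-Z]") {
  # A lookahead consumes no characters, so the search resumes one position
  # after each hit instead of after the end of the match.
  m <- gregexpr(paste0("(?=", pattern, ")"), x, perl = TRUE)[[1]]
  pos <- as.integer(m)      # drop the match.length attributes
  pos[pos > 0]              # gregexpr signals "no match" with -1
}

find_motif("VNYTQAKENGSD")  # string from the question
find_motif("NNTSA")         # made-up string: NNTS and NTSA overlap
find_motif("VVVV")          # made-up string without any match
```

If overlapping matches cannot occur in your data, the plain pattern without the lookahead gives the same positions.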
Wiktor Stribiżew