Regular Expressions Insight in R

Question

Suppose I have a string:

string = "VNYTQAKENGSD"

And I need to find the positions where this expression holds.

N{P}[ST]{P} (meaning 4 letters: N, then anything but P, then S or T, then anything but P)

The output would be

2 9

because at position 2 you have NYTQ and at 9 NGSD

How do I write this as a regular expression?

Consider regex as the regular expression. My attempt so far:

 vector = c()
 for(i in 1:nchar(string)){

 # If regex matches at the start of the remaining string, record the index.
   if(grepl(paste0("^", regex), string)){
   vector = c(vector,i)
   }

 # Drop the first character of the string
 string = substring(string,2)
 }

Please help

Could you explain your requirement? Find `N`, then any char but `P`, then `S` or `T`, and any char but `P`? Something like [`N[^P][ST][^P]`](https://regex101.com/r/06KuuS/1)? Or do you mean to only match *letters*? Like [`N[A-OQ-Z][ST][A-OQ-Z]`](https://regex101.com/r/06KuuS/2)? — Wiktor Stribiżew, Oct 06 '16 at 08:43
Yeah, it's like the second option. Thanks for the link, I'll take a look. – Saul Garcia, Oct 06 '16 at 11:25
Without any details, only you could answer the question correctly and thus I linked to the answer that helped you find more details on character classes. Now, your question is clear, and I will reopen. — Wiktor Stribiżew, Oct 06 '16 at 11:26

Wiktor Stribiżew · Accepted Answer · 2016-10-06T12:56:28.497

1

After clarification, it is clear you need a regex like

N[A-OQ-Z][ST][A-OQ-Z]

See the regex demo

Details:

N - matches 1 occurrence of N
[A-OQ-Z] - a character class that matches 1 ASCII uppercase letter from A to O or from Q to Z (that is, any uppercase letter except P)
[ST] - a character class that matches either S or T
[A-OQ-Z] - same as the second item

See more information on character classes at regular-expressions.info.
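As a quick check of what each class accepts, the pattern can be tried against a few four-letter strings. This is only a sketch: the `candidates` vector is made up for illustration, and the pattern is anchored with `^` and `$` here (an addition for this test, not part of the regex above) so that each whole string must match.

```r
# Anchored copy of the pattern: the whole 4-letter string must match.
pattern <- "^N[A-OQ-Z][ST][A-OQ-Z]$"

# Made-up test strings; the first two are the matches from the question.
candidates <- c(
  "NYTQ",  # N, non-P, T, non-P
  "NGSD",  # N, non-P, S, non-P
  "NPTQ",  # P in 2nd position
  "NYAQ",  # 3rd letter is neither S nor T
  "NYTP",  # P in 4th position
  "nytq"   # the literal N is case-sensitive
)

matches <- grepl(pattern, candidates)
names(matches) <- candidates
print(matches)
```

Only the first two strings should come back as TRUE; each of the others violates exactly one of the rules listed above.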

In R (see online demo):

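string <- "VNYTQAKENGSD"
# gregexpr returns a list with one element per input string
z <- gregexpr("N[A-OQ-Z][ST][A-OQ-Z]", string)
# Subsetting with [ drops the match.length attribute and leaves the start positions
z[[1]][1:length(z[[1]])]
## => [1] 2 9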
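If the same pattern has to be applied to several strings, one possible approach is to wrap the `gregexpr` call in a small helper. The sketch below is an illustration, not part of the original answer: `match_positions`, its `overlapping` argument, and the extra test strings are hypothetical names and inputs. It assumes that a string without a match should yield an empty vector rather than the -1 that `gregexpr` reports, and it optionally wraps the pattern in a lookahead (which needs `perl = TRUE`) so that matches sharing characters are also found.

```r
# Hypothetical helper: start positions of every match of `pattern` in one string.
match_positions <- function(pattern, x, overlapping = FALSE) {
  if (overlapping) {
    # A zero-width lookahead lets consecutive matches share characters.
    pattern <- paste0("(?=", pattern, ")")
  }
  m <- gregexpr(pattern, x, perl = overlapping)[[1]]
  pos <- as.integer(m)  # as.integer() strips match.length and the other attributes
  pos[pos > 0]          # gregexpr signals "no match" with -1; return integer(0) instead
}

motif <- "N[A-OQ-Z][ST][A-OQ-Z]"

# The first string is from the question; the other two are made-up inputs.
strings <- c("VNYTQAKENGSD", "AAAA", "NASNTSA")

# One plain integer vector of positions per string, without attributes.
positions <- lapply(strings, function(s) match_positions(motif, s))
print(positions)

# "NASNTSA" contains NASN at 1 and NTSA at 4, which share the N at position 4.
print(match_positions(motif, "NASNTSA", overlapping = TRUE))
```

By default `gregexpr` resumes searching after the end of the previous match, so the second motif in "NASNTSA" is only reported when `overlapping = TRUE`.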

edited Oct 06 '16 at 12:56

answered Oct 06 '16 at 11:29

Wiktor Stribiżew

Well, thank you, this was more than insightful! I was just about to try it with a loop. I was testing whether it is True or False. `grep("N[A-OQ-Z][ST][A-OQ-Z]","NASA")` – Saul Garcia Oct 06 '16 at 11:51
Well, I can't ignore the attributes: `> string = "VNYTQAKENGSD"` `> gregexpr("N[A-OQ-Z][ST][A-OQ-Z]", string)[[1]]` prints `[1] 2 9 attr(,"match.length") [1] 4 4 attr(,"useBytes") [1] TRUE` – Saul Garcia Oct 06 '16 at 12:39
You mean you want to output just the two indices? – Wiktor Stribiżew Oct 06 '16 at 12:41
Yes, at some point it was working, but after trying to put it in a loop to apply it to different strings, it keeps giving me the attributes as well. And [[]] is not helping me get the first vector of the list. And now the same code you wrote is giving me the attributes every time. – Saul Garcia Oct 06 '16 at 12:43
Never mind, this worked! `unlist(grep("N[A-OQ-Z][ST][A-OQ-Z]","NASA"))` – Saul Garcia Oct 06 '16 at 12:46
I added a `gregexpr` way. – Wiktor Stribiżew Oct 06 '16 at 12:56
Thank you very much – Saul Garcia Oct 06 '16 at 12:59
