3

I need to identify the course numbers that match the pattern xx.3xxxxxx. These are some examples of the course numbers.

26.3730004   
27.0210000    
26.3730009   
26.7114001   
23.9610071  
26.0A34430    
23.3670005    
26.0B05430    

I tried many patterns. One example is the pattern below, which did not produce any match.

"[^0-9]{2}\Q.\E3[^0-9]+$"

I tried using grep and grepl. I actually need the code to return indexes.

This code shows my attempt to tag the rows that have matches.

Teacher$virtual[
  which(grepl("[^0-9]{2}\\Q.\\E3[^0-9]+$", Teacher$CourseNumber))
] <- "1"

I need to remove every row from my data frame whose course number matches that pattern: XX.3XXXXXX

But, my code did not find any match. Can you please help me?

kath
Lilian Tan

2 Answers

1

Here, this simple expression would likely cover that:

^[0-9]{2}\.[3].+$

which requires a literal 3 (written as the character class [3]) immediately after the escaped dot. It would probably also work without the start and end anchors:

[0-9]{2}\.[3].+

We can tighten or relax these constraints if necessary.
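As a rough sketch of how the two expressions above might be used from R: the vector name `course` is an assumption for illustration, its values are the sample course numbers from the question, and the backslash is doubled because the pattern sits inside an R string literal.

```r
# Sample course numbers copied from the question
# (the vector name `course` is hypothetical).
course <- c("26.3730004", "27.0210000", "26.3730009", "26.7114001",
            "23.9610071", "26.0A34430", "23.3670005", "26.0B05430")

# The regex escape \. is written as \\. inside an R string literal.
anchored   <- grepl("^[0-9]{2}\\.[3].+$", course)
unanchored <- grepl("[0-9]{2}\\.[3].+", course)

# Course numbers that have a 3 right after the dot.
course[anchored]
```

On this sample both versions flag the same elements. Without the `^` anchor, though, the pattern can also match in the middle of a longer string, so the anchored form is probably the safer choice when the column may contain other formats.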

Emma
1

You should use

grepl("^[0-9]{2}\\.3", Teacher$CourseNumber)

Details:

  • ^ - start of a string
  • [0-9]{2} - two digits
  • \\. - a literal dot. A regex escape needs a literal backslash, but inside an R string literal ("...") a single backslash starts a string escape sequence, so the backslash must be doubled to pass one literal backslash to the regex engine
  • 3 - a literal 3 character.
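Since the question asks for indexes and for removing the matching rows, here is a minimal sketch of how the pattern above could be applied. The `Teacher` data frame below is rebuilt from the sample values in the question, and the names `pattern`, `idx` and `kept`, as well as initializing `virtual` to `NA`, are assumptions made for illustration.

```r
# Hypothetical stand-in for the asker's data frame, using the sample values.
Teacher <- data.frame(
  CourseNumber = c("26.3730004", "27.0210000", "26.3730009", "26.7114001",
                   "23.9610071", "26.0A34430", "23.3670005", "26.0B05430"),
  stringsAsFactors = FALSE
)

pattern <- "^[0-9]{2}\\.3"

# grep() returns the integer indexes of the matching elements;
# which(grepl(...)) gives the same result.
idx <- grep(pattern, Teacher$CourseNumber)

# Tag the matching rows (the column is created first; NA is an assumed default).
Teacher$virtual <- NA_character_
Teacher$virtual[idx] <- "1"

# Drop the matching rows with a negated logical vector.
kept <- Teacher[!grepl(pattern, Teacher$CourseNumber), , drop = FALSE]
```

Filtering with `!grepl(...)` rather than `-idx` is a deliberate choice here: if nothing matches, `idx` is empty and `Teacher[-idx, ]` would return zero rows instead of the whole data frame.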

NOTE: If you want in-pattern quoting with \Q and \E (all characters between them are treated literally), you need a PCRE regex: add perl=TRUE and use

grepl("^[0-9]{2}\\Q.\\E3", Teacher$CourseNumber, perl=TRUE)

Now, the dot is treated as a literal dot, not a . metacharacter that matches any char but a line break char (in a PCRE regex, . does not match line break chars by default).
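To illustrate the difference between a literal dot and the `.` metacharacter, the following sketch compares an unescaped dot with three ways of matching the dot literally. The second test string, `26X3730004`, is made up for the comparison and does not come from the question.

```r
# One real course number and one made-up string with "X" in place of the dot.
x <- c("26.3730004", "26X3730004")

# Unescaped dot: matches any character, so both strings match.
any_char <- grepl("^[0-9]{2}.3", x)

# Three ways to require a literal dot.
escaped <- grepl("^[0-9]{2}\\.3", x)                    # backslash escape
bracket <- grepl("^[0-9]{2}[.]3", x)                    # character class
quoted  <- grepl("^[0-9]{2}\\Q.\\E3", x, perl = TRUE)   # \Q...\E with PCRE

any_char   # TRUE  TRUE
escaped    # TRUE FALSE
```

The character class `[.]` is a common alternative when the doubled backslash is hard to read.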

Wiktor Stribiżew
  • the `perl=TRUE` option should be the default, IIRC it is both more efficient and more featureful – Rorschach Jun 05 '19 at 18:16
  • @jenesaisquoi I did not test the latest versions, but some time ago, our tests at SO showed that PCRE regex is faster in Linux and MacOS, but the default TRE is faster in Windows. Also, speaking about features, TRE, being a text-directed engine, picks the longest alternative from a group, and might be preferred in some cases. See [TRE vs. PCRE comparison](https://stackoverflow.com/questions/47240375/regular-expressions-in-base-r-perl-true-vs-the-default-pcre-vs-tre/47251004#47251004). Actually, TRE supports fuzzy matching, PCRE does not. – Wiktor Stribiżew Jun 05 '19 at 18:22
  • there is some discussion of the R implementations buried in the bowels of the R documentation where they talk about performance. I'm sure you're right that there are some cases where it's preferred – Rorschach Jun 05 '19 at 18:24