What do these expressions mean? Where can I learn about their usage?

\\d 
\\D 
\\s 
\\S 
\\w 
\\W
\\t 
\\n 
^   
$   
\   
|   etc.

I need to use the stringr package and I have absolutely no idea how to use these.

Ben Bolker
Pankaj Kaundal
  • We usually expect a code question, along with some effort, data, and the desired output. You can have a look at `?regexp`, http://regexr.com/, http://regexone.com/, Google, etc. – Vincent Bonhomme May 02 '16 at 12:35
  • From within R enter this `?regex` to get information on regular expressions. There are also links to tutorials and other information near the bottom of this page: https://code.google.com/archive/p/gsubfn/ – G. Grothendieck May 02 '16 at 12:37

1 Answer

From ?regexp, in the Extended Regular Expressions section:

The caret ‘^’ and the dollar sign ‘$’ are metacharacters that respectively match the empty string at the beginning and end of a line. The symbols ‘\<’ and ‘\>’ match the empty string at the beginning and end of a word. The symbol ‘\b’ matches the empty string at either edge of a word, and ‘\B’ matches the empty string provided it is not at an edge of a word. (The interpretation of ‘word’ depends on the locale and implementation: these are all extensions.)
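To make the anchors concrete, here is a minimal sketch in base R. The sample strings and variable names are made up for illustration, and the last line also shows ‘|’ (alternation) from the question. With stringr, `str_detect(x, "^cat")` should behave the same way for these simple patterns.

```r
# Hypothetical sample strings, chosen only for illustration
x <- c("cat", "concat", "catalog", "a cat sat")

starts_with_cat <- grepl("^cat", x)       # "cat" at the start of the string
ends_with_cat   <- grepl("cat$", x)       # "cat" at the end of the string
whole_word_cat  <- grepl("\\bcat\\b", x)  # "cat" with a word edge on both sides
either          <- grepl("^cat|sat$", x)  # starts with "cat" OR ends with "sat"

print(starts_with_cat)  # TRUE FALSE  TRUE FALSE
print(ends_with_cat)    # TRUE  TRUE FALSE FALSE
print(whole_word_cat)   # TRUE FALSE FALSE  TRUE
print(either)           # TRUE FALSE  TRUE  TRUE
```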

From Perl-like Regular Expressions:

The escape sequences ‘\d’, ‘\s’ and ‘\w’ represent any decimal digit, space character and ‘word’ character (letter, digit or underscore in the current locale: in UTF-8 mode only ASCII letters and digits are considered) respectively, and their upper-case versions represent their negation. Vertical tab was not regarded as a space character in a ‘C’ locale before PCRE 8.34 (included in R 3.0.3). Sequences ‘\h’, ‘\v’, ‘\H’ and ‘\V’ match horizontal and vertical space or the negation. (In UTF-8 mode, these do match non-ASCII Unicode code points.)

Note that backslashes usually need to be doubled/protected in R input, e.g. you would use "\\h" to match horizontal space.
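As a rough illustration of the character classes and of the backslash doubling, the sketch below uses base R on an invented string. Note that `"\\d"` is only two characters once R has parsed it (a backslash and a `d`), and that matching a literal backslash takes four backslashes in the source.

```r
# Hypothetical input string, used only for illustration
x <- "Order 42 shipped on 2016-05-02"

pattern_length <- nchar("\\d")            # 2: R stores a single backslash plus "d"
digits_masked  <- gsub("\\d", "#", x)     # replace every digit
digits_only    <- gsub("\\D", "", x)      # drop everything that is not a digit
no_spaces      <- gsub("\\s", "_", x)     # replace each whitespace character
word_chars     <- gsub("\\W", "", x)      # drop everything that is not a word character
words          <- regmatches(x, gregexpr("\\w+", x))[[1]]  # runs of word characters

# A literal backslash: "\\\\" in R source is the two-character regex \\
path_fixed <- gsub("\\\\", "/", "C:\\temp")

print(digits_masked)  # "Order ## shipped on ####-##-##"
print(digits_only)    # "4220160502"
print(no_spaces)      # "Order_42_shipped_on_2016-05-02"
print(word_chars)     # "Order42shippedon20160502"
print(words)          # "Order" "42" "shipped" "on" "2016" "05" "02"
print(path_fixed)     # "C:/temp"
```

The stringr counterparts would be along the lines of `str_replace_all(x, "\\d", "#")` and `str_extract_all(x, "\\w+")`, with the same doubled backslashes.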

From ?Quotes:

Backslash is used to start an escape sequence inside character constants. Escaping a character not in the following table is an error.
\n newline
\r carriage return
\t tab
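These string escapes are handled by R itself before any regular expression engine sees the pattern. A small sketch of the difference, using an invented string (base R only):

```r
# "\t" and "\n" are turned into a real tab and newline by the R parser
s <- "a\tb\nc"

n_chars     <- nchar(s)                  # 5: a, tab, b, newline, c
literal_tab <- grepl("\t", s)            # pattern contains an actual tab character
escaped_tab <- grepl("\\t", s)           # pattern is backslash + t; the regex engine reads it as a tab
pieces      <- strsplit(s, "\\s")[[1]]   # split on any whitespace (tab or newline here)

print(n_chars)      # 5
print(literal_tab)  # TRUE
print(escaped_tab)  # TRUE
print(pieces)       # "a" "b" "c"
```

Both forms find the tab here; the doubled form is usually easier to read because the pattern stays visible as text.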

As others comment above, you may need a little more help if you're getting started with regular expressions for the first time. This is a little bit off-topic for StackOverflow (links to off-site resources), but there are some links to regular expression resources at the bottom of the gsubfn package overview. Or Google "regular expression tutorial" ...

Ben Bolker