
I have a character vector with 8 string elements. I am trying to understand how to use regex to identify objects based on certain criteria.

"Horse" "21-35" "house" "orange" "I271" "78.96" "B42" "yes/no"

I would like to identify objects that start with a certain value, let's say any number.

grep("^[0-9]+", string, value = TRUE)

should work based on the reading I've done on regex, but it seems to be giving me only objects that start with letters. Alternatively,

grep("[a-zA-Z]+", string, value = TRUE)

seems like it should work, but this gives me every element containing at least one letter. I would like to do more than something as mundane as this, but I need to learn how to apply regex before moving on.

regents
  • I suspect you need the whole string match, `grep("^[0-9]+$", string, value = TRUE)` or `grep("^[a-zA-Z]+$", string, value = TRUE)` – Wiktor Stribiżew May 01 '18 at 20:23
  • And what output do you expect? – Jan May 01 '18 at 20:23
  • I don't get what you want – denis May 01 '18 at 20:24
  • `^[0-9]` will give you objects that only start with a **digit**! – Jan May 01 '18 at 20:28
  • @Jan How can I exclude all numbers? I thought I had read that "^" was to indicate "not", but clearly I was mistaken, because you are right. – regents May 01 '18 at 20:54
  • Use `invert=TRUE` to get the inverse result: `grep("[0-9]", string, invert=TRUE, value = TRUE)` will only fetch those items that do not contain a digit. Please make your question answerable by clarifying what output you expect given your input vector. – Wiktor Stribiżew May 01 '18 at 20:57
  • The "^" operator only negates _inside_ a character class, where it must be the first character after "[". When it is the first character of a pattern, it instead requires the next matching "rule" in the pattern to be satisfied by the first character of each item in the x argument. – IRTFM May 01 '18 at 20:57
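The points raised in the comments can be checked directly against the question's vector. This is a minimal sketch, assuming the vector is stored in a variable named `string` (as in the answer below); it contrasts a start anchor, a whole-string match, and `invert = TRUE`:

```r
string <- c("Horse", "21-35", "house", "orange", "I271", "78.96", "B42", "yes/no")

# "^" outside a character class anchors the match to the start of the string:
# elements whose first character is a digit
starts_digit <- grep("^[0-9]", string, value = TRUE)

# "^" and "$" together require the WHOLE element to match:
# elements made only of letters
all_letters <- grep("^[a-zA-Z]+$", string, value = TRUE)

# invert = TRUE returns the elements that do NOT match:
# elements containing no digit anywhere
no_digit <- grep("[0-9]", string, invert = TRUE, value = TRUE)

print(starts_digit)  # "21-35" "78.96"
print(all_letters)   # "Horse" "house" "orange"
print(no_digit)      # "Horse" "house" "orange" "yes/no"
```

Note that the unanchored `[a-zA-Z]+` from the question matches any element with a letter somewhere in it, which is why anchoring both ends changes the result so much.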

1 Answer


It's not clear what you expected. I get what I expected from this input:

 string <- c("Horse", "21-35", "house", "orange", "I271", "78.96", "B42", "yes/no")
 grep("^[0-9]+", string, value = TRUE)
[1] "21-35" "78.96"

This pattern asks for any string whose first character is an ASCII digit (0-9). It does not necessarily match what could be converted to a numeric value with as.numeric. For that, one might test the result of as.numeric for NA and use it as a logical index:

 string[ !is.na(as.numeric(string)) ]
[1] "78.96"
Warning message:
NAs introduced by coercion 
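If "numeric" is taken to mean a plain unsigned decimal such as "78.96", a whole-string regex can express that directly, and `suppressWarnings()` can quiet the coercion warning above. A sketch under that assumption; the pattern is illustrative only and deliberately does not accept signs, exponents such as "1e5", or a leading decimal point:

```r
string <- c("Horse", "21-35", "house", "orange", "I271", "78.96", "B42", "yes/no")

# Assumed definition of "numeric": one or more digits, optionally followed
# by a decimal point and more digits, with nothing else in the element.
numeric_pattern <- "^[0-9]+(\\.[0-9]+)?$"
by_regex <- grep(numeric_pattern, string, value = TRUE)

# The same question answered by coercion, without the warning message
by_coercion <- string[!is.na(suppressWarnings(as.numeric(string)))]

print(by_regex)     # "78.96"
print(by_coercion)  # "78.96"
```

The two approaches agree on this vector but can diverge on other input (for example, as.numeric accepts "-5" and "1e5", which this pattern rejects).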

From your later comment, I now suspect you wanted the result below. The pattern uses the first "^" to require that the next rule be satisfied by the first character, and the second "^" (inside the character class) as a negation:

 grep("^[^0-9]", string, value = TRUE)
[1] "Horse"  "house"  "orange" "I271"   "B42"    "yes/no"
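The negated class is not the only way to ask "does not start with a digit"; `invert = TRUE` on the positive pattern gives the same result for this vector. A small sketch comparing them; the empty string tested at the end is a hypothetical extra input, not part of the question:

```r
string <- c("Horse", "21-35", "house", "orange", "I271", "78.96", "B42", "yes/no")

# (a) negated character class anchored at the start
neg_class <- grep("^[^0-9]", string, value = TRUE)
# (b) positive pattern with the match inverted
inverted <- grep("^[0-9]", string, invert = TRUE, value = TRUE)
print(identical(neg_class, inverted))  # TRUE for this vector

# They differ on an empty string: "[^0-9]" must consume one character,
# so "" fails (a), while "" is kept by (b) because it does not start with a digit.
empty_neg_class <- grepl("^[^0-9]", "")
empty_inverted  <- !grepl("^[0-9]", "")
print(empty_neg_class)  # FALSE
print(empty_inverted)   # TRUE
```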
IRTFM