Locate position of first number in string [R]

Question

How can I create a function in R that locates the word position of the first number in a string?

For example:

string1 <- "Hello I'd like to extract where the first 1010 is in this string"
#desired_output for string1
9

string2 <- "80111 is in this string"
#desired_output for string2
1

string3 <- "extract where the first 97865 is in this string"
#desired_output for string3
5

Tim Biegeleisen · Answer 1 · 2020-11-03T02:59:16.060

5

I would just use grep and strsplit here for a base R option:

# [1] keeps only the first matching word if a string contains several numbers
sapply(input, function(x) grep("\\d", strsplit(x, " ")[[1]])[1])

Hello I'd like to extract where the first 1010 is in this string
                                                               9
                                         80111 is in this string
                                                               1
                 extract where the first 97865 is in this string
                                                               5
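Since the question asks for a function, the one-liner can also be wrapped up as one. This is a minimal sketch, assuming a hypothetical name first_number_word (not from the answer above). It takes only the first match, so a string with several numbers, or none, still yields exactly one value.

```r
# Hypothetical helper: word position of the first word containing a digit.
# Returns NA for strings that contain no digit at all.
first_number_word <- function(x) {
  words <- strsplit(x, " ", fixed = TRUE)
  vapply(words, function(w) grep("[0-9]", w)[1], integer(1), USE.NAMES = FALSE)
}

first_number_word(c("80111 is in this string",
                    "first 97865 then 12",
                    "no digits here"))
# [1]  1  2 NA
```

Using vapply() with integer(1) guarantees an integer vector of the same length as the input, whereas sapply() may fall back to a list when the results differ in length.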

Data:

input <- c("Hello I'd like to extract where the first 1010 is in this string",
           "80111 is in this string",
           "extract where the first 97865 is in this string")

edited Nov 03 '20 at 02:59

answered Nov 03 '20 at 02:57


This doesn't answer my question, it just searches for specified text `1010` – Neal Barsch Nov 03 '20 at 02:59

@NealBarsch Check the update, sorry, I didn't realize you wanted to search for _any_ number. I only needed to make a slight change to my answer above. – Tim Biegeleisen Nov 03 '20 at 03:00

score 4 · Accepted Answer · answered Nov 03 '20 at 02:54

Here is a way to return your desired output:

library(stringr)
min(which(!is.na(suppressWarnings(as.numeric(str_split(string, " ", simplify = TRUE))))))

This is how it works:

str_split(string, " ", simplify = TRUE) # splits the string at each space, returning a one-row character matrix

as.numeric(...) # tries to convert each element to a number, returning NA when it fails

suppressWarnings(...) # suppresses the warnings generated by as.numeric

!is.na(...) # returns true for the values that are not NA (i.e. the numbers)

which(...) # returns the positions of the TRUE values

min(...) # returns the first position

The output:

min(which(!is.na(suppressWarnings(as.numeric(str_split(string1, " ", simplify = TRUE))))))
[1] 9
min(which(!is.na(suppressWarnings(as.numeric(str_split(string2, " ", simplify = TRUE))))))
[1] 1
min(which(!is.na(suppressWarnings(as.numeric(str_split(string3, " ", simplify = TRUE))))))
[1] 5
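To see the intermediate values, the same pipeline can be unrolled step by step. This is a sketch in base R, with strsplit() standing in for str_split() as an assumption of this example. It also uses which(...)[1] in place of min(which(...)), so that a string with no number gives NA rather than Inf with a warning.

```r
string1 <- "Hello I'd like to extract where the first 1010 is in this string"

# Split on single spaces (base R stand-in for stringr::str_split)
words <- strsplit(string1, " ", fixed = TRUE)[[1]]

# Non-numeric words become NA; the warnings are expected, so silence them
nums <- suppressWarnings(as.numeric(words))

# Logical mask of the words that parsed as numbers
is_num <- !is.na(nums)

# Index of the first TRUE, or NA if there is none
first_pos <- which(is_num)[1]
first_pos
# [1] 9

# A word with trailing punctuation does not parse as a number
is.na(suppressWarnings(as.numeric("1010,")))
# [1] TRUE
```

Because the test is as.numeric(), only words that parse entirely as numbers count. A token such as "1010," is skipped, which differs from the regex-based answers that match any word containing a digit.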

ekoam · Answer 3 · 2020-11-11T06:02:22.663

Here is another approach. We can trim each string just after the first digit of its first number, then count the words that remain: that count is the word position of the number. Here \\b matches a word boundary and \\S+ matches one or more non-whitespace characters.

first_numeric_word <- function(x) {
  x <- substr(x, 1L, regexpr("\\b\\d+\\b", x))
  lengths(gregexpr("\\b\\S+\\b", x))
}

Output

> first_numeric_word(x)
[1] 9 1 5

Data

x <- c(
  "Hello I'd like to extract where  the first 1010 is in this string", 
  "80111 is in this string", 
  "extract where the   first  97865 is in this string"
)
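One edge case worth knowing: when a string contains no standalone number, regexpr() returns -1, the trimmed string is empty, and lengths(gregexpr(...)) reports 1 because gregexpr() also returns a single -1. A hedged variant, using the hypothetical name first_numeric_word_safe, returns NA in that case:

```r
# Hypothetical variant of first_numeric_word() that returns NA when no number is found
first_numeric_word_safe <- function(x) {
  # Start of the first standalone number, or -1 if there is none
  start <- as.integer(regexpr("\\b\\d+\\b", x))
  # Keep everything up to and including the first digit of that number
  trimmed <- substr(x, 1L, start)
  # Count the whitespace-separated words that remain
  n_words <- lengths(regmatches(trimmed, gregexpr("\\S+", trimmed)))
  # No number found: report NA rather than a misleading count
  n_words[start == -1L] <- NA_integer_
  n_words
}

first_numeric_word_safe(c("extract where the   first  97865 is here",
                          "no digits here"))
# [1]  5 NA
```

Counting matches with regmatches() rather than taking lengths() of the raw gregexpr() result is a design choice of this sketch: an empty string then yields zero words instead of one.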

Edo · Answer 4 · 2020-11-03T22:04:50.413

Here I'll leave a fully tidyverse approach:

library(purrr)
library(stringr)

map_dbl(str_split(strings, " "), str_which, "\\d+")
#> [1] 9 1 5

map_dbl(str_split(strings[1], " "), str_which, "\\d+")
#> [1] 9

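Note that it works with both a single string and a vector of strings, as long as each string contains exactly one word with a digit (map_dbl requires a single value per element).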

Where strings is:

strings <- c("Hello I'd like to extract where the first 1010 is in this string",
             "80111 is in this string",
             "extract where the first 97865 is in this string")

semaphorism · Answer 5 · 2020-11-05T22:07:10.237

Try the following:

library(stringr)

position_first_number <- function(string) {
  min(which(str_detect(str_split(string, "\\s+", simplify = TRUE), "[0-9]+")))
}

With your example strings:

> string1 <- "Hello I'd like to extract where the first 1010 is in this string"
> position_first_number(string1)
[1] 9
 
> string2 <- "80111 is in this string"
> position_first_number(string2)
[1] 1
 
> string3 <- "extract where the first 97865 is in this string"
> position_first_number(string3)
[1] 5

Andrew · Answer 6 · 2020-11-03T03:20:16.887

0

Here is a base R solution that uses rapply() with grep() to recurse through the result of strsplit(); it works with a vector of strings.

Note: swap " " and fixed = TRUE with "\\s+" and fixed = FALSE (the default) if you want to split the strings on any whitespace instead of a literal space.

rapply(strsplit(strings, " ", fixed = TRUE), function(x) grep("[0-9]+", x))
[1] 9 1 5
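The difference described in the note shows up as soon as a string contains consecutive spaces. A small illustration, using an input with repeated spaces that is assumed here for demonstration:

```r
s <- "extract where the   first  97865 is in this string"

# Literal single-space split: every extra space yields an empty "word"
literal <- strsplit(s, " ", fixed = TRUE)[[1]]
grep("[0-9]", literal)[1]
# [1] 8

# Regex split on runs of whitespace: no empty words in between
by_ws <- strsplit(s, "\\s+")[[1]]
grep("[0-9]", by_ws)[1]
# [1] 5
```

With the literal split, the empty strings produced by the extra spaces are counted as words, so the reported position drifts from 5 to 8.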

Data:

strings = c("Hello I'd like to extract where the first 1010 is in this string", 
            "80111 is in this string", "extract where the first 97865 is in this string")

edited Nov 03 '20 at 03:20

answered Nov 03 '20 at 03:01


Hey @TimBiegeleisen, if I am honest, I am not completely sure what you mean. Can you clarify? – Andrew Nov 03 '20 at 03:08
