I need to extract the number that follows a word in a data.table column. For example:

y = data.table(status = c("client rating 01 approved", "John Rating: 2 reproved", "Customer rating9"))

I need the number that follows the word "rating" stored in a new column. For this example the result should be rating = c(1, 2, 9).

How can I do that, given that the text between "rating" and the number varies: a colon, a double space, or no space at all?

r2evans
  • 1
    In those examples, `as.integer(gsub("\\D", "", status))` would work, but from your question I'm inferring that you have other examples without "rating" in them. – r2evans May 04 '20 at 21:07
  • 1
    You need to be precise about your requirements. Instead of "...like :, double space, no space...", tell us *exactly* what may appear between "rating" and the digit(s). Pretend you are writing a code spec, where there's no place for "like" or vagueness generally. (In fact, writing SO questions that are precise and unambiguous is good practice for writing code specs.) – Cary Swoveland May 04 '20 at 22:14
  • If zero or more characters other than letters and digits can appear between "rating" and the digit(s), consider the regular expression `\brating[^a-z\d]*(\d+)` (not in R format) which has a capture group that will contain the digit(s) if there is a match. [Demo](https://regex101.com/r/djoLVH/2/). If the rule for what may appear between "rating" and the digit(s) differs from what I've assumed, change the regex accordingly. – Cary Swoveland May 04 '20 at 22:19
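The regular expression in the last comment is given "not in R format". As a minimal sketch of how it might be written in R, assuming the rule it describes (only characters other than letters and digits may sit between "rating" and the digits), the backslashes are doubled and `(?i)` is added so that "Rating" also matches. The variable names `status`, `pattern`, `matches` and `rating` are illustrative, not from the thread.

```r
# Sample input taken from the question.
status <- c("client rating 01 approved", "John Rating: 2 reproved", "Customer rating9")

# R-string form of \brating[^a-z\d]*(\d+): each backslash is doubled, and
# (?i) makes the match case-insensitive when perl = TRUE.
pattern <- "(?i)\\brating[^a-z\\d]*(\\d+)"

# regexec() locates the full match and the capture group;
# regmatches() extracts them as a list of character vectors.
matches <- regmatches(status, regexec(pattern, status, perl = TRUE))

# Element 2 of each result is the capture group; use NA when nothing matched.
rating <- vapply(
  matches,
  function(m) if (length(m) >= 2) as.integer(m[2]) else NA_integer_,
  integer(1)
)
rating
```

If the rule for what may appear between "rating" and the digits is different, the character class `[^a-z\\d]` is the part to adjust.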

1 Answer

We can use sub() to capture the digits (\\d+) that follow "rating", skipping any non-digit characters in between such as ":" or spaces, and then convert the captured text to numeric with as.numeric().

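library(data.table)
# Match the whole string, keep only the digits captured after "rating"
# (case-insensitive), and convert that capture to a number.
y[, num := as.numeric(sub(".*rating[^0-9]*(\\d+)\\b.*", "\\1",
         status, ignore.case = TRUE))]
y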
#                      status num
#1: client rating 01 approved   1
#2:   John Rating: 2 reproved   2
#3:          Customer rating9   9
akrun
  • 3
    Gustavo, akrun has provided a very good solution and one which you may wish to select, but be mindful that a quick selection can discourage other answers and may not be appreciated by those still working on theirs. The point is that there is no rush; just don't forget to make a selection if you find at least one of the answers helpful. Most askers who have been at SO for a while wait at least a couple of hours; some wait much longer, giving members who were asleep at the time an opportunity to answer. – Cary Swoveland May 04 '20 at 21:49
  • 1
    You may wish to add word breaks to avoid matching, for example, `"grating"`. – Cary Swoveland May 04 '20 at 21:54
  • The word break might have an issue with the last case, so I added `[^0-9]*`. – akrun May 04 '20 at 21:57
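The word-break concern in these comments can be illustrated with a small sketch. It assumes a boundary placed before "rating" (rather than between "rating" and the digits), which rejects "grating" while still accepting "rating9". The helper name `extract_rating` is hypothetical and not part of the answer; it also returns NA for rows without a match, since sub() alone would hand back the original string there.

```r
# Hypothetical helper, not from the original answer: returns the integer that
# follows the whole word "rating", or NA when the pattern is absent.
extract_rating <- function(x) {
  # \\b before "rating" rejects words such as "grating"; [^0-9]* is the
  # separator class used in the answer.
  pattern <- "^.*\\brating[^0-9]*(\\d+).*$"
  hit <- grepl(pattern, x, ignore.case = TRUE, perl = TRUE)
  out <- rep(NA_integer_, length(x))
  # Only convert rows that matched, so unmatched text never reaches as.integer().
  out[hit] <- as.integer(sub(pattern, "\\1", x[hit], ignore.case = TRUE, perl = TRUE))
  out
}

extract_rating(c("client rating 01 approved", "John Rating: 2 reproved",
                 "Customer rating9", "grating 7", "no number here"))
```

With the data.table from the question, this could be applied as `y[, rating := extract_rating(status)]`.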