
Is there a way to search for a pattern in each row of a data set and store the matches in separate columns of a new table? For example, I need to extract the amount, the number of bills, and the number of coins from the `body` column below. Is it possible to get the desired result in R?

user_id | ts         | body                                                                    | address
3633    | 2016-09-29 | A wallet with amount = $ 100 has been found with 4 bills and 5 coins    | TEST
4266    | 2016-07-20 | A purse having amount = $ 150 has been found with 40 bills and 15 coins | NAME
7566    | 2016-07-20 | A pocket having amount = $ 200 has been found with 4 bills and 5 coins  | HELLO

Desired result:

user_id | Amount | Bills | Coins
3633    | $100   | 4     | 5
4266    | $150   | 40    | 15
7566    | $200   | 4     | 5
nicola
S Jain
Yes, it is possible. You will want to use regular expressions. See `?regex`. Something to the [effect of this](http://stackoverflow.com/questions/14159690/regex-grep-strings-containing-us-currency). – Roman Luštrik Nov 26 '16 at 09:21
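Following the comment's suggestion, here is a minimal base R sketch that uses one capture group per field with `regexec()` and `regmatches()`. It assumes every `body` follows the same wording as the sample rows (`amount = $ N ... N bills and N coins`). The names `df`, `pattern`, `parts` and `result` are illustrative only. `Amount` is kept as a `$`-prefixed string to match the desired output.

```r
# Sample data; assumes every body uses the same wording as the question's rows
df <- data.frame(
  user_id = c(3633, 4266, 7566),
  body = c("A wallet with amount = $ 100 has been found with 4 bills and 5 coins",
           "A purse having amount = $ 150 has been found with 40 bills and 15 coins",
           "A pocket having amount = $ 200 has been found with 4 bills and 5 coins"),
  stringsAsFactors = FALSE
)

# One capture group per field: amount, bills, coins.
# Each number is delimited by spaces, so every group captures the whole number.
pattern <- "amount = \\$ ([0-9]+) .* ([0-9]+) bills and ([0-9]+) coins"

# regexec() finds the groups; regmatches() returns, for each row,
# c(full match, group 1, group 2, group 3)
parts <- regmatches(df$body, regexec(pattern, df$body))

result <- data.frame(
  user_id = df$user_id,
  Amount  = paste0("$", vapply(parts, `[`, character(1), 2)),
  Bills   = as.integer(vapply(parts, `[`, character(1), 3)),
  Coins   = as.integer(vapply(parts, `[`, character(1), 4)),
  stringsAsFactors = FALSE
)
print(result)
```

Because each field has its own capture group, the values land in the right columns by name rather than by their position among all numbers in the sentence.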

1 Answer


Here's one solution using stringr, though there are surely others. First, keep only the user id and body columns, which gives something like this:

df <- data.frame(user.id = c(3633, 4266, 7566),
                 body = c("A wallet with amount = $ 100 has been found with 4 bills and 5 coins",
                          "A purse having amount = $ 150 has been found with 40 bills and 15 coins",
                          "A pocket having amount = $ 200 has been found with 4 bills and 5 coins"),
                 stringsAsFactors = FALSE)  # keep body as character, not factor

Next, apply a regular expression to the body column to pull out the numbers in each row as a list, unlist them, arrange them in a matrix, transpose it so that each record becomes a row, name the columns, and cbind the result to user.id from the original data frame.

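library(stringr)
# Extract every run of digits from each body string (one match matrix per row)
matches <- str_match_all(df$body, "[0-9]+")
# Fill a matrix with 3 rows (amount, bills, coins) and one column per record, then transpose
mat <- t(matrix(as.numeric(unlist(matches)), nrow = 3))
colnames(mat) <- c("Amount", "Bills", "Coins")
outputdf <- cbind(df[1], mat)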

That gives:

> outputdf
#  user.id Amount Bills Coins
#1    3633    100     4     5
#2    4266    150    40    15
#3    7566    200     4     5
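The matrix/transpose step works because `matrix()` fills column by column: with `nrow` equal to the number of values extracted per record, each column holds one record, and `t()` turns it into a row. The `nrow` passed to `matrix()` therefore has to be the number of fields per record (three here), not the number of records; the two only coincide when the data happen to have three rows. A small base R illustration with made-up values for two records:

```r
# Numbers from two records, concatenated the way unlist() returns them
vals <- c(100, 4, 5, 150, 40, 15)

# matrix() fills column-wise, so nrow = 3 puts one record in each column...
by_col <- matrix(vals, nrow = 3)
# ...and t() turns each record into a row
records <- t(by_col)
colnames(records) <- c("Amount", "Bills", "Coins")
print(records)

# Equivalent without the transpose: fill row-wise with 3 columns
records2 <- matrix(vals, ncol = 3, byrow = TRUE,
                   dimnames = list(NULL, c("Amount", "Bills", "Coins")))
stopifnot(identical(records, records2))
```

Using `ncol = 3, byrow = TRUE` states the "three fields per record" assumption directly, which may be easier to read than a transpose.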

There is probably a neater way to do this as well.
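One possibly neater option in base R (3.4.0 or later) is `utils::strcapture()`, which applies a regex with capture groups and converts each group to the type of the matching column in a prototype data frame. This is a sketch assuming every body has the same sentence structure as the sample rows. The pattern and the `proto` and `fields` names are illustrative, not part of the original answer.

```r
df <- data.frame(user.id = c(3633, 4266, 7566),
                 body = c("A wallet with amount = $ 100 has been found with 4 bills and 5 coins",
                          "A purse having amount = $ 150 has been found with 40 bills and 15 coins",
                          "A pocket having amount = $ 200 has been found with 4 bills and 5 coins"),
                 stringsAsFactors = FALSE)

# Prototype: one column per capture group, with the desired output types
proto <- data.frame(Amount = integer(), Bills = integer(), Coins = integer())

# Capture amount, bills and coins by position in the sentence
fields <- strcapture("amount = \\$ ([0-9]+) .* ([0-9]+) bills and ([0-9]+) coins",
                     df$body, proto)
outputdf <- cbind(df["user.id"], fields)
print(outputdf)
```

Unlike extracting every number and reshaping, the column names and types come straight from `proto`, and a row whose text does not match the pattern should come back as `NA` in all three columns instead of shifting values between records.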

Joe