How to replace string by its own part

Question

I have one column in a data.table in R which looks like this:

[1] "<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"UNIT_RESULT\",\"SK190400\",
[2] "=> MSG: 'MessageReq', BODY: '{\"MessageReq\":{\"Parameters\":[\"UNIT_CHECKIN\",\"SK190400\",
[3] "<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"UNIT_CHECKIN\",\"SK190400\",
[4] "=> MSG: 'MessageReq', BODY: '{\"MessageReq\":{\"Parameters\":[\"OEE_DATA\",
[5] "<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"PING\",\"SK190400\",

The only thing I care about is whether it is "UNIT_RESULT", "UNIT_CHECKIN", "OEE_DATA" or "PING", so I would like to replace each row with the matching string ("UNIT_RESULT" etc.).

The result should look like:

[1] "UNIT_RESULT"
[2] "UNIT_CHECKIN"
[3] "UNIT_CHECKIN"
[4] "OEE_DATA"
[5] "PING"

I have spent many hours trying to find out how to replace a string with a part of itself, but nothing I found gave a useful result.

Replace specific characters within strings

Reference - What does this regex mean?

Test if characters in string in R

At first the function substring(x, 53, 63) looked like a solution, but it only picks characters at fixed positions, so it is useless unless the target sits at the same position in every row.

Any hints?

I have edited the post. Have a look now, it should make sense for you. — makoLP, Jun 07 '18 at 14:14
What should happen in the case of no match? For instance, what if the fifth string did not contain `PING`? — r2evans, Jun 07 '18 at 14:20
The list of possible strings is final. There are, let's say, 15 possible results, so there is no chance of no match. — makoLP, Jun 07 '18 at 18:51
@makoLP - Does my post below meet your needs or did I miss something? Let me know...glad to help. — Stan, Jun 07 '18 at 19:35

divibisan · Accepted Answer · 2018-06-07T14:22:51.867

The str_match_all function applies a regex to each element of a character vector and returns all the matches for each element. So we can make a vector of the terms we want to extract and use paste0 with collapse = '|' to join them with the | (OR) operator into a single regular expression that matches any of the 4 desired terms.

Then we just run the str_match_all function and unlist the resulting list into a character vector.

strings <- c("<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"UNIT_RESULT\",\"SK190400\"",
             "=> MSG: 'MessageReq', BODY: '{\"MessageReq\":{\"Parameters\":[\"UNIT_CHECKIN\",\"SK190400\"",
             "<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"UNIT_CHECKIN\",\"SK190400\"",
             "=> MSG: 'MessageReq', BODY: '{\"MessageReq\":{\"Parameters\":[\"OEE_DATA\"",
             "<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"PING\",\"SK190400\""
)

items <- c('UNIT_RESULT', 'UNIT_CHECKIN', 'OEE_DATA', 'PING')

library(stringr)
unlist(str_match_all(strings, paste0(items,collapse = '|')))
[1] "UNIT_RESULT"  "UNIT_CHECKIN" "UNIT_CHECKIN" "OEE_DATA"     "PING"
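For readers without stringr, the same idea can be sketched in base R. This is only an illustration under the assumption that every string contains exactly one of the terms (as the asker states); the variable names `pattern` and `first_match` are mine, not part of the answer. Note that str_match_all returns every match per string, so unlist gives one value per row only when each row holds a single match; regexpr below keeps just the first match.

```r
# Three of the sample strings from the question (truncated the same way).
strings <- c(
  "<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"UNIT_RESULT\",\"SK190400\"",
  "=> MSG: 'MessageReq', BODY: '{\"MessageReq\":{\"Parameters\":[\"OEE_DATA\"",
  "<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"PING\",\"SK190400\""
)
items <- c("UNIT_RESULT", "UNIT_CHECKIN", "OEE_DATA", "PING")

# Collapse the terms into a single alternation pattern.
pattern <- paste(items, collapse = "|")

# regexpr() locates the first match in each string;
# regmatches() extracts the matched text.
first_match <- regmatches(strings, regexpr(pattern, strings))

print(pattern)
print(first_match)
```

If a string could contain none of the terms, regmatches would silently drop it and the result would be shorter than the input, so this form is only safe when a match is guaranteed.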

Interesting, I didn't know of the extra functionalities of str_match compared to str_extract. — Luis, Jun 07 '18 at 14:29

score 0 · Answer 2 · answered Jun 07 '18 at 14:21

An alternative is to use str_extract. You pass your strings as the 'string' argument and the alternatives you gave as the 'pattern' argument, and it returns whichever of the alternatives appears first in each string.

library(stringr)

DT[, newstring := str_extract(string_column, "UNIT_RESULT|UNIT_CHECKIN|OEE_DATA|PING")]
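One property that makes str_extract convenient for a column assignment is that it returns NA for a string with no match, so the result always has the same length as the input. A rough base-R sketch of that behaviour follows; `extract_first` is a hypothetical helper name used only for illustration, and the second test string is made up to show the no-match case raised in the comments.

```r
# Hypothetical helper: first match per element, NA where nothing matches.
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x)              # -1 where the pattern is not found
  hit <- !is.na(m) & m > 0              # elements that actually matched
  out <- rep(NA_character_, length(x))  # start with all NA
  out[hit] <- regmatches(x, m)          # regmatches() returns matches only
  out
}

x <- c("[\"PING\",\"SK190400\"", "[\"SOMETHING_ELSE\"")
result <- extract_first(x, "UNIT_RESULT|UNIT_CHECKIN|OEE_DATA|PING")
print(result)
```

Because the output length equals the input length, a helper like this could be used on the right-hand side of `:=` in the same way as str_extract above.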

score 0 · Answer 3 · answered Jun 07 '18 at 14:22

I suggest

gsub("^.*?(UNIT_RESULT|UNIT_CHECKIN|OEE_DATA|PING).*","\\1",strings,perl=TRUE)
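To unpack the pattern: `^.*?` lazily consumes everything before the first alternative, the parentheses capture the alternative, the trailing `.*` consumes the rest, and `"\\1"` replaces the whole string with the captured group. The sketch below is only an illustration with shortened, made-up input strings; it also shows that a string without any of the terms comes back unchanged rather than as NA.

```r
alternatives <- "UNIT_RESULT|UNIT_CHECKIN|OEE_DATA|PING"
pattern <- paste0("^.*?(", alternatives, ").*")

# The second element is an invented example containing none of the terms.
x <- c("[\"UNIT_CHECKIN\",\"SK190400\"", "[\"UNKNOWN\"")

# The whole string is replaced by capture group 1 when the pattern matches.
res <- gsub(pattern, "\\1", x, perl = TRUE)
print(res)
```

With a guaranteed match this is harmless, but if unmatched rows are possible the unchanged originals would end up mixed in with the extracted values.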

Nicolas2


Stan · Answer 4 · 2018-06-07T19:39:34.717

If you do not have a finite list of strings to search for, I would recommend using a regex pattern. Here is one that works for the examples you provided:

# Code to create example data.table
library(data.table)

dt <- data.table(f1 =  c("<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"UNIT_RESULT\",\"SK190400\"",
                     "=> MSG: 'MessageReq', BODY: '{\"MessageReq\":{\"Parameters\":[\"UNIT_CHECKIN\",\"SK190400\"",
                     "<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"UNIT_CHECKIN\",\"SK190400\"",
                     "=> MSG: 'MessageReq', BODY: '{\"MessageReq\":{\"Parameters\":[\"OEE_DATA\"",
                     "<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"PING\",\"SK190400\""
))

# Start of code to parse out values:
rex_pattern <- "(?<=(\"))[A-Z]{2,}_*[A-Z]+(?=(\"))"

dt[, .(parsed_val = regmatches(f1, regexpr(pattern = rex_pattern, f1, perl = TRUE)))]

This gives you:

     parsed_val
1:  UNIT_RESULT
2: UNIT_CHECKIN
3: UNIT_CHECKIN
4:     OEE_DATA
5:         PING

If you really want to "overwrite" the original field f1 with the new substring, you can use the following:

dt[, `:=`(f1 = regmatches(f1, regexpr(pattern = rex_pattern, f1, perl = TRUE)))]
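To see what this pattern accepts, it can help to test it against individual tokens. The check below is a small sketch: the token list is assembled by me from pieces of the sample data, plus a made-up two-letter token `"OK"`. The pattern requires a double quote on both sides and at least three uppercase letters in total (`[A-Z]{2,}` followed by `[A-Z]+`), with optional underscores in between.

```r
rex_pattern <- "(?<=(\"))[A-Z]{2,}_*[A-Z]+(?=(\"))"

tokens <- c(
  "\"UNIT_RESULT\"",  # uppercase with underscore, double-quoted
  "\"PING\"",         # uppercase only, double-quoted
  "\"MessageRep\"",   # mixed case
  "\"SK190400\"",     # contains digits
  "'ACK'",            # single-quoted
  "\"OK\""            # only two letters (made-up token)
)

# perl = TRUE is required for the lookbehind and lookahead.
matched <- grepl(rex_pattern, tokens, perl = TRUE)
print(setNames(matched, tokens))
```

Only the first two tokens match. This is why the pattern skips 'ACK', "MessageRep" and "SK190400" in the sample rows, and it also means a two-letter code would be missed.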

The list of strings that can be there is finite. I have tried your solution but it doesn't work. Error in .(parsed_val = regmatches(f1, regexpr(pattern = rex_pattern, : could not find function "." — makoLP, Jun 08 '18 at 05:51
Did you run `library(data.table)` on the first line of my first block of code? I can recreate your `could not find function "."` error if I do not load the `data.table` library. — Stan, Jun 08 '18 at 16:22
