This task is simple in theory, but it drove me crazy today. I'm rather new to R, but have gotten along quite well until now. Maybe one of you will have an easier time solving it.

In short: How do I get the maximum value per observation out of a somewhat 'mixed' character matrix similar to this one?

dummy = as.matrix(c("c(1.5,2.6,3)", "2", "1.5", "c(1.8, 2.9)"))

so that my result says (in numeric): c(3, 2, 1.5, 2.9)

The longer story:

I start from a call to

stri_match_all_regex(somestring, regexp)

to get some numbers out of plain text. This returns a character matrix (by definition of the stri_match_all_regex function).

Assume it looks something like this after stripping out some stray characters:

dummy = as.matrix(c("c(1.5,2.6,3)","2","1.5","c(1.8,2.9)"))

You can already see the complication here: my matrix holds strings instead of vectors. My goal is to identify the maximum value of each row.

Usually nothing would be simpler; I'd run, e.g.,

lapply(dummy, max)

But applying numerical functions obviously won't work on these characters disguised as numbers. (Until this point I did not even realize that these are all characters and not numbers, as they show up without quotation marks in RStudio's View(dummy).) Turning them into numerics with

as.numeric(dummy)

replaces the vectors within the matrix with NAs. Not what I want. I want each "c(1.2,5)" interpreted as if it were a 'real', quotation-mark-less c(1.2,5), and the plain numbers as numbers too, of course.

I even tried to strsplit / gsub the entries, but that doesn't seem fruitful either, or I'm just doing it wrong.

gsub( ",|c\\(|\\)", ",", dummy)

leaves me with NAs, as the comma isn't properly interpreted, and

as.numeric(strsplit(dummy, ",|.\\(|\\)"))

won't let me coerce the list object it returns to numeric.

Hence the straightforward question: How do I turn a character matrix similar to dummy into a "usable" form, so that I can apply numeric functions to both the plain numbers and the vectors of numbers?

Thanks for your help! I feel like this should be easy, but I've been stuck on it for quite a while now.

3bbing
  • Could you actually post the `stri_match_all_regex` part with some of the data you're searching there? If that's a part of your code that you can edit, I'm wondering if the issue could be caught further upstream – camille Jul 10 '18 at 17:56
  • Hi camille, of course I can. So let's take one of the easier ones. I am working on extracting system specs, in this case e.g. the CPU speed needed. I use this regexp: `DT[ , Ghz_CPU_unclean := stri_match_all_regex(clean, "(?! *Ghz) +[0-9.]{1,5} *Ghz", case_insensitive = TRUE)]` on a text like this: `OS: Windows® 7 32/64-bit / Vista 32/64 / XP Processor: Pentium 4 3.0GHz Memory: 2 GB RAM`, with many more rows in a non-uniform layout to come. The regexp works quite OK, but I would of course be open to improvements. – 3bbing Jul 10 '18 at 18:07

3 Answers

You can use eval/parse to get the numeric values.

result <- apply(dummy, 1, function(s) {
  eval(parse(text = s))
})

result
#[[1]]
#[1] 1.5 2.6 3.0
#
#[[2]]
#[1] 2
#
#[[3]]
#[1] 1.5
#
#[[4]]
#[1] 1.8 2.9
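
Because eval(parse()) runs whatever code the string happens to contain, a more defensive variant may be worth considering when the text comes from an untrusted source. The following is only a sketch and not part of the approach above: it assumes every entry holds nothing but plain, unsigned decimal numbers (no signs, exponents or NA), and the names strings, nums and maxes are made up for illustration.

```r
dummy <- as.matrix(c("c(1.5,2.6,3)", "2", "1.5", "c(1.8, 2.9)"))

# Work on the single column as a plain character vector
strings <- dummy[, 1]

# Pull out every run of digits with an optional decimal part
# (assumption: unsigned decimals only, no exponents, no NA entries)
nums <- regmatches(strings, gregexpr("[0-9]+\\.?[0-9]*", strings))

# Convert each extracted set of strings to numeric
nums <- lapply(nums, as.numeric)

# One maximum per original row
maxes <- vapply(nums, max, numeric(1))
maxes
# [1] 3.0 2.0 1.5 2.9
```

An entry without any number would yield an empty vector here, and max() would then return -Inf with a warning, so real data may need an extra check.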
Rui Barradas
  • Oh wow.. this is a lifesaver. Works like a charm! Thanks so much! Should probably have asked this hours ago ;-) Couldn't find this function wherever I looked, for the life of me. Perfect! – 3bbing Jul 10 '18 at 18:14

If you'd like a tidyverse solution, here's one that makes use of purrr and stringr. Mapping along the items in dummy, I remove any "c" and parentheses from each entry, split it by commas and (optionally) space, flatten into a single-level list, and convert to numeric.

library(tidyverse)

dummy <- as.matrix(c("c(1.5,2.6,3)", "2", "1.5", "c(1.8, 2.9)"))

map(dummy, ~str_remove_all(., "[c\\(\\)]") %>% 
      str_split(",\\s?") %>% 
      flatten_chr() %>% 
      as.numeric()
    )
#> [[1]]
#> [1] 1.5 2.6 3.0
#> 
#> [[2]]
#> [1] 2
#> 
#> [[3]]
#> [1] 1.5
#> 
#> [[4]]
#> [1] 1.8 2.9

Created on 2018-07-10 by the reprex package (v0.2.0).

camille

You can use this:

apply(dummy, 1, function(x) max(eval(parse(text=x))))

Result:

[1] 3.0 2.0 1.5 2.9
iod
  • I see @Rui Barradas has a similar answer already up. Mine goes the final step to extract the max and unlist it to create a single vector of the maxes. – iod Jul 10 '18 at 18:09
  • Use `sapply` instead of `unlist(lapply…` – Onyambu Jul 10 '18 at 18:11
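
The sapply suggestion in the last comment could look roughly like the sketch below. It assumes the same dummy matrix as in the question; the name row_max and the USE.NAMES = FALSE argument are additions made here for illustration (without it, sapply names each result after its input string).

```r
dummy <- as.matrix(c("c(1.5,2.6,3)", "2", "1.5", "c(1.8, 2.9)"))

# sapply() visits each element of the character matrix and simplifies
# the four length-one results into a single numeric vector
row_max <- sapply(
  dummy,
  function(x) max(eval(parse(text = x))),
  USE.NAMES = FALSE  # drop the input strings as names
)
row_max
# [1] 3.0 2.0 1.5 2.9
```

As with the other eval/parse approaches, this only makes sense when the strings are known to contain harmless R expressions.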