I have a data frame with several columns that contain strings:

(data$product, data$price, data$Overview1, data$Overview2, data$Overview3, data$Overview4)

I would like to create a new vector that contains only the strings that begin with "Material:" (or "Materials:").

Set the pattern for the regular expression:

    matpattern <- "((?<=Material: ).*|(?<=Materials: ).*)"

Find the position of the text that follows the material label:

    mat <- gregexpr(matpattern, data$Overview1, perl=TRUE)

Extract the matched strings into a new column:

    data$material1 <- regmatches(data$Overview1, mat, invert = FALSE)

Repeat for Overview2:

    mat <- gregexpr(matpattern, data$Overview2, perl=TRUE)

    data$material2 <- regmatches(data$Overview2, mat, invert = FALSE)

The statement

    z <- cbind(material1, material2) 

gives a matrix, when I want a list.

Is there a method to get lapply & gregexpr to work across multiple columns and then place the new strings in a single column?

I have looked at the questions below, to no avail. Thanks for your help.

Convert R vector to string vector of 1 element

Regular Expressions in R - compare one column to another

Using regexp to select rows in R dataframe

conr404
  • have you tried `apply(data, 2, gregexpr, pattern, perl=TRUE)` ? – Ricardo Saporta Oct 22 '13 at 17:33
  • I think you're using list in the general sense and not the datatype. And when you say single column, do you mean vector? `c(data$material1,data$material2,data$material3,data$material4)`? – TheComeOnMan Oct 22 '13 at 17:34
  • Thanks for the replies. @Codoremifa: Exactly, a single vector rather than a list. Happy to use a list datatype, but not sure how to manipulate it to reference values in other vectors (e.g. the average of data$price for certain values of data$materials) – conr404 Oct 23 '13 at 10:29
  • @RicardoSaporta I tried apply and lapply but am getting the error "FUN(newX[, i], ...) : invalid 'pattern' argument". Thanks. I am going to grep all the vectors at once and then figure out how to manipulate the list generated by regmatches to get the vector. Any insight would be great! – conr404 Oct 23 '13 at 10:30
  • @conr404, can you share some data please? – TheComeOnMan Oct 23 '13 at 10:51

1 Answer

OK. This is a complete hack, but I would like the final output to be a vector rather than a list (which seems to rule out apply and lapply?).

This gets the location and length of the required string in each of the 4 Overview columns:

    m1 <- gregexpr(matpattern, data[, "Overview1"], perl = TRUE)
    m2 <- gregexpr(matpattern, data[, "Overview2"], perl = TRUE)
    m3 <- gregexpr(matpattern, data[, "Overview3"], perl = TRUE)
    m4 <- gregexpr(matpattern, data[, "Overview4"], perl = TRUE)

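This operation extracts the matches. Each result is a list with one character vector per row, which is empty (character(0)) where nothing matched: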

    mat1 <- regmatches(data[, "Overview1"], m1, invert = FALSE)
    mat2 <- regmatches(data[, "Overview2"], m2, invert = FALSE)
    mat3 <- regmatches(data[, "Overview3"], m3, invert = FALSE)
    mat4 <- regmatches(data[, "Overview4"], m4, invert = FALSE)
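For what it's worth, the four repeated gregexpr/regmatches pairs could probably be folded into one lapply over the column names and still end up as a plain character vector. This is only a sketch: the toy data, the names `overview_cols` and `matches`, and the choice to keep the first match per row (NA when nothing matched) are my own assumptions, not something from the original data.

```r
# Toy data: column names mirror the question, values are made up for illustration.
data <- data.frame(
  price     = c(10, 20, 30),
  Overview1 = c("Material: Cotton", "Soft handle", "Colour: Red"),
  Overview2 = c("Machine washable", "Materials: Wool", "Dry clean only"),
  stringsAsFactors = FALSE
)

matpattern <- "((?<=Material: ).*|(?<=Materials: ).*)"
overview_cols <- c("Overview1", "Overview2")  # hypothetical helper name

# One gregexpr/regmatches pass per column. The result is a list with one
# element per column, each itself a list with one character vector per row.
matches <- lapply(data[overview_cols], function(col) {
  regmatches(col, gregexpr(matpattern, col, perl = TRUE))
})

# For each row, pool the matches from all columns and keep the first one,
# or NA when no column matched (an assumption about the desired output).
data$Material <- vapply(seq_len(nrow(data)), function(i) {
  hits <- unlist(lapply(matches, `[[`, i), use.names = FALSE)
  if (length(hits) > 0) hits[1] else NA_character_
}, character(1))
```

Using vapply with `character(1)` guarantees an ordinary character vector, so the new column can be used directly with functions such as grepl or tapply.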

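Then I paste all of them into one big character vector. paste converts each list element to text, so rows with no match show up as the literal string 'character(0)', which future operations will ignore:

    data$Material <- paste(mat1, mat2, mat3, mat4)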

I can then use this vector to calculate the mean of data$price based on occurrence of certain text strings in data$Material
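A rough sketch of that last step, on made-up data (the prices and material strings below are purely illustrative, and `is_cotton`, `mean_cotton` and `mean_by_material` are hypothetical names). The grepl version also works when data$Material contains the pasted 'character(0)' text, since it only looks for the substring of interest.

```r
# Made-up example data; data$Material stands in for the vector built above.
data <- data.frame(
  price    = c(10, 20, 30, 40),
  Material = c("Cotton", "Wool", "Cotton blend", NA),
  stringsAsFactors = FALSE
)

# Mean price of rows whose Material mentions "Cotton".
# grepl returns FALSE for NA, so rows without a material are excluded.
is_cotton <- grepl("Cotton", data$Material, fixed = TRUE)
mean_cotton <- mean(data$price[is_cotton])

# Mean price for each distinct Material value; rows with NA are dropped.
mean_by_material <- tapply(data$price, data$Material, mean)
```

tapply groups by exact value, so it is mainly useful once the material strings have been cleaned up; grepl is the more forgiving option for messy text.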

conr404