Wrangle data in R

Question

I would like to go from the left table to the right table in the image below but can't seem to find the coding logic behind to reach the result using R.

Tables

Your help is much appreciated !

Hi @Charliebr0wn, could you show us what you tried with formatted code and data ? You can find all means to do so at : [how-to-make-a-great-r-reproducible-example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — cbo, Nov 26 '19 at 09:01
thank you very much. It's actually my first post so i'm not really familiar with this platform. I'll try to make it more explicit ! ;) — Charliebr0wn, Nov 26 '19 at 09:03

score 0 · Answer 1 · answered Nov 26 '19 at 09:24

I have created a minimal example that should do what you want. The main issue here is to phrase your question since I think there are better answer than mine to match lag values to a modality.

library(dplyr)

# --- v0 is your data simplified
v0 <- c("cA", "t1", "t2", "cB", "t3") 

# --- indic tels us what are the groups
indic <- v0 %>%  stringr::str_detect(string = ., pattern = "c") %>% 
        cumsum()

# --- here you can try the code line by line (without the %>% (pipe) operator to understand the code
dfr <- tibble(v0, indic)
dfr %>% 
        group_by(indic) %>% 
        mutate(v1 = v0[which(stringr::str_detect(v0, "c") )] ) %>% 
        ungroup() %>% 
        filter(! stringr::str_detect(v0, "c")) %>% 
        select(v1, v0)

#> # A tibble: 3 x 2
#>   v1    v0   
#>   <chr> <chr>
#> 1 cA    t1   
#> 2 cA    t2   
#> 3 cB    t3

# you could also use a loop

score 0 · Answer 2 · answered Nov 26 '19 at 09:31

An example using base R:

data <- c(
  "cinema A", "17:45", "20:00", "cinema B", "13:00", "15:45", "16:00", 
  "cinema C", "08:20"
)

time_rows <- grep("cinema", data, invert = TRUE)
data.frame(
  time = data[time_rows],
  cinema = grep("cinema", data, value = TRUE)[cumsum(grepl("cinema", data))][time_rows]
)

score 0 · Answer 3 · answered Nov 26 '19 at 09:44

As written in the comments, please provide some sample data for future posts. In this case, I have made it for you based on your attached picture.

There are many ways to solve this. Here is a three-step approach.

library(tidyverse)
library(stringr)

# Create the data
df <- tibble(
  X1 = c("cinema A", 17.45, 20.00, "cinema B", 13.00, 15.45, 16.00, "cinema C", 8.20))

df
#> # A tibble: 9 x 1
#>   X1      
#>   <chr>   
#> 1 cinema A
#> 2 17.45   
#> 3 20      
#> 4 cinema B
#> 5 13      
#> 6 15.45   
#> 7 16      
#> 8 cinema C
#> 9 8.2

# Step 1: detect where the cinema values are and copy them to a new column
df$cinema <- ifelse(str_detect(df$X1, "cinema"), df$X1, NA)
df
#> # A tibble: 9 x 2
#>   X1       cinema  
#>   <chr>    <chr>   
#> 1 cinema A cinema A
#> 2 17.45    <NA>    
#> 3 20       <NA>    
#> 4 cinema B cinema B
#> 5 13       <NA>    
#> 6 15.45    <NA>    
#> 7 16       <NA>    
#> 8 cinema C cinema C
#> 9 8.2      <NA>

# Step 2: replace NA values in the new column with the values above
df <- fill(df, cinema)
df
#> # A tibble: 9 x 2
#>   X1       cinema  
#>   <chr>    <chr>   
#> 1 cinema A cinema A
#> 2 17.45    cinema A
#> 3 20       cinema A
#> 4 cinema B cinema B
#> 5 13       cinema B
#> 6 15.45    cinema B
#> 7 16       cinema B
#> 8 cinema C cinema C
#> 9 8.2      cinema C

# Step 3: remove the rows where X1 contains cinema information
df <- filter(df, !str_detect(df$X1, "cinema"))
df
#> # A tibble: 6 x 2
#>   X1    cinema  
#>   <chr> <chr>   
#> 1 17.45 cinema A
#> 2 20    cinema A
#> 3 13    cinema B
#> 4 15.45 cinema B
#> 5 16    cinema B
#> 6 8.2   cinema C

^{Created on 2019-11-26 by the reprex package (v0.3.0)}

score 0 · Answer 4 · answered Nov 26 '19 at 10:04

Here is a solution for base R.

Assuming the input is given as a data frame, i.e.:

df <- data.frame(X = c("cinema A", 17.45, 20.00, "cinema B", 13.00, 15.45, 16.00, "cinema C", 8.20))
> df
         X
1 cinema A
2    17.45
3       20
4 cinema B
5       13
6    15.45
7       16
8 cinema C
9      8.2

the following code may help you to get the table on the right hand side:

lst <- split(df,findInterval(seq(nrow(df)),grep("cinema",df$X)-1,left.open = T))
res <- Reduce(rbind,lapply(lst, function(v) data.frame(ViewingTime = v[-1,],CinemaName = v[1,])))

where the output res looks like:

> res
  ViewingTime CinemaName
1       17.45   cinema A
2          20   cinema A
3          13   cinema B
4       15.45   cinema B
5          16   cinema B
6         8.2   cinema C

Wrangle data in R

4 Answers4