1

I'm trying to rotate multiple columns of data into single, data-type consistent columns.

I've created a minimum example below.

library(tibble)
library(dplyr)


# I have data like this
df <- tibble(contact_1_prefix=c('Mr.','Mrs.','Dr.'),
             contact_2_prefix=c('Dr.','Mr.','Mrs.'),
             contact_1 = c('Bob Johnson','Robert Johnson','Bobby Johnson'),
             contact_2 = c('Tommy Two Tones','Tommy Three Tones','Tommy No Tones'),
             contact_1_loc = c('Earth','New York','Los Angeles'),
             contact_2_loc = c('London','Geneva','Paris'))

# My attempt at a solution:  
df %>%  rename(contact_1_name=contact_1, 
               contact_2_name=contact_2) %>% 
        pivot_longer(cols=c(matches('_[12]_')), 
               names_to=c('.value','dat'),
               names_pattern = "(.*)_[1-2]_(.*)") %>% 
        pivot_wider(names_from='dat',values_from='contact')

#What I want is to widen that data to achieve a tibble with these two example lines
df_desired <- tibble(name=c('Bob Johnson','Tommy Two Tones'),
                     loc =c('Earth','London'),
                     prefix=c('Mr.','Dr.'))

I want all names under name, all locations under loc, and all prefixes under prefix.

If I use just this snippet from the middle statement:

df %>%  rename(contact_1_name=contact_1, 
               contact_2_name=contact_2) %>% 
        pivot_longer(cols=c(matches('_[12]_')), 
               names_to=c('.value','dat'),
               names_pattern = "(.*)_[1-2]_(.*)") 

The dput of the output is:

structure(list(dat = c("prefix", "prefix", "name", "name", "loc", 
"loc", "prefix", "prefix", "name", "name", "loc", "loc", "prefix", 
"prefix", "name", "name", "loc", "loc"), contact = c("Mr.", "Dr.", 
"Bob Johnson", "Tommy Two Tones", "Earth", "London", "Mrs.", 
"Mr.", "Robert Johnson", "Tommy Three Tones", "New York", "Geneva", 
"Dr.", "Mrs.", "Bobby Johnson", "Tommy No Tones", "Los Angeles", 
"Paris")), row.names = c(NA, -18L), class = c("tbl_df", "tbl", 
"data.frame"))

From that, I thought for sure pivot_wider was the solution, but there is a name conflict.

I assume a single pivot_longer statement will achieve the task. I studied Gathering wide columns into multiple long columns using pivot_longer carefully but can't quite figure this out. I have to admit I don't quite understand what the names_to = c(".value", "group") phrase does.

In any event, any help is appreciated.

Thanks

JFD
  • 119
  • 6

2 Answers2

1

You were on the right path. Renaming is needed since only the name columns do not have any suffix to identify them. .value identifies part of the original column name that you want to uniquely identify as new columns. If you remove everything until the last underscore the part that remains are the new column names which you can specify using regex in names_pattern.

library(dplyr)
library(tidyr)

df %>%  
  rename(contact_1_name=contact_1, 
         contact_2_name=contact_2) %>%
  pivot_longer(cols = everything(), 
               names_to = '.value', 
               names_pattern = '.*_(\\w+)')

#  prefix name              loc        
#  <chr>  <chr>             <chr>      
#1 Mr.    Bob Johnson       Earth      
#2 Dr.    Tommy Two Tones   London     
#3 Mrs.   Robert Johnson    New York   
#4 Mr.    Tommy Three Tones Geneva     
#5 Dr.    Bobby Johnson     Los Angeles
#6 Mrs.   Tommy No Tones    Paris      
Ronak Shah
  • 286,338
  • 16
  • 97
  • 143
0

Here is a solution using split.default

data.table::rbindlist( 
  lapply( split.default( df, gsub( "[^0-9]+", "", names(df) ) ),
          data.table::setnames, 
          new = c("prefix", "name", " loc" ) ) )
#   prefix              name         loc
# 1:    Mr.       Bob Johnson       Earth
# 2:   Mrs.    Robert Johnson    New York
# 3:    Dr.     Bobby Johnson Los Angeles
# 4:    Dr.   Tommy Two Tones      London
# 5:    Mr. Tommy Three Tones      Geneva
# 6:   Mrs.    Tommy No Tones       Paris
    
Wimpel
  • 16,956
  • 1
  • 15
  • 34