1

I am still having problems with using pivot_longer to gather wide multiple columns with into multiple long columns rather than column by column (follow-up to Gathering wide columns into multiple long columns using pivot_longer).

For example, the columns hf_1, hf_2, hf_3, hf_4, hf_5, hf_6 need to be pivoted into 2 columns (hf_com - this column with values 1,2,3,4,5,6 from wide hf columns) and (hf_com_freq - this column with value 1).

The same needs to occur for the columns ac_1, ac_2, ac_3, ac_4, ac_5, ac_6. These columns need to be pivoted into 2 columns (ac_com - this column with values 1,2,3,4,5,6 from wide ac columns) and (ac_com_freq - this column with value 1).

I have tried looking at:

Gather multiple sets of columns

and:

Reshaping multiple sets of measurement columns (wide format) into single columns (long format)

and the example from who dataset in:

https://tidyr.tidyverse.org/articles/pivot.html

but I can't get the values into the multiple longer columns that I need.

Here is the input data from a toy dataset:

df1 <- tribble(
  ~"np_id", ~"np_city_size", ~"cc_hf_1", ~"cc_hf_2", ~"cc_hf_3", ~"cc_hf_4", ~"cc_hf_5", ~"cc_hf_6", ~"cc_ac_1", ~"cc_ac_2", ~"cc_ac_3", ~"cc_ac_4", ~"cc_ac_5", ~"cc_ac_6",
  "81", "village", NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA,
  "82", "village", 1L, NA, NA, NA, 1L, NA, NA, NA, NA, 1L, NA, NA,
  "83", "more than 500k inhabitants", NA, 1L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA,
  "85", "more than 500k inhabitants", NA, 1L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA,
  "87", "more than 500k inhabitants", NA, 1L, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA,
  "89", "village", 1L, NA, NA, 1L, NA, NA, 1L, NA, NA, NA, NA, NA,
  "90", "village", 1L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA,
  "91", "village", 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, NA,
  "92", "village", NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, 1L
)

This is the code I have at the moment:

df_longer <- df1 %>% pivot_longer(
  cols = -(starts_with("np_")),
  names_to = c("hf_com", "ac_com"), 
  names_pattern = "cc_?(.*)_(.*)",
  values_to = c("hf_com_freq", "ac_com_freq")
)

However, I know that I need to extract the last character in the column heading (eg. 1 from hf_1 2 from hf_2 ) and pass it as the .value to each column but I having problems using regular expressions and pivot_longer parameters such as names_patterns to solve this. I feel that I am very close to the solution but can't see the forest for the trees!!!!

Here are the actual results:

df_longer <- structure(list(np_id = c("81", "81", "81", "81", "81", "81"), 
    np_city_size = c("village", "village", "village", "village", 
    "village", "village"), hf_com = c("hf", "hf", "hf", "hf", 
    "hf", "hf"), ac_com = c("1", "2", "3", "4", "5", "6"), hf_com_freq = c(NA, 
    NA, 1L, NA, NA, NA), ac_com_freq = c(NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

Here are the expected results:

df_longer <- structure(list(np_id = c("81", "81", "81", "81", "81", "81"), 
    np_city_size = c("village", "village", "village", "village", 
    "village", "village"), hf_com = c("1", "2", "3", "4", 
    "5", "6"), ac_com = c("1", "2", "3", "4", "5", "6"), hf_com_freq = c(NA, 
    NA, 1L, NA, NA, NA), ac_com_freq = c(NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))
Z.Lin
  • 23,077
  • 5
  • 35
  • 71
EJG_27
  • 101
  • 9

0 Answers0