I have a CSV data file with 50,000+ records loaded into a data frame `data`. I am creating subsets of it based on two factors, Segment and Market, which take the values below:

customer_segments <- c('Consumer','Corporate','Home Office')
markets <- c('Africa','APAC','Canada','EMEA','EU','LATAM','US')

To get all 21 subsets, one per combination of Market and Segment, I am using the nested for loops below with the assign and paste functions:

for (i in seq_along(markets)) {
  for (j in seq_along(customer_segments)) {
    assign(paste(markets[i], customer_segments[j], sep = '_'),
           data[(data$Market == markets[i]) & (data$Segment == customer_segments[j]), ])
  }
}

This creates 21 data frames and assigns each a name such as Canada_Home Office. The problem is that I want to iterate over all 21 data frames to aggregate three attributes (Sales, Quantity and Profit) on each, but I am not sure how to address these data frames in a loop. Maybe if I put all 21 data frames in a vector I could iterate over them, but I am not sure this is the best option.

Mohit
  • Above code creates 21 data subsets with names like market_segment in global environment – Mohit Oct 20 '17 at 17:17
  • @Mohit Code for creating vector of those 21 data frames is posted below. – Sowmya S. Manian Oct 20 '17 at 17:22
  • Suggested duplicate: [how do I make a list of data frames](https://stackoverflow.com/a/24376207/903061). My answer there describes how to pull data frames from the global environment into a list so you can iterate over them. – Gregor Thomas Oct 20 '17 at 17:22
  • @Gregor Yes even thats good. – Sowmya S. Manian Oct 20 '17 at 17:23
  • Skip your for loops and go directly to a list: `dat_list = split(data, by = interaction(data$Market, data$customer_segments, sep = "_"))`. Subset your data first if that is not all values for those two factors. – Gregor Thomas Oct 20 '17 at 17:32
  • @Gregor Thanks, but I am getting Error in deparse(...) : using your solution, My code is: dat_list = split(data, by = interaction(data$Market, data$Segment, sep = "_")) – Mohit Oct 20 '17 at 17:45
  • Sorry, the argument to `split` isn't called `by`. Just do `dat_list = split(data, interaction(data$Market, data$Segment, sep = "_"))`, or change `by = ` to `f = ` – Gregor Thomas Oct 20 '17 at 19:08
  • Thanks @Gregor, awesome! your one line solution solved my problem. – Mohit Oct 25 '17 at 14:27
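
For readers following the comment thread, the `split` approach can be sketched roughly as below, together with the aggregation the question asks for. The toy `data` frame and its values are invented for illustration; only the column names Market, Segment, Sales, Quantity and Profit come from the question, and summing is an assumption about what "aggregate" means here.

```r
# Toy stand-in for the real 50,000+ row data frame (values are invented).
data <- data.frame(
  Market   = c("Africa", "Africa", "US", "US", "EU"),
  Segment  = c("Consumer", "Consumer", "Corporate", "Home Office", "Consumer"),
  Sales    = c(100, 50, 200, 80, 30),
  Quantity = c(1, 2, 3, 4, 5),
  Profit   = c(10, -5, 40, 8, 3),
  stringsAsFactors = FALSE
)

# One list element per Market/Segment combination that actually occurs;
# drop = TRUE discards combinations with no rows.
dat_list <- split(
  data,
  interaction(data$Market, data$Segment, sep = "_", drop = TRUE)
)

# Sum the three attributes within each subset.
totals <- lapply(dat_list, function(d) {
  colSums(d[, c("Sales", "Quantity", "Profit")])
})

# Bind the per-subset totals into one matrix, one row per combination.
totals_mat <- do.call(rbind, totals)
totals_mat
```

If only the totals are needed and not the subsets themselves, `aggregate(cbind(Sales, Quantity, Profit) ~ Market + Segment, data = data, FUN = sum)` should give the same sums in a single call.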

1 Answer

Create combination of markets and customer_segments using expand.grid().

  df <- expand.grid(markets, customer_segments)
  head(df)
  #      Var1        Var2
  # 1  Africa    Consumer
  # 2    APAC    Consumer
  # 3  Canada    Consumer
  # 4    EMEA    Consumer
  # 5      EU    Consumer
  # 6   LATAM    Consumer

Vector of the combination of markets and customer_segments

  df1 <- paste(df$Var1, df$Var2, sep = " ")  # paste() already returns a character vector
  df1
  # [1] "Africa Consumer"    "APAC Consumer"      "Canada Consumer"   
  # [4] "EMEA Consumer"      "EU Consumer"        "LATAM Consumer"    
  # [7] "US Consumer"        "Africa Corporate"   "APAC Corporate"    
  # [10] "Canada Corporate"   "EMEA Corporate"     "EU Corporate"      
  # [13] "LATAM Corporate"    "US Corporate"       "Africa Home Office"
  # [16] "APAC Home Office"   "Canada Home Office" "EMEA Home Office"  
  # [19] "EU Home Office"     "LATAM Home Office"  "US Home Office" 
Sowmya S. Manian
  • Hey Sowmya, thanks for the reply, but your solution creates a vector of strings. I have already created global variables with the above names using the code in my question; now I want to create a vector of those 21 data frames – Mohit Oct 20 '17 at 17:26
  • Oh got it. So vector of dataframes. Hmm..you can go for listing of those data frames. Let me see how can I help – Sowmya S. Manian Oct 20 '17 at 17:27
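
The "listing" of existing data frames mentioned in the last comment could look something like the sketch below. It assumes the data frames were created with `sep = "_"` as in the question (the answer above uses a space), and it builds small placeholder data frames so the example runs on its own; `combos`, `df_names` and `df_list` are names chosen here for illustration.

```r
markets <- c("Africa", "APAC")                      # shortened for the example
customer_segments <- c("Consumer", "Home Office")   # shortened for the example

# Placeholder data frames standing in for the ones the question's loop creates.
for (m in markets) {
  for (s in customer_segments) {
    assign(paste(m, s, sep = "_"),
           data.frame(Market = m, Segment = s, Sales = 1))
  }
}

# Rebuild the object names, using the same separator as in assign().
combos <- expand.grid(markets, customer_segments, stringsAsFactors = FALSE)
df_names <- paste(combos$Var1, combos$Var2, sep = "_")

# mget() looks up each name and returns a named list of the objects found.
df_list <- mget(df_names)

# The list can then be iterated over like any other list.
sapply(df_list, nrow)
```

A list is used rather than a vector because an atomic vector cannot hold data frames; `mget` returns a named list, which works directly with `lapply` or `sapply`.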