0

I am looking to convert the given below loop into lapply. I'm very new to lapply concepts not 100 % sure how i can convert the loop into lapply

Right now with the current loop, R studio is aborting the session due to excess memory usage. I am already using 10GB Ram.

To provide sume statitics on resultset, for each loop, SQL fetches around 200k records in the memory. Based on the example, 2k *200k will be result set in the loop and hence it is getting crashed

for (cSegNbr in 1:nrow(segment_df)){
  segnbr <- segment_df[cSegNbr, "segment_nbr"]
  slsdecile <- segment_df[cSegNbr, "sales_decile"]
  promodecile <- segment_df[cSegNbr, "sm_rate_decile"]
  BPCdecile <- segment_df[cSegNbr, "bpc_sales_decile"]
  HCdecile <- segment_df[cSegNbr, "hc_sales_decile"]
  Otherdecile <- segment_df[cSegNbr, "other_sales_decile"]

  # print(paste('select * from BUSINESS_USERS.cp_cntrl_pool_cohort_final_new_nl 
  # where cohort_month=201908  and segment_nbr=', segnbr,' and 
  # other_sales_decile=',Otherdecile,' and hc_sales_decile=',HCdecile,' and 
  # bpc_sales_decile=',BPCdecile,' and sales_decile=',slsdecile, ' and 
  # sm_rate_decile=',promodecile))

seg_df <- paste('select * from BUSINESS_USERS.cp_cntrl_pool_cohort_final_new_nl 
where cohort_month=',year_list[1],' and segment_nbr=', segnbr,' and 
other_sales_decile=',Otherdecile,' and hc_sales_decile=',HCdecile,' and 
bpc_sales_decile=',BPCdecile,' and sales_decile=',slsdecile, ' and 
sm_rate_decile=',promodecile)

  seg_Nbr_final<-gsub("[\r\n]", "", seg_df)
  query_seg <- dbSendQuery(IDW2, seg_Nbr_final)
  df<-dbFetch(query_seg )
  m.out= matchit(test_control ~ 
q1_sls+q2_sls+q2_sls+q4_sls+sls_nbr_of_wks+fs_sales_amt_m1+fs_sales_amt_m2+fs_sa
les_amt_m3+fs_sales_amt_m4+fs_sales_amt_m5+fs_sales_amt_m6+fs_sales_amt_m7+fs_sa
les_amt_m8+fs_sales_amt_m9+fs_sales_amt_m10+fs_sales_amt_m11+fs_sales_amt_m12+fs
_sales+avg_bskt_size+promo_depth+SB_penetration+personalized_cpn+base_cpn+mass_c
pn+phr_flag+bty_club_flag+card_tenure_years+cust_age,data=df, method="nearest", 
ratio=1)

  m.data1<-match.data(m.out)
   df_total <- rbind(df_total,m.data1)

  # print(paste('select * from BUSINESS_USERS.cp_cntrl_pool_cohort_final_new_nl
  #  where cohort_month=201908  and segment_nbr=', segnbr,' and 
  #  sales_decile=',slsdecile, ' and sm_rate_decile=',promodecile))
Limey
  • 4,618
  • 2
  • 5
  • 21
Raghav
  • 1
  • Using `lapply` will not necessarily fix your problem. If you want to try convert the loop code to a function for one row of data. Test it to make sure it works properly. You will leave out the last line of the loop with `rbind`. Then use something like `df_total – dcarlson May 07 '21 at 13:15
  • 1
    FYI: (1) I discourage the use of `paste` to construct SQL queries, it can lead to inadvertent sql injection mistakes; see https://cran.r-project.org/web/packages/DBI/vignettes/DBI-advanced.html and https://db.rstudio.com/best-practices/run-queries-safely/. (2) You can shorten your calls from `dbSendQuery(.);dbFetch(.)` (which is missing a `dbClearResult(.)`, btw) into a single `dbGetQuery(con,qry,params)`. (3) Your code is incomplete, perhaps just missing a closing-`}`. – r2evans May 07 '21 at 13:21
  • (4) The biggest (only?) benefit of using `lapply` here is that you avoid iteratively `rbind`ing to your frame, which is inefficient and scales very poorly; it's best to form a list of frames and then when complete, `rbind` them together, using `do.call(rbind, df_total)` or `dplyr::bind_rows(df_total)` or `data.table::rbindlist(df_total)`. See https://stackoverflow.com/a/24376207/3358227. – r2evans May 07 '21 at 13:21

0 Answers0