
I have a column in R with an uneven, roughly exponential distribution. I want to normalize the data and then bin it into consecutive buckets.

I found the following links, which help with normalizing the data but say nothing about binning it into categories.

Normalizing data in R

Standardize data columns in R

Example of what the unevenly distributed column looks like (the real data has many more rows):

dat <- data.frame(Id = c(1,2,3,4,5,6,7,8),
                  Qty = c(1,1,1,2,3,13,30,45))

I want to bin the column into 5 categories, which might look like this:

dat <- data.frame(Id = c(1,2,3,4,5,6,7,8),
                  Qty = c(1,1,1,2,3,13,30,45),
                  Binned_Category = c(1,1,1,1,2,3,4,5))

The Binned_Category values above are only illustrative; the real values for this data may differ. I just want to show what the output should look like.

Rahul Agarwal
  • What would be your binning criteria? Would that involve cutoff ranges? I am finding it difficult to understand how you arrive at Binned_Category column. – M_Shimal May 28 '18 at 16:30
  • `cut()` is the thing you are looking for – abhiieor May 28 '18 at 16:31
  • @M_Shimal: I am assuming that after normalizing the column my values will lie between 0 and 1, and then I divide those values into 5 equal bins – Rahul Agarwal May 28 '18 at 16:32
  • As @abhiieor pointed out, I think cut() is what you are looking for. Check this out https://stackoverflow.com/questions/40794821/error-when-binning-data-using-cut-in-r – M_Shimal May 28 '18 at 16:36
  • @abhiieor: Can you provide a complete answer? How do I first convert an exponentially distributed column to the 0-1 range and then "cut" it into bins? – Rahul Agarwal May 28 '18 at 16:51
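The comments ask how to rescale to 0-1 first and then use cut(). Below is a minimal sketch of that route on the toy data, assuming "normalize" means min-max scaling to [0, 1] and that the 5 bins should have equal width. The column name Qty_norm is a hypothetical name chosen for illustration.

```r
# Toy data from the question
dat <- data.frame(Id = c(1, 2, 3, 4, 5, 6, 7, 8),
                  Qty = c(1, 1, 1, 2, 3, 13, 30, 45))

# Assumption: "normalize" means min-max scaling of Qty to [0, 1]
rng <- range(dat$Qty)
dat$Qty_norm <- (dat$Qty - rng[1]) / (rng[2] - rng[1])  # hypothetical column name

# Assumption: 5 equal-width bins on [0, 1].
# include.lowest = TRUE keeps the value 0 in the first bin;
# labels = FALSE returns integer bin codes 1..5 instead of interval labels.
dat$Binned_Category <- cut(dat$Qty_norm,
                           breaks = seq(0, 1, length.out = 6),
                           include.lowest = TRUE,
                           labels = FALSE)
print(dat)
```

Min-max scaling is a linear transformation, so it does not change which values end up sharing a bin. With skewed data, equal-width bins leave most rows in the first bin: here Binned_Category comes out as 1, 1, 1, 1, 1, 2, 4, 5, and bin 3 is empty. If you want bins with roughly equal counts, quantile-based breaks are the better fit.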

1 Answer


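This will help. It uses quantiles of Qty as the bin edges, so each bin gets roughly the same number of rows, and findInterval() to assign each value to a bin: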

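num_bins <- 5
breaks <- unique(quantile(dat$Qty, probs = seq(0, 1, 1/num_bins)))  # quantile edges; unique() drops ties
# rightmost.closed = TRUE puts the maximum in the top bin rather than a bin of its own
dat$Binned_Category <- findInterval(dat$Qty, breaks, rightmost.closed = TRUE)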
abhiieor
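Applied to the toy data from the question, the quantile approach can be sketched as follows. This sketch assumes the data frame is called dat (as in the question) and adds rightmost.closed = TRUE to findInterval(); without it, the maximum value equals the last break and lands alone in an extra bin.

```r
# Toy data from the question
dat <- data.frame(Id = c(1, 2, 3, 4, 5, 6, 7, 8),
                  Qty = c(1, 1, 1, 2, 3, 13, 30, 45))
num_bins <- 5

# Quantile breaks at 0%, 20%, ..., 100%; unique() drops repeated edges,
# which are common when many small values are tied
breaks <- unique(quantile(dat$Qty, probs = seq(0, 1, 1 / num_bins)))
print(breaks)

# Default behaviour: the maximum equals the last break and gets its own bin
default_bins <- findInterval(dat$Qty, breaks)
print(default_bins)

# rightmost.closed = TRUE folds the maximum into the top bin
dat$Binned_Category <- findInterval(dat$Qty, breaks, rightmost.closed = TRUE)
print(dat)
```

On this data the 0% and 20% quantiles are both 1, so unique() leaves only five edges and the result has four bins: Binned_Category is 1, 1, 1, 2, 2, 3, 4, 4. With more rows and fewer ties you would normally get all num_bins bins.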