I have a data frame that lists individuals and their monetary transactions in USD. The data are organized by district and record the valid transactions made by either cash or credit card, like so:

X    Dist    transact.cash    transact.card
a    1       USD              USD
b    1       USD              USD

Here X is an individual, the transaction columns hold his/her transactions over a fixed period of time, and Dist is the district where he/she resides. There are over 4000 observations in total, with approximately 80-100 rows per Dist. So far, sorting, slicing and everything else have been simple operations, with dat.cash and dat.card being tables subsetted by mode of transaction. However, I'm having problems extracting information by rank. For this, I have written a function where I specify a rank, and the function should show the rows starting from that rank:

rankdat <- function(transact, numb) {
  # Truncated
  valid.nums = c('highest', 'lowest', 1:nrow(dat.cash)) # for cash subset
  if (transact == 'cash' && numb == 'highest') { # This is easy
    sort <- dat.cash[order(dat.cash[, 3], decreasing = T), ] # For sorting only cash data set
  } else if (transact == 'cash' && numb == 1:nrow(dat.cash)) {
    sort <- dat.cash[order(dat.cash[, 3], decreasing = T) == numb, ] # Not getting results here
  }
}

The last line returns NULL instead of the ranked transaction and all its rows. Replacing == with %in% still gives NULL, and using rank() doesn't change anything. The highest and lowest cases are not a big deal, since they only involve simple sorting. If I call rankdat('cash', 10), the function should return rows starting from the 10th highest transaction and decreasing from there, irrespective of Dist, similar to:

 X    Dist    transact.cash
 b    1       10th highest
 h    2       11th highest
 p    1       12th highest
 and  so      on
shiv_90
  • Do you want to do that with R or python ? – B.Gees Jun 05 '17 at 12:07
  • This is in r, edited title. – shiv_90 Jun 05 '17 at 14:23
  • Please change your line code :} else if (transact == 'cash' and numb = 1:nrow(dat.cash) { into } else if (transact == 'cash' and numb == 1:nrow(dat.cash) ){ – B.Gees Jun 05 '17 at 14:25
  • Thanks for the pointer. Corrected. – shiv_90 Jun 05 '17 at 14:28
  • I'm so sorry but you have a mistake again else if ( **)** – B.Gees Jun 05 '17 at 14:30
  • Apologies, corrected. – shiv_90 Jun 05 '17 at 14:35
  • I don't understand the main goal of your function. Do you want to extract rows after sorting them? – B.Gees Jun 05 '17 at 14:49
  • Precisely, yes. I'm trying to extract rows based on the ranks of the transaction variables after the data has been sorted. That is, if I specify 10 in `numb` as the rank, then my function should return data starting from the 10th highest ranking transaction from any of the subsetted data frames. Hope it's clear now. – shiv_90 Jun 05 '17 at 15:10
  • To be sure, at the beginning you have only one dataframe `dat.cash` which contains the `transact.cash` and `transact.card` columns? – B.Gees Jun 05 '17 at 15:15
  • No, that's the main data frame; call it `dat`, and it has both columns. I subsetted those columns to form separate tables `dat.cash` and `dat.table`. The example function I showed is for `dat.cash` only. I'll edit those lines and add more code for clarity. – shiv_90 Jun 05 '17 at 15:20

2 Answers

Suppose that you have the following data.frame:

df=data.frame(X=c(rep('A',2),rep('B',3),rep('A',3),rep('B',2)),
               Dist=c(rep(1,5),rep(0,5)),
               transact.cash=c(rep('USD',5),rep('€',5)),
               transact.card=c(rep('USD',5),rep('€',5)))

We obtain:

   X Dist transact.cash transact.card
1  A    1           USD           USD
2  A    1           USD           USD
3  B    1           USD           USD
4  B    1           USD           USD
5  B    1           USD           USD
6  A    0             €             €
7  A    0             €             €
8  A    0             €             €
9  B    0             €             €
10 B    0             €             €

If you would like to sort a dataframe by multiple columns (transact.cash and transact.card), see the Stack Overflow question "How to sort a dataframe by column(s)". In your example, you only specified dat.cash, thus:

sort = df[order(df$transact.cash, decreasing=T),] # Order your dataFrame with transact.cash column 
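Since the example data frame above holds currency labels rather than amounts, here is a minimal sketch of the same ordering on numeric values. The `dat.cash` contents are invented for illustration, as the real amounts are not shown in the question:

```r
# Hypothetical cash subset; the amounts are made up for illustration.
dat.cash <- data.frame(
  X = c("a", "b", "c", "d", "e"),
  Dist = c(1, 1, 2, 2, 1),
  transact.cash = c(120, 560, 75, 980, 310),
  stringsAsFactors = FALSE
)

# order() returns the row indices that arrange the column in the requested order
idx <- order(dat.cash$transact.cash, decreasing = TRUE)

# Indexing the rows with those indices gives the sorted data frame
sorted.cash <- dat.cash[idx, ]
print(sorted.cash)
```

Note that `order()` gives row positions, not ranks, which is why its result is used as a row index rather than compared against a rank number.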

If you want to extract rows that satisfy a specific condition, you can use which() together with == to compare against a single value (numeric, logical, or character), or %in% to match against a set of values. For example:

XA = df[which(df$X %in% "A"),] # Select row by user
XDist = df[which(df$Dist == 1),] # Select row by District
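As a small sketch of how these selectors behave, the following uses an invented data frame (the values are assumptions for illustration only) and compares `==`, `%in%`, and `which()`:

```r
# Hypothetical data, invented for illustration.
df <- data.frame(
  X = c("A", "A", "B", "C"),
  Dist = c(1, 0, 1, 0),
  stringsAsFactors = FALSE
)

one.user  <- df[df$X == "A", ]             # == compares against a single value
two.users <- df[df$X %in% c("A", "C"), ]   # %in% tests membership in a set of values
dist.one  <- df[which(df$Dist == 1), ]     # which() turns TRUEs into row indices and skips NAs

print(one.user)
print(two.users)
print(dist.one)
```

Wrapping the condition in `which()` mainly matters when the column may contain `NA`, since a bare logical index would then produce rows of `NA`.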

Finally, if you would like to select the first five rows after ordering:

sort[1:5,] # Select first five rows
sort[1:numb,] # Select first numb rows
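The question asks for the rows starting at rank `numb` rather than the first `numb` rows. A minimal sketch of that slice, assuming `numb` lies between 1 and the number of rows, and using invented amounts for `dat.cash`:

```r
# Hypothetical cash subset; the amounts are made up for illustration.
dat.cash <- data.frame(
  X = c("a", "b", "c", "d", "e", "f"),
  Dist = c(1, 2, 1, 2, 1, 2),
  transact.cash = c(50, 400, 150, 300, 600, 250),
  stringsAsFactors = FALSE
)

numb <- 3  # rank to start from (assumed to satisfy 1 <= numb <= nrow)
stopifnot(numb >= 1, numb <= nrow(dat.cash))

# Sort first, then slice by position: row i of the sorted table has rank i
sorted.cash <- dat.cash[order(dat.cash$transact.cash, decreasing = TRUE), ]
from.rank <- sorted.cash[numb:nrow(sorted.cash), ]
print(from.rank)
```

Because the sorted table is indexed by position, no comparison such as `order(...) == numb` is needed.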

With that, you can write a simple function to easily extract data from your dataframe.

Hope this helps.

B.Gees

This function is able to do that:

rankdat <- function(df, rank.by, num = 10, method = "top", decreasing = TRUE) {
  # ------------------------------------------------------
  # RANKDAT
  # ------------------------------------------------------
  # ARGUMENT
  # ========
  # df          Input dataFrame [d.f]
  # num         Selected row [num]
  # rank.by     Name(s) of column(s) used to rank dataFrame [character vector]
  # method      Method used to extract rows
  #               top - to select top rank (e.g. 10 first rows)
  #               specific - to select specific row
  # decreasing  Sort from highest to lowest if TRUE
  # ------------------------------------------------------
  # Order the dataFrame by the rank.by column(s); decreasing is passed to order() itself
  keys <- unname(as.list(df[rank.by]))
  sorted <- df[do.call(order, c(keys, list(decreasing = decreasing))), ]
  if (method == "top") {
    return(sorted[1:num, ])
  } else if (method == "specific") {
    return(sorted[num, ])
  } else {
    stop("Please select the method used to extract data: 'top' or 'specific'")
  }
}
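For comparison, here is a hedged sketch of a variant that looks the ranking column up by name with `[[` and returns every row from a given rank onward, as the question describes. The helper name `rank_from` and the sample data are hypothetical and not part of the answer above:

```r
# Hypothetical helper: rank_from() is an illustration, not the answer's function.
rank_from <- function(df, rank.by, num = 1, decreasing = TRUE) {
  stopifnot(rank.by %in% names(df), num >= 1, num <= nrow(df))
  # df[[rank.by]] fetches the column by its name, so no code string has to be built
  sorted <- df[order(df[[rank.by]], decreasing = decreasing), ]
  # Keep the rows from rank num down to the last rank
  sorted[num:nrow(sorted), ]
}

# Invented data for illustration.
dat <- data.frame(
  X = c("a", "b", "c", "d"),
  Dist = c(1, 1, 2, 2),
  transact.cash = c(20, 80, 60, 40),
  transact.card = c(5, 15, 35, 25),
  stringsAsFactors = FALSE
)

cash.from.2 <- rank_from(dat, "transact.cash", num = 2)          # 2nd highest onward
card.lowest <- rank_from(dat, "transact.card", decreasing = FALSE) # lowest first
print(cash.from.2)
print(card.lowest)
```

Passing the column name as a plain string keeps one function usable for both the cash and the card columns.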
Pang
B.Gees