Question

It has been asked a few times already how to generate a sorted frequency table of a categorical variable in R (see, for instance, this question, which is marked as a duplicate of a generic data frame sorting question). The answers suggest three successive operations: (1) generate the frequency table, (2) convert it to a data frame, (3) sort it (see the example below).

This is relatively complicated for such a simple operation. What is more, summary() of the data frame will give you (for the column in question) the first 6 lines of exactly the sorted frequency table I am looking for, provided that the column has more distinct values than summary() displays.

Example

Consider a data frame of the form

example_df <- data.frame("Avg" = c(2558,2532,2503,2498,2491,2491,2477,2467,2460,2458,2445,2422), "Name" = c("Jun","Wang","Xi","Wang","Wang","Ma","Li","Ma","Xi","Lin","Yang","Zhao"))

but much longer, with several thousand rows and several thousand different values for "Name".

What is the easiest way to extract a frequency table of names with the most common names coming first? This would be useful because a large table in which most names occur just once is not very informative.

You can do

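# Tabulate the names and convert the counts to a data frame (columns Var1, Freq)
example_ft <- as.data.frame(table(example_df$Name))
# Sort the rows by descending frequency
example_ft <- example_ft[order(-example_ft$Freq), ]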

or

library(plyr)  # provides arrange() and desc()
example_ft <- as.data.frame(table(example_df$Name))
# Sort by descending frequency, breaking ties by name
example_ft <- arrange(example_ft, desc(Freq), Var1)

These are the solutions suggested in the previous questions linked above. Both result in the following example_ft just as intended (though the row numbers differ)

  Var1 Freq
5 Wang    3
4   Ma    2
6   Xi    2
1  Jun    1
2   Li    1
3  Lin    1
7 Yang    1
8 Zhao    1

but both options seem rather complicated. My guess is that there is a simpler, more straightforward way. And indeed there is a very simple command that gives the desired output (but only the first 6 lines, and only among other unrelated output), summary():

summary(example_df)

Output:

      Avg            Name  
 Min.   :2422   Wang   :3  
 1st Qu.:2460   Ma     :2  
 Median :2484   Xi     :2  
 Mean   :2484   Jun    :1  
 3rd Qu.:2499   Li     :1  
 Max.   :2558   Lin    :1  
                (Other):2 
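
For reference, the truncated, sorted listing in the Name column appears to come from the factor method of summary(), which keeps the most frequent levels and lumps the rest into "(Other)" once there are more levels than its maxsum argument. The sketch below is only an illustration under that assumption: it converts the names to a factor explicitly (newer R versions no longer do this by default in data.frame()), and the cutoff of 4 is chosen arbitrarily.

```r
# Hypothetical illustration: names taken from the example data frame above
name_col <- factor(c("Jun", "Wang", "Xi", "Wang", "Wang", "Ma",
                     "Li", "Ma", "Xi", "Lin", "Yang", "Zhao"))

# With more levels than maxsum, summary() keeps the (maxsum - 1) most
# frequent levels, sorted by decreasing count, and sums the rest as "(Other)"
top_names <- summary(name_col, maxsum = 4)
print(top_names)
```

This still truncates the table, so it does not replace a full sorted frequency table.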
Community
0range

1 Answer

How about this?

sort(table(example_df$Name),decreasing = TRUE)
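
As a possible extension of the one-liner above, here is a minimal sketch that keeps only the most common names and, if needed, converts the result to a data frame. The variable names (sorted_tab, top3, top3_df) and the cutoff of 3 are illustrative assumptions, not part of the original answer.

```r
# Example data from the question
example_df <- data.frame(
  "Avg" = c(2558, 2532, 2503, 2498, 2491, 2491, 2477, 2467, 2460, 2458, 2445, 2422),
  "Name" = c("Jun", "Wang", "Xi", "Wang", "Wang", "Ma", "Li", "Ma", "Xi", "Lin", "Yang", "Zhao")
)

# Frequency table of names, most common first
sorted_tab <- sort(table(example_df$Name), decreasing = TRUE)

# Keep only the three most common names (cutoff chosen for illustration)
top3 <- head(sorted_tab, 3)
print(top3)

# Optional: convert to a data frame with columns Var1 and Freq
top3_df <- as.data.frame(top3, stringsAsFactors = FALSE)
print(top3_df)
```

Because the sorting happens on the table itself, the conversion to a data frame becomes optional rather than a required intermediate step.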
David Heckmann