25

I created a random forest and predicted the classes of my test set, which are living happily in a dataframe:

row.names   class  
564028      1
275747      1
601137      0
922930      1
481988      1
...

The row.names attribute tells me which row was which before various operations scrambled the row order. So far so good.

Now I would like to get a general feel for the accuracy of my predictions. To do this, I need to take this dataframe and reorder it in ascending order according to the row.names attribute. This way, I can compare the observations, row-wise, to the labels, which I already know.

Forgive me for asking such a basic question, but for the life of me, I can't find a good source of information regarding how to do such a trivial task.

The documentation implores me to:

use attr(x, "row.names") if you need to retrieve an integer-valued set of row names.

but this leaves me with nothing but NULL.

My question is, how can I use row.names which has been loyally following me around in the various incarnations of dataframes throughout my workflow? Isn't this what it is there for?

tumultous_rooster
  • There is a subtlety to this question in that there is a column named 'row.names' which is NOT the same as the attribute named 'row.names'. The downvote I just got (18 months after my reply) might or might not be appropriate. I suppose other readers can throw in their opinions. – IRTFM Aug 08 '14 at 00:16
  • Yes, it's not really clear, as the text does specifically say "according to the row.names _attribute_", and the accepted answer operates on the `row.names` attribute. In which case, I think @ToNoY's answer is the right one. (It worked correctly for me.) – big_m Feb 08 '16 at 04:29

8 Answers

29

None of the other solutions would actually work.

It should be:

# Assuming the data frame is called df
df[ order(as.numeric(row.names(df))), ]

Row names in R are character strings, so without the as.numeric() call the rows are sorted lexically: 1, 10, 11, ... and so on.
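A minimal sketch of the difference, using a small hypothetical data frame (the row names "10", "2", "100", "1" and the class values are made up for illustration, not taken from the question):

```r
# Hypothetical data: row names of different lengths make the problem visible
df <- data.frame(class = c(1, 0, 1, 0),
                 row.names = c("10", "2", "100", "1"))

# Character row names sort lexically
by_text <- df[order(row.names(df)), , drop = FALSE]

# Converting to numeric first sorts by value
by_number <- df[order(as.numeric(row.names(df))), , drop = FALSE]

rownames(by_text)    # "1" "10" "100" "2"
rownames(by_number)  # "1" "2" "10" "100"
```

The drop = FALSE is only there because this toy data frame has a single column; with two or more columns it can be left out.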

Eric Leung
ToNoY
27

This worked for me:

new_df <- df[ order(row.names(df)), ]
cburghard
  • People looking for the same thing, check out ToNoY's answer below. It will save you time when you find out that this one orders the rows wrong. – Claud H Aug 30 '17 at 09:16
2

For completeness:

@BondedDust's answer works perfectly for the row.names attribute, but your example does not use that attribute. The output in your question indicates a column named "row.names", which isn't the same thing (as noted in @BondedDust's comment). Here is the answer if you wish to sort by the "row.names" column from the example in your question (there is another posting on this, located here). This answer assumes a data frame named "df" with one column named "row.names":

ordered.df <- df[order(df$row.names),]   #this orders the df by the "row.names" column

Alternatively, to order by the first column (same thing if you're still using your example):

ordered.df <- df[order(df[,1]),]         #this orders the df by the first column

Hope this is helpful!
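To make the column-versus-attribute distinction concrete, here is a small sketch. The three values are taken from the question's output; building the column by renaming a temporary `id` column is just one assumed way to end up with a column literally called "row.names":

```r
# Build a data frame with a regular COLUMN named "row.names".
# (Passing row.names = ... to data.frame() would set the attribute instead,
#  so the column is created as "id" and then renamed.)
df <- data.frame(id = c(564028, 275747, 601137), class = c(1, 1, 0))
names(df)[1] <- "row.names"

# The row.names ATTRIBUTE is still the automatic 1, 2, 3
rownames(df)

# Sorting by the column leaves the attribute showing the old positions
ordered.df <- df[order(df$row.names), ]
ordered.df$row.names
rownames(ordered.df)
```

In this sketch the attribute only records the original row positions, so sorting by it would not reorder the data by the IDs stored in the column.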

Community
mflo-ByeSE
2

If your data frame has only one column, as in my case, you have to add drop = FALSE; otherwise the result is simplified to a plain vector:

df[ order(rownames(df)), , drop = FALSE]
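A quick sketch of what goes wrong without it, on a hypothetical one-column data frame (the names and values are invented for the example):

```r
# Hypothetical one-column data frame
df <- data.frame(class = c(1, 0, 1), row.names = c("3", "1", "2"))

# Without drop = FALSE the single column is simplified to a vector,
# and the row names are gone with it
dropped <- df[order(rownames(df)), ]

# With drop = FALSE the result stays a data frame and keeps its row names
kept <- df[order(rownames(df)), , drop = FALSE]

is.data.frame(dropped)  # FALSE
is.data.frame(kept)     # TRUE
rownames(kept)          # "1" "2" "3"
```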
forever
1

This will be done almost automatically since the "[" function will display in lexical order of any vector that can be matched to rownames():

df[ rownames(df) , ]

You might have thought it would be necessary to use:

df[ order(rownames(df)) , ]

But that would have given you the row names 1:100 in the order 1, 10, 100, 11, 12, ..., 2, 20, 21, ..., because row names are character strings and order() sorts them lexically.

IRTFM
  • Two issues: First, I believe the function for data frames is `row.names` (although `rownames` does seem to work, probably to save everyone's sanity). Second, I just tried your suggestion and, without the `order` part, the rows were simply spit out in the order they are already in — no reordering took place. @ToNoY's tip to convert to numeric worked for me, though. – big_m Feb 08 '16 at 04:34
  • `rownames` works with any object of 2 or more dimensions. Furthermore, the example above is confusing, since the print function never labels the `row.names` attribute on the same row as the column names. Agree that @ToNoY's answer is the best. – IRTFM Feb 08 '16 at 04:38
0

Assuming your data frame is named 'df', you can create a new ordered data frame 'ord.df' that contains the row names of df as well as its values with the following line of code:

ord.df <- cbind(rownames(df)[order(rownames(df))], df[order(rownames(df)), ])
0
new_df <- df[ order(row.names(df)), ]  

or something similar won't work. After this statement, new_df no longer has row names. I guess a better solution is to add the row names as a column, sort by that column, and then set it as the row names again.
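The workaround described above could look roughly like this. It is a sketch only: the helper column name `id` is hypothetical, and the data are the first three rows from the question. Note that plain base R data frames normally keep their row names when indexed this way, so the extra column mainly helps with data frame classes that do not preserve them.

```r
# First three rows from the question, with the IDs as row names
df <- data.frame(class = c(1, 1, 0),
                 row.names = c("564028", "275747", "601137"))

# 1. Copy the row names into a helper column (hypothetical name "id")
df$id <- rownames(df)

# 2. Sort by the numeric value of that column
sorted <- df[order(as.numeric(df$id)), ]

# 3. Restore the row names from the column, then drop the helper
rownames(sorted) <- sorted$id
sorted$id <- NULL

rownames(sorted)  # "275747" "564028" "601137"
```

Keeping the helper column as character and converting only inside order() avoids any reformatting of the IDs when they are written back as row names.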

Marco Sandri
0

You can simply sort your df like this:

df <- df[sort(rownames(df)),]

and then do what you want!