How can I use the row.names attribute to order the rows of my dataframe in R?

Question

I created a random forest and predicted the classes of my test set, which are living happily in a dataframe:

row.names   class  
564028      1
275747      1
601137      0
922930      1
481988      1
...

The row.names attribute tells me which row is which, before I did various operations that scrambled the order of the rows during the process. So far so good.

Now I would like to get a general feel for the accuracy of my predictions. To do this, I need to take this dataframe and reorder it in ascending order according to the row.names attribute. This way, I can compare the observations, row-wise, to the labels, which I already know.

Forgive me for asking such a basic question, but for the life of me, I can't find a good source of information regarding how to do such a trivial task.

The documentation implores me to:

use attr(x, "row.names") if you need to retrieve an integer-valued set of row names.

but this leaves me with nothing but NULL.

My question is, how can I use row.names which has been loyally following me around in the various incarnations of dataframes throughout my workflow? Isn't this what it is there for?

There is a subtlety to this question in that there is a column named 'row.names' which is NOT the same as the attribute named 'row.names'. The downvote I just got (18 months after my reply) might or might not be appropriate. I suppose other readers can throw in their opinions. — IRTFM, Aug 08 '14 at 00:16
Yes, it's not really clear, as the text does specifically say "according to the row.names _attribute_", and the accepted answer operates on the `row.names` attribute. In which case, I think @ToNoY's answer is the right one. (It worked correctly for me.) — big_m, Feb 08 '16 at 04:29

score 29 · Answer 1 · edited Feb 06 '20 at 03:33

None of the other solutions would actually work.

It should be:

# Assuming the data frame is called df
df[ order(as.numeric(row.names(df))), ]

Row names in R are stored as character strings, so without the as.numeric part the data is arranged as 1, 10, 11, ... and so on.
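As a minimal sketch of the difference, assuming a small made-up data frame (the values and the `score` column are hypothetical, not from the question), the two orderings can be compared side by side:

```r
# Hypothetical data: row names of different lengths, stored as character.
df <- data.frame(
  class = c(1, 0, 1, 0, 1),
  score = c(0.9, 0.1, 0.8, 0.3, 0.7),
  row.names = c("10", "2", "1", "100", "21")
)

# Ordering the raw row names compares them as strings, digit by digit.
by_string <- df[order(row.names(df)), ]

# Converting to numeric first orders the rows by value.
by_value <- df[order(as.numeric(row.names(df))), ]

print(row.names(by_string))
print(row.names(by_value))
```

Here `by_string` comes out as 1, 10, 100, 2, 21, while `by_value` comes out as 1, 2, 10, 21, 100.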

answered Jul 12 '15 at 14:39 by ToNoY · edited Feb 06 '20 at 03:33 by Eric Leung

score 27 · Accepted Answer · answered Jan 29 '15 at 18:21

This worked for me:

new_df <- df[ order(row.names(df)), ]

answered Jan 29 '15 at 18:21 by cburghard

2

People looking for the same thing, check out ToNoY's answer below. It will save you time when you find out it orders it wrong – Claud H Aug 30 '17 at 09:16

score 2 · Answer 3 · edited May 23 '17 at 12:09

For completeness:

@BondedDust's answer works perfectly for the rownames attribute, but your example does not use the rownames attribute. The output provided in your question indicates use of a column named "row.names", which isn't the same thing (all listed in @BondedDust's comment). Here would be the answer if you wished to sort by the "row.names" column in example given in your question (there is another posting on this, located here). This answer assumes you are using a dataframe named "df", with one column named "row.names":

ordered.df <- df[order(df$row.names),]   #this orders the df by the "row.names" column

Alternatively, to order by the first column (same thing if you're still using your example):

ordered.df <- df[order(df[,1]),]         #this orders the df by the first column

Hope this is helpful!
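To make the column-versus-attribute distinction concrete, here is a small sketch. The data frame is hypothetical and is built in two steps, because `data.frame()` reserves its `row.names` argument for the attribute:

```r
# Hypothetical data frame with an ordinary column that happens to be
# called "row.names". The column is renamed after construction because
# data.frame(row.names = ...) would set the attribute instead.
df <- data.frame(id = c(564028, 275747, 601137), class = c(1, 1, 0))
names(df)[1] <- "row.names"

print(row.names(df))       # the attribute: automatic names "1", "2", "3"
print(df[["row.names"]])   # the column: 564028 275747 601137

# Ordering by the column; each row keeps its original attribute value.
ordered.df <- df[order(df[["row.names"]]), ]
print(row.names(ordered.df))
```

After ordering, the column is ascending while the row names attribute reads "2", "1", "3", which shows that the two are tracked independently.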

score 2 · Answer 4 · answered Apr 05 '18 at 21:17

If your dataframe has only one column, as in my case, you have to add drop = FALSE; otherwise the result is simplified to a vector:

df[ order(rownames(df)), , drop = FALSE ]
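A short sketch of what happens with and without `drop`, assuming a made-up single-column data frame:

```r
# Hypothetical single-column data frame with character row names.
df <- data.frame(class = c(1, 0, 1), row.names = c("30", "10", "20"))

# Without drop = FALSE the one-column result collapses to a plain vector
# and the row names are gone.
dropped <- df[order(rownames(df)), ]

# With drop = FALSE the result stays a data frame and keeps its row names.
kept <- df[order(rownames(df)), , drop = FALSE]

print(is.data.frame(dropped))
print(rownames(kept))
```

These row names all have the same length, so string ordering and numeric ordering agree here; with mixed lengths you would still want `as.numeric()` inside `order()`.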

answered Apr 05 '18 at 21:17 by forever

score 1 · Answer 5 · answered Nov 30 '13 at 02:33

This will be done almost automatically since the "[" function will display in lexical order of any vector that can be matched to rownames():

df[ rownames(df) , ]

You might have thought it would be necessary to use:

df[ order(rownames(df)) , ]

But that would have given you an ordering of 1:100 of 1,10,100, 12,13, ...,2,20,21, ... , because the argument to "[" gets coerced to character.

answered Nov 30 '13 at 02:33 by IRTFM

Two issues: First, I believe the function for data frames is `row.names` (although `rownames` does seem to work, probably to save everyone's sanity). Second, I just tried your suggestion and, without the `order` part, the rows were simply spit out in the order they are already in — no reordering took place. @ToNoY's tip to convert to numeric worked for me, though. – big_m Feb 08 '16 at 04:34
`rownames` work with any object of 2 or more dimensions. Furthermore, the example above is confusing since the `row.names` attribute never is labeled by the print function on the same row as column names. Agree that @ToNoY's answer is the best. – IRTFM Feb 08 '16 at 04:38
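The behaviour described in these comments can be checked with a small sketch; the data frame below is made up for illustration:

```r
# Hypothetical data frame whose rows are deliberately out of order.
df <- data.frame(
  class = c(1, 0, 1),
  score = c(0.9, 0.2, 0.7),
  row.names = c("30", "10", "20")
)

# Indexing by the row names themselves returns the rows in their
# existing order; nothing is sorted.
same <- df[rownames(df), ]

# Ordering on the numeric value of the row names does reorder them.
sorted <- df[order(as.numeric(rownames(df))), ]

print(rownames(same))
print(rownames(sorted))
```

In this sketch `same` keeps the order 30, 10, 20, while `sorted` comes out as 10, 20, 30.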

score 0 · Answer 6 · answered Nov 04 '14 at 19:59

Assuming your data frame is named 'df', you can create a new ordered data frame 'ord.df' that contains the row names of df as well as its values with the following line of code:

ord.df <- cbind(rownames(df)[order(rownames(df))], df[order(rownames(df)), ])

score 0 · Answer 7 · edited Mar 06 '18 at 14:03

new_df <- df[ order(row.names(df)), ]

or something similar won't work. After this statement, new_df no longer has row names. I guess a better solution is to add the row names as a column, sort by that column, and then set it back as the row names.
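One way to read the suggestion is sketched below. The helper column name `rn` and the data are hypothetical. As far as I can tell, the row names only seem to disappear when a single-column result is simplified to a vector, so the helper column also serves to keep the result a data frame:

```r
# Hypothetical single-column data frame with character row names.
df <- data.frame(class = c(1, 0, 1), row.names = c("30", "10", "20"))

# Copy the row names into a helper column (numeric, so sorting is by value).
df$rn <- as.numeric(rownames(df))

# Sort by the helper column; with two columns the result stays a data frame.
df <- df[order(df$rn), ]

# Set the helper column back as the row names, then delete it.
rownames(df) <- as.character(df$rn)
df$rn <- NULL

print(rownames(df))
```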

answered Mar 05 '18 at 17:52 by user9447252 · edited Mar 06 '18 at 14:03 by Marco Sandri

>df$rownames df #now delete the column >df$rownames – user9447252 Mar 05 '18 at 18:03

score 0 · Answer 8 · answered Feb 05 '19 at 20:18

You can simply sort your df like this:

df <- df[sort(rownames(df)),]

and then do what you want!

answered Feb 05 '19 at 20:18 by Erfan Mahmoudinia
