
Given

x<-data.frame(age=sample(1:10), gender=c("M","F"))

How do I select the rows with gender 'M' and order them by age?

I know I can order the data frame by age with: x[order(x$age),]

And I can select the rows with gender 'M' with: x[x$gender=='M',]

And finally, I can do both with: y<-x[x$gender=='M',]; y<-y[order(y$age),]

Is there a more concise way to do this?

I've tried

x[x$gender=='M' & order(x$age),]

which filters but does not order.

> x[x$gender=='M' &order(x$age),]
  age gender
1   4      M
3   6      M
5   3      M
7   1      M
9   9      M

I've also tried

x[x$gender=='M' &&order(x$age),]

which doesn't seem to filter or order (I confess I don't understand the difference between & and &&).

   age gender
1    4      M
2    5      F
3    6      M
4    2      F
5    3      M
6   10      F
7    1      M
8    8      F
9    9      M
10   7      F

What am I doing wrong?

Note: My question is slightly different from the one asked here; I'm specifically trying to understand why my approach(es) don't work. That question asks for an answer but neither it nor its answers help to explain what was wrong with the specific approaches tried here.

tmark
  • Possible duplicate of [Sort data frame column by factor](http://stackoverflow.com/questions/21297989/sort-data-frame-column-by-factor) – Matias Andina Dec 04 '15 at 21:16

3 Answers


Try this with dplyr:

library(dplyr)
library(magrittr)
x<-data.frame(age=sample(1:10), gender=c("M","F"))

x %>%
  filter(gender=="M") %>%
  arrange(age)

Here is the output (your values will differ, because sample() shuffles the ages randomly):

  age gender
1   1      M
2   2      M
3   3      M
4   7      M
5   9      M

or in descending order:

x %>%
  filter(gender=="M") %>%
  arrange(desc(age))

Here is the output for this:

  age gender
1   9      M
2   7      M
3   3      M
4   2      M
5   1      M
jasdumas
  • Thank you Jasmine. I'll have to look at dplyr - it looks nifty. I'll also need to look at magrittr - but the "%>%" syntax looks downright baffling. – tmark Dec 08 '15 at 02:32

order(x$age) returns a vector of row indices:

order(x$age)
[1]  8  1  4  6  2  3  9  7 10  5

x$gender=='M' returns a logical vector (TRUE/FALSE) based on that condition:

x$gender=='M'
[1]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE

x$gender=='M' & order(x$age) is an elementwise logical AND in which every element of order(x$age) is treated as TRUE (because none of them is 0), so the result is the same TRUE/FALSE vector as x$gender=='M'. That is why the rows are filtered but not reordered.

One solution would be x[x$gender=='M',][order(x$age[x$gender=='M']),], where you take the subset where gender is 'M', then use the ordering of that subset's ages to order the result.
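To see the coercion described above in action, here is a small sketch. It uses a fixed data frame (the ages from the question's printout) instead of sample(), so the results are reproducible; the names idx, keep, sub and res are only illustrative.

```r
# Fixed data (ages taken from the question's printout) so results are reproducible
x <- data.frame(age = c(4, 5, 6, 2, 3, 10, 1, 8, 9, 7),
                gender = c("M", "F"))

idx <- order(x$age)            # integer row positions, all >= 1
all(as.logical(idx))           # every non-zero number coerces to TRUE

# So the combined condition collapses to the gender test alone
keep <- x$gender == "M" & idx
identical(keep, x$gender == "M")

# Filter first, then order within the filtered subset
sub <- x[x$gender == "M", ]
res <- sub[order(sub$age), ]
res$age
```

Splitting the work into sub and res is the same two-step approach as in the question; the one-liner above just inlines it.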

user5219763
  • Thanks. I think part of my mistake was assuming that a bitwise AND would be performed between the (binary) representation of x$age and the logical vector x$gender == 'M', and that the resulting value would be 0 or not 0 which would then be evaluated to a logical value. I see now that is wrong, but my confusion probably goes deeper than that :-) – tmark Dec 08 '15 at 03:03

Neither your first method, x[x$gender=='M' & order(x$age),], nor your second, x[x$gender=='M' &&order(x$age),], works in this situation, because both misuse the logical operators.

For details on the logical operators, please refer to https://stat.ethz.ch/R-manual/R-devel/library/base/html/Logic.html.
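As a brief illustration of the difference between the two operators (a sketch on fixed data, not a full account of the linked page): & works element by element and returns a vector, while && is meant for single values and returns a single TRUE or FALSE. As far as I know, older versions of R silently used only the first element of each side when && was given longer vectors, and R 4.3.0 and later raise an error instead. The code below indexes the first elements explicitly so that it should run on either; the names elementwise, single and all_rows are only illustrative.

```r
x <- data.frame(age = c(4, 5, 6, 2, 3, 10, 1, 8, 9, 7),
                gender = c("M", "F"))

# & is elementwise: one TRUE/FALSE per row
elementwise <- x$gender == "M" & order(x$age)
length(elementwise)

# && works on single values; take the first elements explicitly
single <- x$gender[1] == "M" && order(x$age)[1]
single

# Indexing rows with a single TRUE recycles it, so every row is kept
all_rows <- x[single, ]
nrow(all_rows)
```

This matches what the question shows for the && attempt: the whole data frame comes back, neither filtered nor ordered.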

To solve this problem, I think the way you described is good enough: sort first and then select the "M" rows, or select the "M" rows first and then sort. If you really want to write the code in one line, you can do x[x$gender=="M",][order(x[x$gender=="M",]$age),], which is very ugly.
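The "sort first, then select" order mentioned above can be written fairly compactly in base R. This is only a sketch on fixed data; subset() is a base R convenience function whose own documentation recommends it mainly for interactive use, and the name res is illustrative.

```r
x <- data.frame(age = c(4, 5, 6, 2, 3, 10, 1, 8, 9, 7),
                gender = c("M", "F"))

# Order the whole data frame by age, then keep only the "M" rows;
# the surviving rows stay in age order
res <- subset(x[order(x$age), ], gender == "M")
res
```

Because the filter runs on the already sorted data frame, the condition only has to be written once.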

Other methods, such as using dplyr, are also helpful (though they seem too fancy to me for this case).

Jonas