I have data like this in a text file:

fd50c4007b68a3737fe052d5a4f78ce8aa117f3d    SOEGIYH12A6D4FC0E3  1
fd50c4007b68a3737fe052d5a4f78ce8aa117f3d    SOFLJQZ12A6D4FADA6  1
fd50c4007b68a3737fe052d5a4f78ce8aa117f3d    SOHTKMO12AB01843B0  1
fd50c4007b68a3737fe052d5a4f78ce8aa117f3d    SODQZCY12A6D4F9D11  1
fd50c4007b68a3737fe052d5a4f78ce8aa117f3d    SOXLOQG12AF72A2D55  1
d7083f5e1d50c264277d624340edaaf3dc16095b    SOUVUHC12A67020E3B  1
d7083f5e1d50c264277d624340edaaf3dc16095b    SOUQERE12A58A75633  1
d7083f5e1d50c264277d624340edaaf3dc16095b    SOIPJAX12A8C141A2D  1
d7083f5e1d50c264277d624340edaaf3dc16095b    SOEFCDJ12AB0185FA0  2
d7083f5e1d50c264277d624340edaaf3dc16095b    SOATCSU12A8C13393A  2

I am able to read it into a variable successfully, but:

  1. I need to sort this data by the third field.
  2. I need to sort the data by the first field, group the rows that share the same first field, and sum the third field within each group.

Is it possible to do this in R?

The output should be:

fd50c4007b68a3737fe052d5a4f78ce8aa117f3d 5
d7083f5e1d50c264277d624340edaaf3dc16095b 7
Shadow The Vaccinated Wizard

2 Answers

As you (sort of) state in your question, you have two problems:

  1. Calculating the sum of a variable conditional on another variable
  2. Sorting a data frame

The first problem can be solved using the plyr package:

## Some dummy data
library(plyr)
## V2 has 6 values and is recycled to match the 12 values of V1
dd = data.frame(V1 = rep(c("A", "A", "B"), 4), V2 = rep(1:3, each = 2))

## ddply takes the data frame dd,
## splits it by column V1,
## and sums column V2 within each group
dd1 = ddply(dd, "V1", summarise, V2 = sum(V2))

The second problem can be solved by searching for "how to sort a data frame":

dd1[with(dd1, order(V2)), ]
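
If plyr is not available, the same two steps can be sketched in base R. This is only an illustration on the dummy data above: it swaps ddply for tapply, and the names sums and dd1_base are made up here.

```r
# Same dummy data as above; V2 (6 values) is recycled to the 12 rows of V1.
dd <- data.frame(V1 = rep(c("A", "A", "B"), 4), V2 = rep(1:3, each = 2))

# Step 1: sum V2 within each level of V1.
# tapply returns a named vector, one element per group.
sums <- tapply(dd$V2, dd$V1, sum)

# Step 2: rebuild a data frame and sort it by the summed column.
dd1_base <- data.frame(V1 = names(sums), V2 = as.vector(sums))
dd1_base <- dd1_base[order(dd1_base$V2), ]
print(dd1_base)
```

Unlike ddply, tapply does not return a data frame, which is why the result is reassembled by hand before sorting.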
csgillespie

Q1: Sorting a dataframe by one column is generally done with order. You do need to name the dataframe again inside order (dat$V1, not just V1), which may seem superfluous to a new useR. order returns a numeric vector of row positions, and row indexing accepts any such vector however it was built, so the column has to be passed to order explicitly as a vector of its own.

> dat[ order(dat$V1), ]
                                         V1                 V2 V3
6  d7083f5e1d50c264277d624340edaaf3dc16095b SOUVUHC12A67020E3B  1
7  d7083f5e1d50c264277d624340edaaf3dc16095b SOUQERE12A58A75633  1
8  d7083f5e1d50c264277d624340edaaf3dc16095b SOIPJAX12A8C141A2D  1
9  d7083f5e1d50c264277d624340edaaf3dc16095b SOEFCDJ12AB0185FA0  2
10 d7083f5e1d50c264277d624340edaaf3dc16095b SOATCSU12A8C13393A  2
1  fd50c4007b68a3737fe052d5a4f78ce8aa117f3d SOEGIYH12A6D4FC0E3  1
2  fd50c4007b68a3737fe052d5a4f78ce8aa117f3d SOFLJQZ12A6D4FADA6  1
3  fd50c4007b68a3737fe052d5a4f78ce8aa117f3d SOHTKMO12AB01843B0  1
4  fd50c4007b68a3737fe052d5a4f78ce8aa117f3d SODQZCY12A6D4F9D11  1
5  fd50c4007b68a3737fe052d5a4f78ce8aa117f3d SOXLOQG12AF72A2D55  1
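
The question asks for a sort on the third field, which follows the same pattern. A minimal sketch, assuming the file was read with read.table so that the columns get the default names V1, V2 and V3 (the data is inlined here through a hypothetical txt string instead of a file):

```r
# The question's data as an inline string; read.table names the
# columns V1, V2, V3 by default.
txt <- "
fd50c4007b68a3737fe052d5a4f78ce8aa117f3d SOEGIYH12A6D4FC0E3 1
fd50c4007b68a3737fe052d5a4f78ce8aa117f3d SOFLJQZ12A6D4FADA6 1
fd50c4007b68a3737fe052d5a4f78ce8aa117f3d SOHTKMO12AB01843B0 1
fd50c4007b68a3737fe052d5a4f78ce8aa117f3d SODQZCY12A6D4F9D11 1
fd50c4007b68a3737fe052d5a4f78ce8aa117f3d SOXLOQG12AF72A2D55 1
d7083f5e1d50c264277d624340edaaf3dc16095b SOUVUHC12A67020E3B 1
d7083f5e1d50c264277d624340edaaf3dc16095b SOUQERE12A58A75633 1
d7083f5e1d50c264277d624340edaaf3dc16095b SOIPJAX12A8C141A2D 1
d7083f5e1d50c264277d624340edaaf3dc16095b SOEFCDJ12AB0185FA0 2
d7083f5e1d50c264277d624340edaaf3dc16095b SOATCSU12A8C13393A 2
"
dat <- read.table(text = txt, stringsAsFactors = FALSE)

# Ascending by the third field.
by_v3 <- dat[order(dat$V3), ]

# Descending by the third field (negating works because V3 is numeric).
by_v3_desc <- dat[order(-dat$V3), ]

# Third field first, ties broken by the first field.
by_v3_then_v1 <- dat[order(dat$V3, dat$V1), ]
print(by_v3_desc)
```

Passing several vectors to order is the usual way to sort on more than one column; later arguments only matter where earlier ones tie.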

Q2: To sum a vector within categories and return a dataframe, use aggregate:

> aggregate(V3 ~ V1, data = dat, FUN = sum)
                                        V1 V3
6 d7083f5e1d50c264277d624340edaaf3dc16095b  7
1 fd50c4007b68a3737fe052d5a4f78ce8aa117f3d  5

If it needs to be ordered:

> dat2 <- aggregate(V3 ~ V1, data = dat, FUN = sum)
> dat2[order(dat2$V1), ]
                                        V1 V3
6 d7083f5e1d50c264277d624340edaaf3dc16095b  7
1 fd50c4007b68a3737fe052d5a4f78ce8aa117f3d  5
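
The output requested in the question lists the group with sum 5 before the group with sum 7, which is what ordering on the summed column gives. A small sketch of that variant, assuming the same V1 and V3 values as in the question (V2 is left out for brevity, and fd, d7, totals and totals_by_sum are names introduced only for this example):

```r
# The two user ids from the question.
fd <- "fd50c4007b68a3737fe052d5a4f78ce8aa117f3d"
d7 <- "d7083f5e1d50c264277d624340edaaf3dc16095b"

# First and third fields of the question's ten rows.
dat <- data.frame(
  V1 = rep(c(fd, d7), each = 5),
  V3 = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Sum V3 within each V1 group; FUN has to be supplied explicitly.
totals <- aggregate(V3 ~ V1, data = dat, FUN = sum)

# Order the groups by their sum rather than by the id.
totals_by_sum <- totals[order(totals$V3), ]
print(totals_by_sum, row.names = FALSE)
```

Use order(-totals$V3) instead if the largest group should come first.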
IRTFM