Is it possible to aggregate or summarize a dataset in R with the median?

Question

I am trying to aggregate a dataset in R using the median.

d <- aggregate(c(d$user_reported_percent, d$machine_percent), 
                       by = list(d$day), FUN=median, simplify = TRUE, drop = TRUE)

But R keeps complaining, and I am not sure if it even makes sense to aggregate with median.

Some of the errors R gives me: Error in aggregate.data.frame(as.data.frame(x), ...) : arguments must have same length

Then I tried to use mutate to at least find the median:

d <- d %>% group_by(day) %>% mutate(median=median(user_reported_percent))

The error was: Error: invalid subscript type 'integer'

I would appreciate any help! Thanks a lot!

P.S. With mean, everything works perfectly fine.

My dataset looks like this:

structure(list(esmFollValue = c(36.00852, 8.688648, 0.6372048, 
13.7394, 0.7599012, 16.43628, 7.569684, 0.4502016, 0.7630464, 
0.781386, 0.5116056, 0.858756, 18.06108, 0.5473332, 14.62944, 
14.62944, 14.07216, 0.5366868, 14.12892, 0.7354944), esmHappValue = c(100L, 
80L, 80L, 80L, 60L, 80L, 60L, 60L, 80L, 60L, 100L, 60L, 80L, 
60L, 60L, 60L, 60L, 60L, 80L, 60L), deviceId = structure(c(11L, 
11L, 11L, 6L, 3L, 15L, 3L, 3L, 15L, 3L, 15L, 15L, 15L, 15L, 3L, 
3L, 15L, 3L, 9L, 9L), .Label = c("1e6c1183-af64-4860-b2d6-533cab7afe6c", 
"34209e3d-1a82-4f75-95c8-846be8a1be03", "7066f4af-82f3-4369-8f45-70d1ea3d22f2", 
"7cf78328-60c5-4564-9dd0-309cb0b3d5ad", "95b11f22-91e8-46d0-88d9-4f197267aa29", 
"a0c89d2a-d22d-41d0-a070-b9887d911953", "cde8cc10-7212-4a41-ae9b-bbeb51dbe8ed", 
"d150bfa4-0b52-47a0-b450-1eb21aaada53", "d41db7bc-2b81-4111-9b32-a0aab55cb25a", 
"d7e8e8c7-5190-4f0b-aa49-72e520bc9aad", "dd1218a2-4e67-4cbf-bf4d-9e288865aa63", 
"f093abf9-22e1-47e6-ae5d-1238629d8542", "fae0dd29-2b89-4c1d-b5ad-7858abe122ac", 
"feeb0ab0-7d13-4a5c-b0df-58dd85c7f607", "ff883e61-c9a9-4e6b-8b6b-cab3e5535879"
), class = "factor"), timestamp = c(1457272936.882, 1457337998.931, 
1457424251.996, 1457429767.632, 1457597635.755, 1457683537.604, 
1457861178.161, 1457964712.356, 1458029223.54, 1458046931.652, 
1458051135.219, 1458115293.069, 1458133652.503, 1458202019.302, 
1458203945.674, 1458203945.787, 1458306790.803, 1458308783.441, 
1458460903.755, 1458480932.088), group = structure(c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L), .Label = c("groupA", "groupB", "groupC", "groupD"), class = "factor"), 
    cameraFeed = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Non-visible camera feed", 
    "Visible camera feed"), class = "factor"), timegroup = structure(c(2L, 
    1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 
    2L, 2L, 1L, 2L), .Label = c("Day", "Evening"), class = "factor"), 
    day = structure(c(4L, 2L, 6L, 6L, 5L, 1L, 4L, 2L, 6L, 6L, 
    6L, 7L, 7L, 5L, 5L, 5L, 1L, 1L, 4L, 4L), .Label = c("Friday", 
    "Monday", "Saturday", "Sunday", "Thursday", "Tuesday", "Wednesday"
    ), class = "factor"), user_reported_percent = c(83.3333333333333, 
    66.6666666666667, 66.6666666666667, 66.6666666666667, 50, 
    66.6666666666667, 50, 50, 66.6666666666667, 50, 83.3333333333333, 
    50, 66.6666666666667, 50, 50, 50, 50, 50, 66.6666666666667, 
    50), machine_percent = c(30.0071, 7.24054, 0.531004, 11.4495, 
    0.633251, 13.6969, 6.30807, 0.375168, 0.635872, 0.651155, 
    0.426338, 0.71563, 15.0509, 0.456111, 12.1912, 12.1912, 11.7268, 
    0.447239, 11.7741, 0.612912)), .Names = c("esmFollValue", 
"esmHappValue", "deviceId", "timestamp", "group", "cameraFeed", 
"timegroup", "day", "user_reported_percent", "machine_percent"
), row.names = c(NA, 20L), class = "data.frame")

I would like to have one percent value per day.

Sorry for not being specific and descriptive. I added the dataset snapshot now — Zhanna Sarsenbayeva, Nov 21 '16 at 10:19
You have at least one error in the `aggregate` line. Most likely the first argument should be `d[,c("user_reported_percent","machine_percent")]`. That doesn't guarantee it will work, but the error you are receiving comes from the fact that your first argument is twice as long as the grouping variable (do you see why?). — nicola, Nov 21 '16 at 10:19
Also, remove that image and post the output of `dput(head(d,20))`. — nicola, Nov 21 '16 at 10:20
@nicola Thanks for your comment. I changed the first argument, but it still gives me the same error. I am also a bit confused about how to add the output of the dput() function. It doesn't work as a code snippet or with the other options. — Zhanna Sarsenbayeva, Nov 21 '16 at 10:33
Does `dput(head(d,20))` give you an output? Copy/paste that output on your question. That's it. — nicola, Nov 21 '16 at 10:36
@nicola I updated my question, but it looks horrible. I am sure there is a way to do it, and I am doing something wrong :) — Zhanna Sarsenbayeva, Nov 21 '16 at 10:39
On your data `aggregate(d[,c("user_reported_percent","machine_percent")],by = list(d$day), FUN=median)` works perfectly fine. — nicola, Nov 21 '16 at 10:45
@nicola yes, thanks a lot! now seems to work perfectly fine! Sorry for the mess and thanks again! — Zhanna Sarsenbayeva, Nov 21 '16 at 10:48
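
The length mismatch nicola points to can be reproduced on a small made-up data frame. The `toy` data below is hypothetical and only stands in for `d`. The point of the sketch is that `c()` concatenates the two columns into one vector twice as long as `day`, which is what triggers "arguments must have same length".

```r
# Hypothetical toy data standing in for `d` (not the asker's data)
toy <- data.frame(
  day = factor(c("Monday", "Monday", "Tuesday", "Tuesday")),
  user_reported_percent = c(50, 70, 60, 80),
  machine_percent = c(1, 3, 10, 20)
)

# c() flattens both columns into a single vector
x <- c(toy$user_reported_percent, toy$machine_percent)
length(x)        # 8
length(toy$day)  # 4

# aggregate() needs one grouping value per row, so this call fails;
# try() captures the error instead of stopping the script
res <- try(aggregate(x, by = list(toy$day), FUN = median), silent = TRUE)
inherits(res, "try-error")  # TRUE
```

Selecting the columns with `toy[, c("user_reported_percent", "machine_percent")]` keeps them as a data frame with one row per `day` value, so the lengths agree.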

score 0 · Answer 1 · answered Nov 21 '16 at 10:57

With the help of @nicola, I used this:

aggregate(d[,c("user_reported_percent","machine_percent")], by = list(d$day), FUN=median)

and everything worked fine. Thanks a lot!
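
As a minimal sketch, here is the same call on a small made-up data frame. The `toy` data and its values are assumptions for illustration only. The formula interface of `aggregate` is shown alongside as an equivalent base R spelling; it is not part of the original solution.

```r
# Hypothetical toy data standing in for `d`
toy <- data.frame(
  day = factor(c("Monday", "Monday", "Monday", "Tuesday", "Tuesday")),
  user_reported_percent = c(50, 70, 90, 60, 80),
  machine_percent = c(1, 3, 8, 10, 20)
)

# Pass the columns as a data frame so there is one row per grouping value
by_day <- aggregate(toy[, c("user_reported_percent", "machine_percent")],
                    by = list(day = toy$day), FUN = median)

# Formula interface: both columns on the left, grouping variable on the right
by_day_formula <- aggregate(cbind(user_reported_percent, machine_percent) ~ day,
                            data = toy, FUN = median)

# One row per day: Monday has medians 70 and 3, Tuesday has 70 and 15
print(by_day)
```

Naming the grouping element (`list(day = toy$day)`) gives the first output column a readable name instead of the default `Group.1`.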

Zhanna Sarsenbayeva

