Questions tagged [aggregate]

Aggregate refers to the process of summarizing grouped data, commonly used in Statistics.

Aggregate refers to the process of summarizing grouped data, commonly used in Statistics. Typically this involves replacing groups of data with single values (e.g. sum, mean, standard deviation, etc.). In SQL databases, this is accomplished with the use of GROUP BY and aggregate functions.

6659 questions
84
votes
7 answers

How does a site like kayak.com aggregate content?

Greetings, I've been toying with an idea for a new project and was wondering if anyone has any idea on how a service like Kayak.com is able to aggregate data from so many sources so quickly and accurately. More specifically, do you think Kayak.com…
Jeff
  • 2,758
  • 3
  • 27
  • 30
83
votes
4 answers

Collapse / concatenate / aggregate a column to a single comma separated string within each group

I want to aggregate one column in a data frame according to two grouping variables, and separate the individual values by a comma. Here is some data: data <- data.frame(A = c(rep(111, 3), rep(222, 3)), B = rep(1:2, 3), C = c(5:10)) data # A B …
linp
  • 1,167
  • 1
  • 10
  • 13
81
votes
4 answers

ListAGG in SQLSERVER

I'm trying to aggregate a 'STRING' field in SQLServer. I would like to find the same function LISTAGG like in Oracle . Do you know how to do the same function or an another method? For Example, Field A | Field B 1 | A 1 | B 2 | …
user1557642
  • 833
  • 1
  • 9
  • 6
79
votes
7 answers

Extract row corresponding to minimum value of a variable by group

I wish to (1) group data by one variable (State), (2) within each group find the row of minimum value of another variable (Employees), and (3) extract the entire row. (1) and (2) are easy one-liners, and I feel like (3) should be too, but I can't…
Ed Swindelles
  • 903
  • 1
  • 7
  • 6
77
votes
7 answers

Renaming columns for PySpark DataFrame aggregates

I am analysing some data with PySpark DataFrames. Suppose I have a DataFrame df that I am aggregating: (df.groupBy("group") .agg({"money":"sum"}) .show(100)) This will give me: group SUM(money#2L) A …
cantdutchthis
  • 24,764
  • 14
  • 64
  • 105
73
votes
4 answers

Keep other columns when doing groupby

I'm using groupby on a pandas dataframe to drop all rows that don't have the minimum of a specific column. Something like this: df1 = df.groupby("item", as_index=False)["diff"].min() However, if I have more than those two columns, the other…
PointXIV
  • 988
  • 1
  • 13
  • 19
72
votes
3 answers

Extract the maximum value within each group in a dataframe

I have a data frame with a grouping variable ("Gene") and a value variable ("Value"): Gene Value A 12 A 10 B 3 B 5 B 6 C 1 D 3 D 4 For each level of my grouping variable, I wish to extract the maximum…
Johnathan
  • 1,587
  • 3
  • 18
  • 27
68
votes
3 answers

Add count of unique / distinct values by group to the original data

I wish to count the number of unique values by grouping of a second variable, and then add the count to the existing data.frame as a new column. For example, if the existing data frame looks like this: color type 1 black chair 2 black chair 3…
Bryan
  • 1,517
  • 3
  • 16
  • 29
65
votes
4 answers

data.frame Group By column

I have a data frame DF. Say DF is: A B 1 1 2 2 1 3 3 2 3 4 3 5 5 3 6 Now I want to combine together the rows by the column A and to have the sum of the column B. For example: A B 1 1 5 2 2 3 3 3 11 I am doing this currently using an SQL query…
nikosdi
  • 1,938
  • 4
  • 24
  • 34
64
votes
4 answers

Name columns within aggregate in R

I know I can *re*name columns after I aggregate the data: blubb <- aggregate(dat$two ~ dat$one, ...) colnames(blubb) <- c("One", "Two") Nothing wrong with that. But is there a way to aggregate and name the columns in one go? Sort of like: blubb <-…
user1322720
61
votes
9 answers

Select the top N values by group

This is in response to a question asked on the r-help mailing list. Here are lots of examples of how to find top values by group using sql, so I imagine it's easy to convert that knowledge over using the R sqldf package. An example: when mtcars is…
Anthony Damico
  • 5,100
  • 6
  • 43
  • 71
56
votes
10 answers

Count number of rows per group and add result to original data frame

Say I have a data.frame object: df <- data.frame(name=c('black','black','black','red','red'), type=c('chair','chair','sofa','sofa','plate'), num=c(4,5,12,4,3)) Now I want to count the number of rows (observations)…
Uri Laserson
  • 2,069
  • 4
  • 26
  • 38
56
votes
8 answers

Aggregate a dataframe on a given column and display another column

I have a dataframe in R of the following form: > head(data) Group Score Info 1 1 1 a 2 1 2 b 3 1 3 c 4 2 4 d 5 2 3 e 6 2 1 f I would like to aggregate it following the Score column…
jul635
  • 774
  • 1
  • 7
  • 11
56
votes
4 answers

Aggregating by unique identifier and concatenating related values into a string

I have a need that I imagine could be satisfied by aggregate or reshape, but I can't quite figure out. I have a list of names (brand), and accompanying ID number (id). This data is in long form, so names can have multiple ID's. I'd like to…
roody
  • 2,411
  • 5
  • 31
  • 47
55
votes
3 answers

How to include BIT type column in SELECT part with out including it on the GROUP BY in T-SQL?

Here is my T-SQL query SELECT ProductID, VendorID, ProductName= MAX(ProductName), VendorName = MAX(VendorName), IsActive = MAX(IsActive) # This brings error FROM ProductVendorAssoc GROUP BY ProductID, VendorID I…
Mithun Sreedharan
  • 45,549
  • 69
  • 171
  • 232