Questions tagged [aggregate]

Aggregate refers to the process of summarizing grouped data, commonly used in Statistics.

Aggregate refers to the process of summarizing grouped data, commonly used in Statistics. Typically this involves replacing groups of data with single values (e.g. sum, mean, standard deviation, etc.). In SQL databases, this is accomplished with the use of GROUP BY and aggregate functions.

6659 questions

votes

7 answers

How does a site like kayak.com aggregate content?

Greetings, I've been toying with an idea for a new project and was wondering if anyone has any idea on how a service like Kayak.com is able to aggregate data from so many sources so quickly and accurately. More specifically, do you think Kayak.com…

asked Jan 05 '11 at 17:27

Jeff

2,758
3
27
30

votes

4 answers

Collapse / concatenate / aggregate a column to a single comma separated string within each group

I want to aggregate one column in a data frame according to two grouping variables, and separate the individual values by a comma. Here is some data: data <- data.frame(A = c(rep(111, 3), rep(222, 3)), B = rep(1:2, 3), C = c(5:10)) data # A B …

r aggregate r-faq

asked Apr 10 '13 at 18:47

linp

1,167
1
10
13

votes

4 answers

ListAGG in SQLSERVER

I'm trying to aggregate a 'STRING' field in SQLServer. I would like to find the same function LISTAGG like in Oracle . Do you know how to do the same function or an another method? For Example, Field A | Field B 1 | A 1 | B 2 | …

sql sql-server aggregate

asked Mar 18 '13 at 13:14

user1557642

votes

7 answers

Extract row corresponding to minimum value of a variable by group

I wish to (1) group data by one variable (State), (2) within each group find the row of minimum value of another variable (Employees), and (3) extract the entire row. (1) and (2) are easy one-liners, and I feel like (3) should be too, but I can't…

r dplyr data.table aggregate

asked Jun 05 '14 at 21:45

Ed Swindelles

votes

7 answers

Renaming columns for PySpark DataFrame aggregates

I am analysing some data with PySpark DataFrames. Suppose I have a DataFrame df that I am aggregating: (df.groupBy("group") .agg({"money":"sum"}) .show(100)) This will give me: group SUM(money#2L) A …

dataframe pyspark aggregate

asked May 01 '15 at 14:01

cantdutchthis

24,764
14
64
105

votes

4 answers

Keep other columns when doing groupby

I'm using groupby on a pandas dataframe to drop all rows that don't have the minimum of a specific column. Something like this: df1 = df.groupby("item", as_index=False)["diff"].min() However, if I have more than those two columns, the other…

python pandas aggregate pandas-groupby

asked Apr 30 '14 at 17:29

PointXIV

votes

3 answers

Extract the maximum value within each group in a dataframe

I have a data frame with a grouping variable ("Gene") and a value variable ("Value"): Gene Value A 12 A 10 B 3 B 5 B 6 C 1 D 3 D 4 For each level of my grouping variable, I wish to extract the maximum…

r aggregate

asked Aug 14 '14 at 17:45

Johnathan

1,587
3
18
27

votes

3 answers

Add count of unique / distinct values by group to the original data

I wish to count the number of unique values by grouping of a second variable, and then add the count to the existing data.frame as a new column. For example, if the existing data frame looks like this: color type 1 black chair 2 black chair 3…

r count unique aggregate

asked Jul 02 '13 at 09:18

Bryan

1,517
3
16
29

votes

4 answers

data.frame Group By column

I have a data frame DF. Say DF is: A B 1 1 2 2 1 3 3 2 3 4 3 5 5 3 6 Now I want to combine together the rows by the column A and to have the sum of the column B. For example: A B 1 1 5 2 2 3 3 3 11 I am doing this currently using an SQL query…

r aggregate

asked Sep 14 '13 at 08:36

nikosdi

1,938
4
24
34

votes

4 answers

Name columns within aggregate in R

I know I can *re*name columns after I aggregate the data: blubb <- aggregate(dat$two ~ dat$one, ...) colnames(blubb) <- c("One", "Two") Nothing wrong with that. But is there a way to aggregate and name the columns in one go? Sort of like: blubb <-…

r rename aggregate

asked Mar 09 '13 at 09:31

user1322720

votes

9 answers

Select the top N values by group

This is in response to a question asked on the r-help mailing list. Here are lots of examples of how to find top values by group using sql, so I imagine it's easy to convert that knowledge over using the R sqldf package. An example: when mtcars is…

r aggregate

asked Feb 10 '13 at 16:59

Anthony Damico

5,100
6
43
71

votes

10 answers

Count number of rows per group and add result to original data frame

Say I have a data.frame object: df <- data.frame(name=c('black','black','black','red','red'), type=c('chair','chair','sofa','sofa','plate'), num=c(4,5,12,4,3)) Now I want to count the number of rows (observations)…

r count aggregate r-faq

asked Sep 16 '11 at 21:33

Uri Laserson

2,069
4
26
38

votes

8 answers

Aggregate a dataframe on a given column and display another column

I have a dataframe in R of the following form: > head(data) Group Score Info 1 1 1 a 2 1 2 b 3 1 3 c 4 2 4 d 5 2 3 e 6 2 1 f I would like to aggregate it following the Score column…

r aggregate plyr greatest-n-per-group

asked Jun 09 '11 at 07:37

jul635

votes

4 answers

Aggregating by unique identifier and concatenating related values into a string

I have a need that I imagine could be satisfied by aggregate or reshape, but I can't quite figure out. I have a list of names (brand), and accompanying ID number (id). This data is in long form, so names can have multiple ID's. I'd like to…

r concatenation aggregate

asked May 16 '13 at 20:08

roody

2,411
5
31
47

votes

3 answers

How to include BIT type column in SELECT part with out including it on the GROUP BY in T-SQL?

Here is my T-SQL query SELECT ProductID, VendorID, ProductName= MAX(ProductName), VendorName = MAX(VendorName), IsActive = MAX(IsActive) # This brings error FROM ProductVendorAssoc GROUP BY ProductID, VendorID I…

sql tsql group-by aggregate

asked May 19 '11 at 04:40

Mithun Sreedharan

45,549
69
171
232

Prev 1

…

99 100 Next