Questions tagged [window-functions]

A window function is a type of SQL operation that aggregates over a partition of the result set.

As a basic example of a window function, consider a table that contains a list persons along with their age and gender.
If you wanted a query to return the complete list of people along with their age rank per gender, you could accomplish this most simply by using a ranking window function.
Such a function can be used to rank individual rows within a subset of the results.

Using a window function the above function would look like this:

SELECT name, 
       age, 
       gender, 
       RANK() OVER (PARTITION BY gender ORDER BY age DESC) AS age_rank
FROM persons

Window functions as defined by the SQL standard can also access rows before and after the current row using the lag() or lead() function. The following query calculates the delta to the previous row for the amount column for each customer:

SELECT customer_id,
       order_date,
       amount, 
       amount - lag(amount) over (partition by customer_id ORDER BY order_date) as delta
FROM order
ORDER BY customer_id, order_date;

Standard aggregates (e.g. count(), sum()) can also be used as a window function. If combined with an ORDER BY (inside the partition defintion) this will generate "running" totals based on the aggregate being used.

Window functions are available in a wide range of DBMS: Oracle (called "analytical functions" there), Microsoft SQL Server, DB2 (called "olap functions" there) PostgreSQL, Teradata, Sybase, Vertica, and partially in H2 Database.

3029 questions
41
votes
2 answers

Spark Window Functions - rangeBetween dates

I am having a Spark SQL DataFrame with data and what I'm trying to get is all the rows preceding current row in a given date range. So for example I want to have all the rows from 7 days back preceding given row. I figured out I need to use a Window…
Nhor
  • 3,522
  • 6
  • 26
  • 41
35
votes
4 answers

What is the Hamming window for?

I'm working with some code that does a Fourier transform (to calculate the cepstrum of an audio sample). Before it computes the Fourier transform, it applies a Hamming window to the sample: for(int i = 0; i < SEGMENTATION_LENGTH;i++){ …
fredley
  • 29,323
  • 39
  • 131
  • 223
35
votes
4 answers

pyspark: rolling average using timeseries data

I have a dataset consisting of a timestamp column and a dollars column. I would like to find the average number of dollars per week ending at the timestamp of each row. I was initially looking at the pyspark.sql.functions.window function, but that…
Bob Swain
  • 2,242
  • 3
  • 13
  • 26
35
votes
3 answers

Spark SQL Row_number() PartitionBy Sort Desc

I've successfully create a row_number() partitionBy by in Spark using Window, but would like to sort this by descending, instead of the default ascending. Here is my working code: from pyspark import HiveContext from pyspark.sql.types import * from…
jKraut
  • 1,729
  • 5
  • 23
  • 40
35
votes
3 answers

How to use ROW_NUMBER() in UPDATE clause?

ROW_NUMBER() is only for used in the SELECT clause in MS SQL Server, but I want to use it for update like the following: Update MyTab Set MyNo = 123 + ROW_NUMBER() over (Order By ID) Where a=b; then I got Error like, Windowed functions can only…
KentZhou
  • 22,445
  • 40
  • 124
  • 194
33
votes
6 answers

How to perform grouped ranking in MySQL

So I have a table as follows: ID_STUDENT | ID_CLASS | GRADE ----------------------------- 1 | 1 | 90 1 | 2 | 80 2 | 1 | 99 3 | 1 | 80 4 | 1 | 70 5 | 2 …
achinda99
  • 4,824
  • 4
  • 32
  • 42
31
votes
4 answers

What is the difference between rowsBetween and rangeBetween?

From the PySpark docs rangeBetween: rangeBetween(start, end) Defines the frame boundaries, from start (inclusive) to end (inclusive). Both start and end are relative from the current row. For example, “0” means “current row”, while “-1” means one…
Evan Zamir
  • 6,460
  • 11
  • 44
  • 68
31
votes
6 answers

Dynamic alternative to pivot with CASE and GROUP BY

I have a table that looks like this: id feh bar 1 10 A 2 20 A 3 3 B 4 4 B 5 5 C 6 6 D 7 7 D 8 8 D And I want it to look like this: bar val1 val2 val3 A 10 20…
flipflop99
  • 543
  • 1
  • 5
  • 8
26
votes
2 answers

pyspark: count distinct over a window

I just tried doing a countDistinct over a window and got this error: AnalysisException: u'Distinct window functions are not supported: count(distinct color#1926) Is there a way to do a distinct count over a window in pyspark? Here's some example…
26
votes
2 answers

Applying a Window function to calculate differences in pySpark

I am using pySpark, and have set up my dataframe with two columns representing a daily asset price as follows: ind = sc.parallelize(range(1,5)) prices = sc.parallelize([33.3,31.1,51.2,21.3]) data = ind.zip(prices) df =…
Thomas Moore
  • 799
  • 1
  • 7
  • 16
24
votes
2 answers

Filtering by window function result in Postgresql

Ok, initially this was just a joke we had with a friend of mine, but it turned into interesting technical question :) I have the following stuff table: CREATE TABLE stuff ( id serial PRIMARY KEY, volume integer NOT NULL DEFAULT 0, …
Maxim Sloyko
  • 13,143
  • 7
  • 36
  • 47
24
votes
1 answer

How to use window functions in PySpark?

I'm trying to use some windows functions (ntile and percentRank) for a data frame but I don't know how to use them. Can anyone help me with this please? In the Python API documentation there are no examples about it. Specifically, I'm trying to…
jegordon
  • 771
  • 3
  • 9
  • 16
24
votes
2 answers

Unexpected results when using FIRST_VALUE() in SQL Server 2012

When I use FIRST_VALUE on a data set that I construct by hand I get one result, and when I use it on a data set that results from a left join, I get a different result - even though the data sets appear to me to contain the exact same data values.…
22
votes
2 answers

Spark SQL window function with complex condition

This is probably easiest to explain through example. Suppose I have a DataFrame of user logins to a website, for instance: scala> df.show(5) +----------------+----------+ | …
user4601931
  • 4,153
  • 4
  • 21
  • 36
22
votes
4 answers

Count distinct values with OVER(PARTITION BY id)

Is it possible to count distinct values in conjunction with window functions like OVER(PARTITION BY id)? Currently my query is as follows: SELECT congestion.date, congestion.week_nb, congestion.id_congestion, congestion.id_element, ROW_NUMBER()…
user007
  • 974
  • 3
  • 10
  • 29