16

i have a table like this:

Number   Price    Type       Date         Time
------   -----    ----    ----------    ---------
23456    0,665     SV     2014/02/02     08:00:02
23457    1,3       EC     2014/02/02     07:50:45
23460    0,668     SV     2014/02/02     07:36:34

For each EC I need previous/next SV price. In this case, the query is simple.

Select Lag(price, 1, price) over (order by date desc, time desc),
Lead(price, 1, price) over (order by date desc, time desc)
from ITEMS

But, there are some special cases where two or more rows are EC type:

Number   Price    Type       Date         Time
------   -----    ----    ----------    ---------
23456    0,665     SV     2014/02/02     08:00:02
23457    1,3       EC     2014/02/02     07:50:45
23658    2,4       EC     2014/02/02     07:50:45
23660    2,4       EC     2014/02/02     07:50:48
23465    0,668     SV     2014/02/02     07:36:34 

can I use Lead/Lag in this cases? If not, did I have to use a subquery?

Cœur
  • 32,421
  • 21
  • 173
  • 232
user3292586
  • 169
  • 1
  • 1
  • 5

3 Answers3

13

Your question (and Anon's excellent answer) is part of the SQL of islands and gaps. In this answer, I will try to examine the "row_number() magic" in detail.

I've made a simple example based on events in a ballgame. For each event, we'd like to print the previous and next quarter related message:

create table TestTable (id int identity, event varchar(64));
insert TestTable values
    ('Start of Q1'),
    ('Free kick'),
    ('Goal'),
    ('End of Q1'),
    ('Start of Q2'),
    ('Penalty'),
    ('Miss'),
    ('Yellow card'),
    ('End of Q2');

Here's a query showing off the "row_number() magic" approach:

; with  grouped as
        (
        select  *
        ,       row_number() over (order by id) as rn1
        ,       row_number() over (
                    partition by case when event like '%of Q[1-4]' then 1 end 
                    order by id) as rn2
        from    TestTable
        )
,       order_in_group as
        (
        select  *
        ,       rn1-rn2 as group_nr
        ,       row_number() over (partition by rn1-rn2 order by id) as rank_asc
        ,       row_number() over (partition by rn1-rn2 order by id desc)
                    as rank_desc
        from    grouped
        )
select  *
,       lag(event, rank_asc) over (order by id) as last_event_of_prev_group
,       lead(event, rank_desc) over (order by id) as first_event_of_next_group
from    order_in_group
order by
        id
  • The first CTE called "grouped" calculates two row_number()s. The first is 1 2 3 for each row in the table. The second row_number() places pause announcements in one list, and other events in a second list. The difference between the two, rn1 - rn2, is unique for each section of the game. It's helpful to check difference in the example output: it's in the group_nr column. You'll see that each value corresponds to one section of the game.
  • The second CTE called "order_in_group" determines the position of the current row within its island or gap. For an island with 3 rows, the positions are 1 2 3 for the ascending order, and 3 2 1 for the descending order.
  • Finally, we know enough to tell lag() and lead() how far to jump. We have to lag rank_asc rows to find the final row of the previous section. To find the first row of the next section, we have to lead rank_desc rows.

Hope this helps clarifying the "magic" of Gaps and Islands. Here is a working example at SQL Fiddle.

Andomar
  • 216,619
  • 41
  • 352
  • 379
8

Yes, you can use LEAD/LAG. You just need to precalculate how far to jump with a little ROW_NUMBER() magic.

DECLARE @a TABLE ( number int, price money, type varchar(2),
                   date date, time time)
INSERT @a VALUES
(23456,0.665,'SV','2014/02/02','08:00:02'),
(23457,1.3  ,'EC','2014/02/02','07:50:45'),
(23658,2.4  ,'EC','2014/02/02','07:50:45'),
(23660,2.4  ,'EC','2014/02/02','07:50:48'),
(23465,0.668,'SV','2014/02/02','07:36:34');

; WITH a AS (
     SELECT *,
            ROW_NUMBER() OVER(ORDER BY [date] DESC, [time] DESC) x, 
            ROW_NUMBER() OVER(PARTITION BY 
               CASE [type] WHEN 'SV' THEN 1 ELSE 0 END 
               ORDER BY [date] DESC, [time] DESC) y 
     FROM @a)
 , b AS (
     SELECT *,
            ROW_NUMBER() OVER(PARTITION BY x-y ORDER BY x ASC) z1,
            ROW_NUMBER() OVER(PARTITION BY x-y ORDER BY x DESC) z2 
     FROM a)
SELECT *,
       CASE [type] WHEN 'SV' THEN 
           LAG(price,z1,price) OVER(PARTITION BY [type] ORDER BY x) 
           ELSE LAG(price,z1,price) OVER(ORDER BY x) 
           END,
       CASE [type] WHEN 'SV' THEN 
           LEAD(price,z2,price) OVER(PARTITION BY [type] ORDER BY x) 
           ELSE LEAD(price,z2,price) OVER(ORDER BY x) 
           END
FROM b
ORDER BY x
Andomar
  • 216,619
  • 41
  • 352
  • 379
Anon
  • 9,583
  • 1
  • 25
  • 31
  • Great solution, but my RDBMS (Netezza) complains that the offset orgument to the lead function must be a "positive integer constant". – stevepastelan May 07 '17 at 21:15
1

Here is yet another way of achieving the same result, but using conditional max/min functions windowed over an ordinal. The ordinal can be be set up based on whatever columns fits the purpose, but in this case I believe the OP intends them to be Date and Time.

DROP TABLE IF EXISTS #t;
CREATE TABLE #t (
    Number INT, 
    Price MONEY, 
    Type CHAR(2),
    Date DATE,
    Time TIME(0)
);
INSERT INTO #t VALUES 
(23456, 0.666, 'SV', '2014/02/02', '10:00:02'),
(23457, 1.4  , 'EC', '2014/02/02', '09:50:45'),
(23658, 2.5  , 'EC', '2014/02/02', '09:50:45'),
(23660, 2.5  , 'EC', '2014/02/02', '09:50:48'),
(23465, 0.669, 'SV', '2014/02/02', '09:36:34'),
(23456, 0.665, 'SV', '2014/02/02', '08:00:02'),
(23457, 1.3  , 'EC', '2014/02/02', '07:50:45'),
(23658, 2.4  , 'EC', '2014/02/02', '07:50:45'),
(23660, 2.4  , 'EC', '2014/02/02', '07:50:48'),
(23465, 0.668, 'SV', '2014/02/02', '07:36:34'), -- which one of these?
(23465, 0.670, 'SV', '2014/02/02', '07:36:34'); -- 

WITH time_ordered AS (
    SELECT *, DENSE_RANK() OVER (ORDER BY Date, Time) AS ordinal FROM #t
)
SELECT
    *,
    CASE WHEN Type = 'EC' 
    THEN MAX(CASE WHEN ordinal = preceding_non_EC_ordinal THEN Price END) 
    OVER (PARTITION BY preceding_non_EC_ordinal ORDER BY ordinal ASC) END AS preceding_price,
    CASE WHEN Type = 'EC'
    THEN MIN(CASE WHEN ordinal = following_non_EC_ordinal THEN Price END) 
    OVER (PARTITION BY following_non_EC_ordinal ORDER BY ordinal DESC) END AS following_price
FROM (
    SELECT
        *,
        MAX(CASE WHEN Type <> 'EC' THEN ordinal END) 
        OVER (ORDER BY ordinal ASC) AS preceding_non_EC_ordinal,
        MIN(CASE WHEN Type <> 'EC' THEN ordinal END) 
        OVER (ORDER BY ordinal DESC) AS following_non_EC_ordinal
    FROM time_ordered
) t
ORDER BY Date, Time

Note that the example given by the OP has been extended to show that interspersed sequences of EC yeild the intended result. The ambiguity introduced by the earliest two consecutive rows with type SV will in this case lead to the maximum value being picked. Setting up the ordinal to include the Price is a possible way to change this behavior.

An SQLFiddle can be found here: http://sqlfiddle.com/#!18/85117/1

Lars Rönnbäck
  • 695
  • 5
  • 10