2

I have data in the following format:

date        fruit
======================
1-jan-14    orange
3-jan-14    orange
1-may-14    orange
2-may-14    apple
3-may-14    apple
1-aug-14    pineapple
2-aug-14    apple

I want to add a column which will represent: most purchased fruit in past 6 months. So, for the above example:

date        fruit       most_purchased_6_months
=======================================
1-jan-14    orange      orange
3-jan-14    orange      orange
1-may-14    orange      orange
2-may-14    apple       orange
3-may-14    apple       orange
1-aug-14    pineapple   apple
2-aug-14    apple       apple

I have tried using analytical functions but the best I have reached is showing the number of times the same fruit (from the same row) has been purchased in past 6 months using "count(*) over (partition by fruit order by date range 180 preceding)".

My main issue is that using a self-join would easily solve my issue but as this is a table with about 20million rows of data it takes a few hours to run on our database. Would appreciate any direction and/or help!

AYR
  • 950
  • 3
  • 11
  • 22
  • At least share your attempt, so people can build on that or correct any mistakes instead of having to write a query from scratch. – GolezTrol Aug 24 '14 at 10:38
  • I shared what I tried ("count(*)...."). I haven't reached anything significant other than that. – AYR Aug 24 '14 at 10:45
  • How many "fruits" do you have? And what version of Oracle are you using? – Gordon Linoff Aug 24 '14 at 11:52
  • 11g. The example I used with fruits was used to simplify the matter. The actual problem is appointments with doctors where we want to know which doctor the patient saw the most times in the past 6 months. So the number of doctors is actually large. – AYR Aug 24 '14 at 12:05
  • @AYR . . . The issue isn't "the past six months". That is easy. The issue is the running total nature of the problem. – Gordon Linoff Aug 24 '14 at 12:06

1 Answers1

0

This is your basic query:

select date, fruit,
       count(*) over (partition by fruit order by date range 180 preceding) as NumberPurchased
from data;

I think you can do this with a subquery and first_value():

select date, fruit,
       first_value(fruit) over (order by NumberPurchased desc range 180 preceding)
from (select date, fruit,
             count(*) over (partition by fruit order by date range 180 preceding) as NumberPurchased
      from data d
     ) d;

I'm leaving the answer but the above doesn't work because the counts change on each row for the preceding six months.

EDIT:

Here is an entirely different approach. Calculate for each day and each fruit the cumulative sum into a temporary table. Then index that table and join to it to get the most common fruit using keep:

create table FruitCountCume as
    select f.fruit, da.date, count(*) over (partition by f.fruit order by d.date) as cumecnt
    from (select distinct fruit from data) f cross join
         (select distinct date from data) d left outer join
         data da
         on f.fruit = da.fruit and f.date = da.date;

create index on FruitCountCume(fruit, date);

select d1.date, d1.fruit,
       max(fruit) keep (dense_rank first order by d1.cumecnt - coalesce(d2.cumecnt, 0) desc) as MostCommon
from FruitCountCume d1 left outer join
     FruitCountCume d2
     on d1.date = d2.date + 180
group by d1.date;

The full cross join may not be necessary. It is there as a convenience to make the left join more efficient.

Gordon Linoff
  • 1,122,135
  • 50
  • 484
  • 624
  • Your code returns the fruit with the maximum number of purchases ever without regard to time intervals and therefore does not work as desired. :-S – AYR Aug 24 '14 at 11:40