I'm building a dashboard that reports on changes in data over time.
Rows are grouped by category, and each category has a change interval. For each category, I'd like to efficiently select the row whose time is closest to (most recent time for this category) - (interval for this category).
| category | value | time | interval |
|----------|-------|------|----------|
| 1 | 2 | 100 | 30 |
| 1 | 1 | 50 | 30 |
| 1 | 0 | 0 | 30 |
| 2 | 2 | 120 | 90 |
| 2 | 1 | 70 | 90 |
| 2 | 0 | 5 | 90 |
In this example, the most recent category_1 time is 100. Since the category_1 interval is 30, I want the category_1 row whose time is closest to (100 - 30) = 70, which in this case is the row with time 50.
On the other hand, the category_2 interval is 90, so I want the category_2 row closest to time (120 - 90) = 30, which is the row with time 5.
I'm working with Amazon RDS. I'd prefer to use widely-supported SQL features since the database may change.
Here are some options I'm aware of. First, I could use a correlated subquery to compute these values directly; however, this is not performant.
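In case it's useful, here is roughly what I mean by the correlated-subquery version (a sketch against the sample `dashboard` table; the nested subqueries rescan the table for every row, which is why it performs poorly):

```sql
-- For each category, keep the row whose time is closest to
-- (MAX(time) for the category) - (the row's interval).
SELECT d.category, d.value, d.time
FROM dashboard d
WHERE ABS(d.time - ((SELECT MAX(d2.time)
                     FROM dashboard d2
                     WHERE d2.category = d.category) - d.interval))
    = (SELECT MIN(ABS(d3.time - ((SELECT MAX(d4.time)
                                  FROM dashboard d4
                                  WHERE d4.category = d3.category) - d3.interval)))
       FROM dashboard d3
       WHERE d3.category = d.category);
-- Sample data: returns (1, 1, 50) and (2, 0, 5)
```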
Second, I could create a custom aggregate function. However, results like "Function to Calculate Median in SQL Server" tend to use techniques that don't apply to my problem, like percentiles and CASE expressions.
For reference, here's how I'm getting the most recent row in each group:
```sql
SELECT s1.category, s1.time
FROM dashboard s1
JOIN (
    SELECT category, MAX(time) AS s2time
    FROM dashboard
    GROUP BY category
) AS s2
  ON s1.category = s2.category AND s1.time = s2.s2time
ORDER BY s1.category;
```
What's my best option here?