0

I have the following query, where I retrieve the number of sales, and the average price of those sales for each day, for a particular item.

SELECT COUNT(1) AS num_sales, DATE_FORMAT(sales.created_at, '%Y-%m-%d') AS date, AVG(prices.price) AS avg_price
FROM sales INNER JOIN prices ON prices.id = sales.price_id
WHERE prices.item_id = 7503 AND (`prices`.`source` = 0 or (`prices`.`price` >= 400 and `prices`.`source` > 0))
GROUP BY date
ORDER BY date ASC

I also have a for-loop that does a separate query for each day to get the median price (let's assume the number of results are even):

SELECT prices.price FROM sales INNER JOIN prices ON prices.id = sales.price_id
WHERE prices.item_id = 7503 
AND (`prices`.`source` = 0 or (`prices`.`price` >= 400 and `prices`.`source` > 0))
AND DATE(sales.created_at) = "<THE DATE OF THE CURRENT FOR-LOOP OBJECT>"
ORDER BY prices.price ASC
LIMIT 1 OFFSET <NUMBER OF THE MIDDLE ROW>

As you can imagine, this is very slow, as in some cases hundreds of queries must be done on a large table (the sales table has a few hundred million rows).

How do you rewrite the first SQL query so that it also calculates the median of prices.price, similar to AVG(prices.price)? I've looked at answers such as this one but can't wrap my head around how to adapt it for my specific scenario.

I've spent hours trying to accomplish this, but my SQL knowledge simply isn't good enough. Any help would be greatly appreciated!

root@ns525077:~# mysql -V
mysql  Ver 14.14 Distrib 5.7.13, for Linux (x86_64) using  EditLine wrapper

Table schemas:

CREATE TABLE `prices` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `item_id` int(11) unsigned NOT NULL,
 `price` decimal(8,2) NOT NULL,
 `net_price` decimal(8,2) NOT NULL,
 `source` tinyint(4) NOT NULL,
 `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 PRIMARY KEY (`id`),
 UNIQUE KEY `id` (`id`),
 KEY `prices_ibfk_1` (`item_id`),
 CONSTRAINT `prices_ibfk_1` FOREIGN KEY (`item_id`) REFERENCES `items` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=4861375 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

CREATE TABLE `sales` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `price_id` int(11) unsigned DEFAULT NULL,
 `item_key` varchar(40) COLLATE utf8_unicode_ci NOT NULL,
 `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 PRIMARY KEY (`id`),
 UNIQUE KEY `id` (`id`),
 UNIQUE KEY `item_key` (`item_key`),
 KEY `price_id` (`price_id`),
 KEY `created_at` (`created_at`),
 KEY `price_id__created_at__IX` (`price_id`,`created_at`),
 CONSTRAINT `sales_ibfk_1` FOREIGN KEY (`price_id`) REFERENCES `prices` (`id`) ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=386156944 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Example of output from my first query:

Example of output from my first query

Community
  • 1
  • 1
waylaidwanderer
  • 1,851
  • 2
  • 13
  • 9

1 Answers1

0

I found the answer to my question here, after extensive searching. Perhaps I didn't word my question right initially.

I have adapted the solution to my own case, and here's the working query:

SELECT COUNT(1) AS num_sales,
       DATE_FORMAT(sales.created_at, '%Y-%m-%d') AS date,
       AVG(prices.price) AS avg_price,
       CASE(COUNT(1) % 2)
       WHEN 1 THEN SUBSTRING_INDEX(
           SUBSTRING_INDEX(
               group_concat(prices.price
                            ORDER BY prices.price SEPARATOR ',')
               , ',', (count(*) + 1) / 2)
           , ',', -1)
       ELSE (SUBSTRING_INDEX(
                 SUBSTRING_INDEX(
                     group_concat(prices.price
                                  ORDER BY prices.price SEPARATOR ',')
                     , ',', count(*) / 2)
                 , ',', -1)
             + SUBSTRING_INDEX(
                 SUBSTRING_INDEX(
                     group_concat(prices.price
                                  ORDER BY prices.price SEPARATOR ',')
                     , ',', (count(*) + 1) / 2)
                 , ',', -1)) / 2
       END median_price
FROM sales
  INNER JOIN prices ON prices.id = sales.price_id
WHERE prices.item_id = 7381
      AND (`prices`.`source` = 0
           OR (`prices`.`price` >= 400
               AND `prices`.`source` > 0))
GROUP BY date
ORDER BY date ASC;
Community
  • 1
  • 1
waylaidwanderer
  • 1,851
  • 2
  • 13
  • 9