I am trying to identify the trending tags (based on maximum hits) over a time series using the MySQL JSON feature. Below is my table:
CREATE TABLE TAG_COUNTER (
    account varchar(36) NOT NULL,
    time_id INT NOT NULL,
    counters JSON,
    PRIMARY KEY (account, time_id)
);
On every web API request I receive multiple different tags per account, and based on the number of tags I prepare an INSERT ... ON DUPLICATE KEY UPDATE query. The example below shows an insertion with two tags.
INSERT INTO `TAG_COUNTER`
    (`account`, `time_id`, `counters`)
VALUES
    ('google', '2018061023', '{"tag1": 1, "tag2": 1}')
ON DUPLICATE KEY UPDATE `counters` =
    JSON_SET(`counters`,
        '$."tag1"', IFNULL(JSON_EXTRACT(`counters`, '$."tag1"'), 0) + 1,
        '$."tag2"', IFNULL(JSON_EXTRACT(`counters`, '$."tag2"'), 0) + 1
    );
time_id is in yyyyMMddHH format, so each row is an hourly aggregation.
Now my problem is the retrieval of trending tags. The query below gives me the aggregation for tag1, but the tag names will not be known before making this query.
SELECT
    SUBSTRING(time_id, 1, 6) AS month,
    SUM(counters->>'$.tag1')
FROM TAG_COUNTER
WHERE counters->>'$.tag1' > 0
GROUP BY month;
So I need a generic GROUP BY query, along with an ORDER BY, to get the trending tags per hour, day, or month.
The expected output has these columns:
Time(hour/day/month) Tag_name Tag_count_value(total hits)
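For illustration, a query of the shape described above could be sketched as follows. This is only a sketch under assumptions: it needs MySQL 8.0.4 or later for JSON_TABLE, the aliases t, k, tag_name and total_hits are made up for the example, tag names are assumed to fit in VARCHAR(64), and it assumes tag names contain no double-quote characters (they are spliced into a JSON path).

```sql
-- Sketch only: expand the keys of each row's JSON object into rows,
-- then group by month and tag name without knowing the tags in advance.
SELECT
    SUBSTRING(t.time_id, 1, 6) AS month,
    k.tag_name,
    SUM(
        JSON_UNQUOTE(
            -- Build the path '$."<tag>"' for the current key.
            JSON_EXTRACT(t.counters, CONCAT('$."', k.tag_name, '"'))
        )
    ) AS total_hits
FROM TAG_COUNTER AS t
CROSS JOIN JSON_TABLE(
    JSON_KEYS(t.counters),                      -- e.g. ["tag1", "tag2"]
    '$[*]' COLUMNS (tag_name VARCHAR(64) PATH '$')
) AS k
GROUP BY month, k.tag_name
ORDER BY month, total_hits DESC;
```

Changing the SUBSTRING length to 8 or 10 would give the daily or hourly breakdown with the same structure.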
When I searched the web, every example I found stored the data like this:
{"tag_name": "tag1", "tag_count": 1}
instead of the direct {"tag1": 1}
and used tag_name in the GROUP BY.
Q1) Is it always mandatory to have a common, known JSON key in order to perform a GROUP BY?
Q2) If I have to go this way, how should my INSERT ... ON DUPLICATE KEY UPDATE query change for this new label/value JSON structure? The counter has to be created when it does not exist and incremented by one when it does.
Q3) Do I have to maintain an array of objects
[
{"tag_name": "tag1", "tag_count": 2},
{"tag_name": "tag2", "tag_count": 3}
]
or an object of objects like below?
{
{"tag_name": "tag1", "tag_count": 2},
{"tag_name": "tag2", "tag_count": 3}
}
Which of the above JSON structures is better in terms of INSERT and retrieval of the trending counts?
Q4) Can I keep the existing {"key" : "value"}
format instead of {"key_label" : key, "value_label" : "value"}
and still be able to extract the trending tags? I am thinking that {"key" : "value"}
is very straightforward and good for performance.
Q5) While retrieving I am using SUBSTRING(time_id, 1, 6) AS month
. Will it be able to use the index?
Or do I need to create multiple columns like time_hour(2018061023)
, time_day(20180610)
, time_month(201806)
and query on the specific columns?
Or can I use the MySQL date-time functions? Will those use the index for faster retrieval?
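To make the alternatives in Q5 concrete, here is a rough sketch of two of them. It assumes MySQL 5.7 or later for generated columns; the column names time_day and time_month come from the question, while the index names are made up. Since time_id is an integer in yyyyMMddHH form, integer division by 100 drops the hour and by 10000 drops the day and hour.

```sql
-- Option A (sketch): derive day and month from time_id as stored
-- generated columns, so they can be indexed and filtered directly.
ALTER TABLE TAG_COUNTER
    ADD COLUMN time_day INT GENERATED ALWAYS AS (time_id DIV 100) STORED,
    ADD COLUMN time_month INT GENERATED ALWAYS AS (time_id DIV 10000) STORED,
    ADD INDEX idx_account_day (account, time_day),      -- hypothetical name
    ADD INDEX idx_account_month (account, time_month);  -- hypothetical name

-- Option B (sketch): keep the table as it is and filter with a plain
-- range on time_id, which leaves the column unwrapped by any function.
SELECT account, time_id, counters
FROM TAG_COUNTER
WHERE account = 'google'
  AND time_id BETWEEN 2018060100 AND 2018063023;  -- all hours of June 2018
```

Option B relies only on the existing primary key (account, time_id). Wrapping time_id in SUBSTRING or a date function inside the WHERE clause generally prevents a range scan on that column, which is why the range form is shown here.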
Please help.