0

I have a database table that receives close to 1 million inserts a day that needs to be searchable for at least a year. Big hard drive and lots of data and not that great hardware to put it on either.

The table looks like this:

id      | tag_id  |  value  |  time 
----------------------------------------
279571     55         0.57    2013-06-18 12:43:22
...

tag_id might be something like AmbientTemperature or AmbientHumidity and the time is captured when the reading is taken from the sensor.

I'm querying on this table in a reporting format. I want to see all data for tags 1,55,72, and 4 between 2013-11-1 and 2013-11-28 at 1 hour intervals.

SELECT time, tag_id, tag_name, value, friendly_name
FROM (
    SELECT time, tag_name, tag_id, value,friendly_name, 
        ROW_NUMBER() over (partition by tag_id,datediff(hour, 0, time)/1 order by time desc) as seqnum
    FROM tag_values tv 
    JOIN tag_names tn ON tn.id = tv.tag_id
    WHERE (tag_id = 1 OR tag_id = 55 OR tag_id = 72 OR tag_id = 4)
        AND time >= '2013-11-1' AND time < '2013-11-28'
    ) k
WHERE seqnum = 1
ORDER BY time";

Can I optimize this table or my query at all? How should I set up my indexes?

It's pretty slow with a table size of 100 million + rows. It can take several minutes to get a data set of 7 days at an hourly interval with 3 tags in the query.

marc_s
  • 675,133
  • 158
  • 1,253
  • 1,388
Chris G.
  • 3,903
  • 2
  • 17
  • 38
  • 1
    Make better use of your clustered primary key index. The following may prove of interest: http://technet.microsoft.com/en-us/library/aa933131(v=sql.80).aspx and http://stackoverflow.com/questions/4419499/mysql-and-nosql-help-me-to-choose-the-right-one/4421601#4421601 and http://stackoverflow.com/questions/5451190/60-million-entries-select-entries-from-a-certain-month-how-to-optimize-databas/5451389#5451389 – Jon Black Nov 21 '13 at 20:22

3 Answers3

1

How should I set up my indexes?

I would try following index:

CREATE /*UNIQUE*/ INDEX IX_MyTable_tag_id_time -- If this index could be unique then uncomment UNIQUE
ON dbo.tag_values (tag_id, time)
INCLUDE (value) -- Covered column
WITH (FILLFACTOR = 90); -- Needed to minimize page splits. You should test other values for fill factor to find optimum value for your workload. 90 is just an example. Default value is usually 0 or 100 (see http://technet.microsoft.com/en-us/library/ms190470.aspx) 
GO
Bogdan Sahlean
  • 19,000
  • 3
  • 36
  • 55
1

Filtering on the result of the row number function will make the query painfully slow. Also it will prevent optimal index use.

If your primary reporting need is hourly information you might want to consider storing which rows are the first sensor reading for a tag in a specific hour.

ALTER TABLE tag_values ADD IsHourlySensorReading BIT NULL;

In an hourly process, you calculate this column for new rows.

DECLARE @CalculateFrom DATETIME = (SELECT MIN(time) FROM tag_values WHERE IsHourlySensorReading IS NULL);
SET @CalculateFrom = dateadd(hour, 0, datediff(hour, 0, @CalculateFrom));

UPDATE k
SET IsHourlySensorReading = CASE seqnum WHEN 1 THEN 1 ELSE 0 END
FROM (
    SELECT id, row_number() over (partition by tag_id,datediff(hour, 0, time)/1 order by time desc) as seqnum
    FROM tag_values tv
    WHERE tv.time >= @CalculateFrom
    AND tv.IsHourlySensorReading IS NULL
) as k

Your reporting query then becomes much simpler:

SELECT time, tag_id, tag_name, value, friendly_name
FROM (
    SELECT time, tag_name, tag_id, value,friendly_name
    FROM tag_values tv 
    JOIN tag_names tn ON tn.id = tv.tag_id
    WHERE (tag_id = 1 OR tag_id = 55 OR tag_id = 72 OR tag_id = 4)
        AND time >= '2013-11-1' AND time < '2013-11-28'
        AND IsHourlySensorReading=1
    ) k
ORDER BY time;

The following index will help calculating the IsHourlySensorReading column. But remember, indexes will also cause your million inserts per day to take more time. Test thoroughly!

CREATE NONCLUSTERED INDEX tag_values_ixnc01 ON tag_values (time, IsHourlySensorReading) WHERE (IsHourlySensorReading IS NULL);

Use this index for reporting if you need order by time.

CREATE NONCLUSTERED INDEX tag_values_ixnc02 ON tag_values (time, tag_id, IsHourlySensorReading) INCLUDE (value) WHERE (IsHourlySensorReading = 1);

Use this index for reporting if you don't need order by time.

CREATE NONCLUSTERED INDEX tag_values_ixnc02 ON tag_values (tag_id, time, IsHourlySensorReading) INCLUDE (value) WHERE (IsHourlySensorReading = 1);

Some additional things to consider:

  • Is ORDER BY time really required?
  • Table partitioning can seriously improve both insert and query performance. Depending on your situation I would partition on either tag_id or date.
  • Instead of creating a column with an IsHourlySensorReading indicator, you can also create a separate table/database for specific reporting requirements and only load the relevant data into that.
S Koppenol
  • 195
  • 9
  • I cant add the "Hourly" column in your answer. Specifications require that the data viewer tool show data at whatever interval the user prefers. Hourly, Daily, Half-Hourly, by the second, etc. Table partitioning requires enterprise, and yeah it would be great if the report showed the sensor readings in order by time - nothing else would make sense. – Chris G. Nov 22 '13 at 14:13
0

I'm not an expert on sqlserver, but I would seriously consider setting this up as a partitioned table. This would also make archiving easier as partitions could simply be dropped (rather than an expensive delete from where...).

Also (with a bit of luck) the optimiser will only look in the partitions required for the data.