
Please find the problem scenario below:

  1. Current DB design: I have 40 tables (each currently holding about 5 billion records). The tables do not share the same data: each table holds its own unique records in its own format. Each table has a primary key (pk) "timestamp", which is in UTC with timezone, in ISO format. Around 1000 records are inserted into each table every 5 minutes.

  2. Implementation: Now I have to get the most recent timestamp data from every table. For every table I tried `select * from table_name where timestamp = (select timestamp from table_name order by timestamp desc limit 1)`, which returns the most recent records but takes some time. After that I tried `select * from table_name where timestamp = (select max(timestamp) from table_name)`, but this also took some time to return the data (both attempts are sketched below).
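
For reference, here is how the two attempts look as plain SQL, plus an `EXPLAIN` to check whether the primary-key index on "timestamp" is actually being used; `table_name` is a placeholder for each of the 40 tables.

```sql
-- table_name is a placeholder; repeat per table.
-- Because "timestamp" is the primary key, Postgres already maintains a
-- btree index on it, which both forms below should be able to use.

-- Attempt 1: correlate against the newest timestamp via ORDER BY / LIMIT.
SELECT *
FROM   table_name
WHERE  "timestamp" = (SELECT "timestamp"
                      FROM   table_name
                      ORDER  BY "timestamp" DESC
                      LIMIT  1);

-- Attempt 2: the same idea with max().
SELECT *
FROM   table_name
WHERE  "timestamp" = (SELECT max("timestamp") FROM table_name);

-- Check the plan: if this shows a sequential scan instead of an index
-- scan, the slowness comes from reading the whole 5-billion-row table.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   table_name
WHERE  "timestamp" = (SELECT max("timestamp") FROM table_name);
```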

So, how can I minimize the query time to get this data from the database? (All design and query suggestions are welcome.)

Thank you.

FYI, I'm using Python 3.6 and psycopg2.

    `distinct on ()` is typically a lot faster than a solution with a sub-select. If you just need **one** row, `select * from tablename order by timestamp desc limit 1` should also be faster – a_horse_with_no_name Dec 17 '19 at 10:46
  • @a_horse_with_no_name . . . Is that true if there is an index on `timestamp`? – Gordon Linoff Dec 17 '19 at 11:37
  • @GordonLinoff: if you only want a single row (with the "global max"), then `distinct on()` is indeed slower. But I am not sure about the statement with `order by` and `limit` – a_horse_with_no_name Dec 17 '19 at 12:09
  • Do you need all the data live all the time? E.g. could you archive older data, thus reducing the table size? A bit crazy idea: if write performance is not as critical as "read the latest timestamp data" performance, you could set up a stored procedure on write actions that inserts the data twice: once in the "all data" table, once in a "latest insert" table. The "latest insert" table stores only 40 rows, one row per table you have (you said 40), with the values of the last insert done for each table. – Nic3500 Dec 21 '19 at 03:08
  • Or something like this maybe? https://stackoverflow.com/questions/11729690/postgresql-equivalent-of-mysql-memory-tables. For the "last insert" table, not the 40 tables! – Nic3500 Dec 21 '19 at 03:11
  • "CREATE INDEX lets_index ON public.5_billion_record_table USING btree ("timestamp" DESC NULLS LAST) TABLESPACE pg_default;" are we discussing on this , will this help me ? @a_horse_with_no_name – Anshul Verma Jan 02 '20 at 10:22

0 Answers