I'm storing the output of a sensor pod in a Postgres DB. There are many sensor pods (dozens), each generating a reading every 100ms, 24 hours a day - that's roughly 864,000 records per pod per day. The sensor pod is relatively dumb and lives in an environment of unreliable connectivity, so it accumulates n-line CSV files locally (n TBD) and then ships them all off when it has network access.
The sensor pod knows its own name and the data it produces, but that's about it. I'm trying to decide how to load the data into the DB efficiently. There are two options I'm considering:
- Use the COPY syntax, and give it the file directly
- Do a mass insert
Can someone speak to the performance ramifications of both of those?
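For concreteness, here's roughly what I mean by each; the table and column names are made up, not my real schema:

```sql
-- Option 1: COPY the pod's CSV file straight into the table
COPY sensor_readings (pod_name, reading_time, value)
FROM '/data/incoming/pod_17.csv'
WITH (FORMAT csv);

-- Option 2: a mass insert, batching many rows per statement
INSERT INTO sensor_readings (pod_name, reading_time, value) VALUES
  ('pod_17', '2013-06-01 00:00:00.000', 42.1),
  ('pod_17', '2013-06-01 00:00:00.100', 42.3),
  ('pod_17', '2013-06-01 00:00:00.200', 42.2);
  -- ... and so on for the rest of the file
```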
My hesitation with option 1 is that I need to supply some ancillary data (some foreign keys, etc.) that won't be in the file. The only way to do that, without making the sensor pod aware of the random crap that's also going into that table, is to load into a temporary table first and then move the rows into their final destination with an INSERT .. SELECT, which seems wasteful.
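That staging approach would look something like this (again, names are hypothetical):

```sql
-- Stage the raw CSV into a temp table that matches the file layout
CREATE TEMP TABLE staging_readings (
    reading_time timestamptz,
    value        double precision
);

COPY staging_readings FROM '/data/incoming/pod_17.csv' WITH (FORMAT csv);

-- Attach the ancillary data (foreign keys etc.) on the way into the real table
INSERT INTO sensor_readings (pod_id, deployment_id, reading_time, value)
SELECT 17, 3, reading_time, value
FROM staging_readings;
```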
This is a high write / low read environment.