I have created a database using `sqlite3` in Python that has thousands of tables. Each of these tables contains thousands of rows and ten columns. One of the columns is the date and time of an event: it is a string formatted as `YYYY-mm-dd HH:MM:SS`, which I have defined to be the primary key for each table. Every so often, I collect some new data (hundreds of rows) for each of these tables. Each new dataset is pulled from a server and either loaded in directly as a `pandas` data frame or stored as a CSV file. The new data contains the same ten columns as my original data. I need to update the tables in my database using this new data in the following way (a simplified sketch of one of my tables is shown after the list):
- Given a table in my database, for each row in the new dataset, if the date and time of the row matches the date and time of an existing row in my database, update the remaining columns of that row using the values in the new dataset.
- If the date and time does not yet exist, create a new row and insert it into my database.
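To make the structure concrete, here is a simplified sketch of one of these tables (the table and column names are just placeholders; the real tables have ten columns, but the important part is the date/time string primary key):

```python
import sqlite3

con = sqlite3.connect("my_database.db")

# Simplified illustration of one of the thousands of tables:
# the real tables have ten columns; the names here are placeholders.
con.execute("""
    CREATE TABLE IF NOT EXISTS sensor_1 (
        event_time TEXT PRIMARY KEY,   -- 'YYYY-mm-dd HH:MM:SS'
        value_a    REAL,
        value_b    REAL
    )
""")
con.commit()
```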
Below are my questions:
- I've done some searching on Google and it looks like I should be using the UPSERT (merge) functionality of `sqlite`, but I can't seem to find any examples showing how to use it. Is there an actual UPSERT command, and if so, could someone please provide an example (preferably with `sqlite3` in Python) or point me to a helpful resource? (My best guess so far is sketched after this list, but I'm not sure it's right.)
- Also, is there a way to do this in bulk, so that I can UPSERT each new dataset into my database without having to go row by row? (I found this link, which suggests that it is possible, but I'm new to using databases and am not sure how to actually run the UPSERT command.)
- Can UPSERT also be performed directly using `pandas.DataFrame.to_sql`?
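For what it's worth, here is the kind of statement I have pieced together so far from reading about SQLite's `ON CONFLICT` clause. It assumes the simplified table sketched above, and I believe it needs SQLite 3.24.0 or newer (I'm not sure which version the `sqlite3` module bundled with my Python uses):

```python
import sqlite3
import pandas as pd

con = sqlite3.connect("my_database.db")

# The freshly pulled dataset, reduced to the three placeholder columns.
new_data = pd.read_csv("new_data.csv")  # columns: event_time, value_a, value_b

upsert_sql = """
    INSERT INTO sensor_1 (event_time, value_a, value_b)
    VALUES (?, ?, ?)
    ON CONFLICT(event_time) DO UPDATE SET
        value_a = excluded.value_a,
        value_b = excluded.value_b
"""

# executemany() runs the statement once per row but inside a single
# transaction, so I assume this counts as "bulk" without a Python-level loop.
con.executemany(upsert_sql, new_data.itertuples(index=False, name=None))
con.commit()
```

Is this roughly what the UPSERT functionality is meant to look like, or is there a more direct bulk command?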
My backup solution is loading in the table to be UPSERTed with `pd.read_sql_query("SELECT * from table", con)`, performing a `pandas.DataFrame.merge` with the new data, deleting that table from the database, and then writing the merged table back using `pd.DataFrame.to_sql` (but this would be inefficient).
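For clarity, the fallback would look roughly like the sketch below. I've used `pd.concat` plus `drop_duplicates` instead of `pandas.DataFrame.merge` just to keep the example short; the table and column names are the same placeholders as above.

```python
import sqlite3
import pandas as pd

con = sqlite3.connect("my_database.db")

# 1. Load the existing table and the new data.
old = pd.read_sql_query("SELECT * FROM sensor_1", con)
new = pd.read_csv("new_data.csv")

# 2. Stack them and, for duplicate date/times, keep the row from `new`
#    (keep="last" works because `new` is concatenated after `old`).
merged = (
    pd.concat([old, new])
    .drop_duplicates(subset="event_time", keep="last")
    .sort_values("event_time")
)

# 3. Drop and rewrite the whole table with the merged result.
merged.to_sql("sensor_1", con, if_exists="replace", index=False)
```

Besides rewriting thousands of rows just to change a few hundred, as far as I can tell `if_exists="replace"` also recreates the table without the original PRIMARY KEY definition, which is another reason I would prefer a proper UPSERT.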