I am facing a problem inserting hundreds of rows of data into a relational database (PostgreSQL). The data is stored in a .csv file and has to be inserted into multiple tables with all kinds of relationships (one-to-one, one-to-many, many-to-many). There are 10+ tables, with a number of associative entities.
Suppose Table A and Table B have a many-to-many relationship, so there is a Table AB (an associative entity). During insertion, after the data has been inserted into Table A and Table B, the generated primary keys must be mapped into the associative entity Table AB.
To be more specific, the following code shows how some of my database tables are related (all tables are created using Django models):
class Restaurant(models.Model):
    ...
    promotion = models.ForeignKey('Promotion', on_delete=models.CASCADE)
    menus = models.ManyToManyField('Menu')

class Menu(models.Model):
    ...
    mains = models.ManyToManyField('Main')
    drinks = models.ManyToManyField('Drinks')
    desserts = models.ManyToManyField('Dessert')
Note: there are 10+ tables that are interrelated in a similar way to the models shown above.
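For context, this is how the associative mapping currently happens in my script. Django manages the hidden through table for each ManyToManyField, so a single .add() call writes the (restaurant_id, menu_id) pair into the equivalent of "Table AB". The field values below are placeholders, since I trimmed the real fields above:

# 'promo' and the 'name' fields are placeholders for the real columns.
restaurant = Restaurant.objects.create(name='Papa Delicious', promotion=promo)
menu = Menu.objects.create(name='Lunch Menu')
# One INSERT into the associative table that backs Restaurant.menus.
restaurant.menus.add(menu)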
The .csv file data is shown below, and it also needs to be checked for incorrect values before insertion. For example:
no, restaurant name, restaurant description, restaurant type, restaurant homepage, restaurant country, restaurant age, restaurant awards, restaurant promotions, restaurant drinks, restaurant desserts, restaurant mains
1, Papa Delicious, This restaurant is ......, Italian, www.papadelicious.com, United States, 3, , , [Orange Juice, Apple Juice, Coffee], [Ice-cream], [Linguine, Some Burger]
While inserting into the database, a repeated dish or drink should not be inserted as a new row; instead, its existing primary key should be reused in the respective associative entities.
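The only way I can see to satisfy this no-duplicates rule is Django's get_or_create, which does a lookup before inserting (the name field is again a placeholder for the real column):

# Returns the existing row if one matches, so repeated drinks across
# CSV rows reuse the same primary key instead of creating a new row.
drink, created = Drinks.objects.get_or_create(name='Orange Juice')
# .add() is a no-op if the pair already exists in the associative table.
menu.drinks.add(drink)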
This data will be inserted a couple of times a week and should take no longer than 1 min per row, since error checking will already have taken some time and the insertion may be carried out by multiple users at the same time. The hundreds of rows, however, are a one-off insertion (approximately 600+ rows).
I'm looking for an efficient way to insert the data into all the appropriate tables. I've looked into COPY, but I don't see how it could COPY into multiple tables at once while mapping the foreign keys correctly.
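The closest I got was COPYing the whole file into a single wide staging table through psycopg2 (the connection details and the staging_restaurants table are made up), but from there I still don't see how to fan the rows out into 10+ related tables:

import psycopg2

# COPY targets exactly one table, so the best I managed was loading
# the raw .csv into one flat staging table first.
conn = psycopg2.connect('dbname=mydb user=myuser')
with open('restaurants.csv') as f, conn.cursor() as cursor:
    cursor.copy_expert(
        "COPY staging_restaurants FROM STDIN WITH (FORMAT csv, HEADER)", f
    )
conn.commit()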
I've also tried writing my own insertion script in a .py file, but row-by-row insertion is highly inefficient, especially when inserting into multiple tables and mapping foreign keys in associative entities.
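Roughly, the script does this for every CSV line, which means several SELECTs and INSERTs per row (the column handling is simplified, and parse_bracket_list is a helper of mine that splits the [ ... ] fields, since the embedded commas break plain csv parsing):

import csv

with open('restaurants.csv') as f:
    for row in csv.DictReader(f, skipinitialspace=True):
        restaurant = Restaurant.objects.create(
            name=row['restaurant name'],  # plus all the other columns
        )
        menu = Menu.objects.create()
        for drink_name in parse_bracket_list(row['restaurant drinks']):
            drink, _ = Drinks.objects.get_or_create(name=drink_name)
            menu.drinks.add(drink)  # one INSERT into the associative table
        restaurant.menus.add(menu)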
I suspect a script that does both the checking and the insertion is the right approach, but doing the checks and all the foreign-key mapping row by row seems highly inefficient with this much data and 10+ entities.
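The only speed-ups I've come up with so far are caching lookups in memory and wrapping each batch in one transaction, roughly as below (insert_row stands for the per-row logic above, rows for the parsed CSV), but this still issues one INSERT per associative row:

from django.db import transaction

drink_cache = {}  # name -> Drinks instance, to avoid repeated SELECTs

def get_drink(name):
    # Hits the database only the first time a given name appears.
    if name not in drink_cache:
        drink_cache[name], _ = Drinks.objects.get_or_create(name=name)
    return drink_cache[name]

with transaction.atomic():  # a single commit for the whole batch
    for row in rows:
        insert_row(row)  # the per-row logic from the script above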
Are there any better solutions out there?