
I have some data which I want to add to an existing MySQL database. The new data may contain entries that are already saved in the DB. Since some of my columns are unique, I get, as expected, an ER_DUP_ENTRY error.

Bulk Insert

Let's say I want to use the following statement to save "A", "B" and "C" in a column named names of table mytable, and "A" is already saved there.

insert into mytable (names) values ("A"), ("B"), ("C");

Is there a way to directly use bulk insert to save "B" and "C" while ignoring "A"? Or do I have to build an insert statement for every new row? This leads to another question:

Normalize Data

Should I ensure no duplicate entries are uploaded before the actual insert statement? In my case I would need to select the existing data from the database, eliminate the duplicates, and then perform the insert shown above. Or is that a task which is supposed to be done by the database?

  • Most often, I believe this is handled by the database server. Take a look at this stack question: https://stackoverflow.com/q/3164505/3148109 – Zach King Aug 09 '18 at 18:21
  • If I get it right, I can't use bulk insert for that task? Because the way it is shown there, it checks for every entry if it already exists. – Neskews Aug 09 '18 at 18:26

2 Answers


If you have UNIQUE constraints that are blocking import, you have a few ways you can work around that:

INSERT IGNORE INTO mytable ...

If any individual rows violate a UNIQUE constraint, they are skipped. Other rows are inserted.
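Applied to the table from the question (assuming names carries the UNIQUE constraint), this might look like:

```sql
-- "A" already exists, so it is skipped with a warning;
-- "B" and "C" are inserted normally
INSERT IGNORE INTO mytable (names) VALUES ('A'), ('B'), ('C');

-- SHOW WARNINGS reports which rows were skipped as duplicates
SHOW WARNINGS;
```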

REPLACE INTO mytable ...

If any row violates a UNIQUE constraint, MySQL DELETEs the existing row, then INSERTs the new row. Keep in mind the side-effects of doing this: for example, foreign keys that cascade on delete and reference the deleted row, or the INSERT generating a new auto-increment id.
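A sketch of the side-effect, assuming mytable has an AUTO_INCREMENT primary key:

```sql
-- The existing row for "A" is deleted and re-inserted, so if mytable has
-- an AUTO_INCREMENT id, "A" gets a NEW id and the old one is gone.
-- Any rows in other tables referencing the old id via ON DELETE CASCADE
-- are deleted along with it.
REPLACE INTO mytable (names) VALUES ('A'), ('B'), ('C');
```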

INSERT INTO mytable ... ON DUPLICATE KEY UPDATE ...

More flexibility. This does not delete the original row, but allows you to set new values for any columns you choose, on a case-by-case basis. See also my answer to "INSERT IGNORE" vs "INSERT ... ON DUPLICATE KEY UPDATE"
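For illustration, assume a hypothetical extra column last_seen that should be refreshed whenever a duplicate name arrives:

```sql
-- last_seen is a hypothetical column added for this example.
-- New names are inserted; for existing names, only last_seen is updated
-- and the original row (and its id) is kept.
INSERT INTO mytable (names, last_seen)
VALUES ('A', NOW()), ('B', NOW()), ('C', NOW())
ON DUPLICATE KEY UPDATE last_seen = VALUES(last_seen);
```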

If you want to use bulk-loading with mysqlimport or the SQL statement equivalent LOAD DATA INFILE, there are options that match the INSERT IGNORE or REPLACE solutions, but not the INSERT...ON DUPLICATE KEY UPDATE solution.
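For example (the file path and CSV layout here are assumptions, not from the question):

```sql
-- IGNORE skips rows that would violate a unique key;
-- swap it for REPLACE to overwrite existing rows instead.
LOAD DATA INFILE '/tmp/names.csv'
IGNORE INTO TABLE mytable
FIELDS TERMINATED BY ','
(names);
```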

Read the docs for more information.

Bill Karwin

In some situations, I like to do this:

  1. LOAD DATA into a temp table
  2. Clean up the data
  3. Normalize as needed. (2 SQLs per column that needs normalizing -- details)
  4. Augment Summary table(s) (INSERT .. ON DUPLICATE KEY .. SELECT x, y, count(*), sum(z), .. GROUP BY x,y)
  5. Copy clean data from temp table to real table(s) ("Fact" table). (INSERT [IGNORE] .. SELECT [DISTINCT] .., or INSERT .. ON DUPLICATE KEY UPDATE (IODKU) with SELECT.)
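The steps above can be sketched as follows (staging-table and file names are illustrative, not from the answer):

```sql
-- 1. Load raw data into a staging table shaped like the target
CREATE TEMPORARY TABLE staging LIKE mytable;
LOAD DATA INFILE '/tmp/new_data.csv' INTO TABLE staging (names);

-- 2./5. De-dup within the batch (DISTINCT) and skip rows
--       already present in the real table (IGNORE)
INSERT IGNORE INTO mytable (names)
SELECT DISTINCT names FROM staging;

DROP TEMPORARY TABLE staging;
```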

More on Normalizing:

  • I do it outside any transactions. There are multiple reasons why this is better.
  • At worst (as a result of other failures), I occasionally throw an unused entry into the normalization table. No big deal.
  • No burning of AUTO_INCREMENT ids (except in edge cases).
  • Very fast.

Since REPLACE is a DELETE plus INSERT, it is almost guaranteed to be worse than IODKU. However, both burn ids when the rows already exist.

If at all possible, do not "loop" through the rows; instead find SQL statements to handle them all at once.

Depending on the details, de-dup in step 2 (if lots of dups) or in step 5 (dups are uncommon).

Rick James