0

I need to insert strings (unicode words) from php array into mysql table but I need no duplicate values In php can't check duplicates because the single words come by exploding text from various sources and some words can be the same as previous inserted

For each word I need also to insert their ASCII version in the same column but only if unicode word is different to ASCII converted words

In mysql I have one table ´words´ with two columns:

´id´ PRIMARY and ´word´ UNIQUE (utf8mb4_general_ci)

$sql = "
    INSERT INTO ´words´ (´word´) VALUES ('$word')
    ON DUPLICATE KEY UPDATE
    SET ´word´ = '$word'
";

$query->execute();


//ex: if $word = "peter" and $word_ascii = "peter" do not insert $word_ascii
//but if $word = "julià" and $word_ascii = "julia" inert also $word_ascii
if($word != $word_ascii){
    $sql = "
        INSERT INTO ´words´ (´word´) VALUES ('$word_ascii')
        ON DUPLICATE KEY UPDATE
        SET ´word´ = '$word_ascii'
    ";  
    $query->execute();
} 

But in this mode when I try to insert $word_ascii = "julia" that overwrite "julià" which I have already inserted before (due to UNIQUE index and ON DUPLICATE KEY UPDATE)

If I quit the UNIQUE index then have a lot of duplicate entries.

I have tried:

INSERT IGNORE (with and without UNIQUE),

REPLACE ... but still the same problem.

I have tried also to adapt IF NOT EXIST from this old answer : How to 'insert if not exists' in MySQL?

INSERT INTO ´words´ (´word´) VALUES ('$word')
WHERE NOT EXISTS (SELECT * FROM ´words´ WHERE ´word´ = '$word' LIMIT 1);    

But unsuccessful. I got a syntax error.

Any help are welcome

Community
  • 1
  • 1
MTK
  • 2,549
  • 1
  • 23
  • 38
  • I am confused. What do you *want* to happen? – Gordon Linoff Apr 22 '17 at 22:02
  • Insert but no duplicate without using UNIQUE key because UNIQUE key with ON DUPLICATE KEY UPDATE overwrite julià with julia for example – MTK Apr 22 '17 at 22:05
  • 1
    . . `INSERT IGNORE` should do what you want. – Gordon Linoff Apr 22 '17 at 22:06
  • As I said above I have already used INSERT IGNORE but I'll try again now – MTK Apr 22 '17 at 22:08
  • Ihave tried again so in part you are Right. INSERT IGNORE with UNIQUE not insert duplicates but overwrite julià with julia ... So the problem still the same – MTK Apr 22 '17 at 22:15
  • I tend to believe that -- IF NOT EXISTS -- would be the solution but I do not know how to implement it. Do you have any idea how to adapt -- IF NOT EXIST -- here? – MTK Apr 22 '17 at 22:22
  • 1
    `IF NOT EXISTS` shows up in `CREATE` and `DROP` commands, not `INSERT`. – Rick James Apr 23 '17 at 18:44

1 Answers1

1

With utf8mb4_general_ci, "julià" and "julia" are considered equal.

One possible solution is to change the collation for word to be utf8_bin. Then INSERT IGNORE will work as intended, and you will get two rows, one with "julià", one with "julia".

Do you want to have "julià" mapped to "julia"?

But there may be more to your question -- will you be including, say, Chinese names in both Chinese and how they would be spelled with English letters? Collation is not sufficient.

Rick James
  • 106,233
  • 9
  • 103
  • 171
  • For the moment i have provisionally solved with creating two separate tables one for unicode and the other for ASCII. (I don't know if that affect query performances) I'ill try tomorrow your proposal ( There is 04:00 AM :D ) Thanks. About Chinese and other chars that can not be converted to ASCII I check that in PHP and if have ??? chars I insert only the original word – MTK Apr 23 '17 at 02:00
  • @MTK - 3 responses to your comment: * In one table, separate columns can use separate charsets. * See "question marks" in http://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored for Chinese; and use utf8mb4. * "Unicode" is the wrong term. – Rick James Apr 23 '17 at 15:52