I have a table with one column whose character set is cp1252 (latin1) and collation is latin1_swedish_ci, and I need to change it to utf8 with the utf8_general_ci collation.
I'd like to verify that the conversion won't leave weird characters in any of the rows.
This column stores domain names, and I'm not sure whether any of the rows contain Swedish characters.
I've been researching this, but I haven't found a way to check the data's integrity before changing the collation.
My best guess so far is to write a script that flags any row containing characters outside the ASCII range, but I'm pretty sure there's a better way to do this.
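To be concrete, here's a minimal sketch of the check I have in mind, run against values already pulled out of the database (the sample list is made up; in practice it would come from a SELECT on the column):

```python
def is_suspect(value: str) -> bool:
    """Return True if the value contains any character outside 7-bit ASCII.

    Valid domain names in this column should be pure ASCII, so anything
    non-ASCII is a candidate for mojibake after the conversion.
    """
    return any(ord(ch) > 127 for ch in value)

# Placeholder data standing in for the column's contents.
domains = ["example.com", "google.se", "ÜZìp;ìê+ØeÞ{/e¼ðP;"]
suspects = [d for d in domains if is_suspect(d)]
print(suspects)  # only the garbage value is flagged
```

This is just the brute-force idea from above written down; I'm hoping there's a server-side way that avoids pulling every row into a script.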
Any help would be great!
UPDATE
I've found multiple rows with garbage like this:
ÜZìp;ìê+ØeÞ{/e¼ðP;
Is there a way to get rid of that junk without examining the table row by row?
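One approach I'm considering (table and column names are placeholders): since a valid domain name is pure ASCII, rows whose value changes when round-tripped through the ascii character set should be exactly the garbage rows. In MySQL, CONVERT(col USING ascii) replaces non-convertible characters with '?', so a mismatch marks a corrupted value. A small helper to build that statement:

```python
def build_cleanup_sql(table: str, column: str) -> str:
    """Build a DELETE that removes rows with non-ASCII bytes in `column`.

    Relies on MySQL's CONVERT(... USING ascii) turning non-ASCII
    characters into '?', so corrupted values no longer equal themselves.
    """
    return (
        f"DELETE FROM {table} "
        f"WHERE {column} <> CONVERT({column} USING ascii);"
    )

print(build_cleanup_sql("domains", "domain_name"))
```

I'd obviously run it as a SELECT first to review what would be deleted, but I'd like confirmation that this is a sane way to do the cleanup in bulk.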