Is there a simple, lightweight way to replace at least some non-ASCII characters with their ASCII analogs? For example, this string

abc-åäö.txt

should be changed to

abc-aao.txt

A bit of background: zip tools do not reliably support UTF-8, hence the need to downgrade. As far as I recall, Google's "download attachments as single zip file" feature replaces any non-ASCII character with '_'.

PS: The code may be in some other language; if it's more or less understandable, I'll port it to Java. PPS: This is my first question, so please go easy on the downvotes.

Anton K
  • possible duplicate of [Converting Symbols, Accent Letters to English Alphabet.](http://stackoverflow.com/questions/1008802/converting-symbols-accent-letters-to-english-alphabet) – McDowell Jul 28 '10 at 10:21
  • So how should we proceed, close this as a duplicate? The questions are apparently quite close, but I was unable to find that one before posting mine... – Anton K Jul 28 '10 at 10:32
  • possible duplicate of [Replace national characters with ASCII equivalent.](http://stackoverflow.com/questions/3194516/replace-national-characters-with-ascii-equivalent) – dan04 Jul 28 '10 at 13:33
  • look for `Unihandecode` – n611x007 Jun 04 '15 at 16:59

5 Answers

Have a look at java.text.Normalizer. It can help you with transforming equivalent characters: http://en.wikipedia.org/wiki/Unicode_equivalence
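
A minimal sketch of how `java.text.Normalizer` might be applied to the file name from the question. The class and method names (`AsciiFold`, `stripDiacritics`, `toAscii`) are made up for illustration, and the '_' fallback for characters that have no decomposition is an assumption borrowed from the behaviour described in the question, not part of `Normalizer` itself.

```java
import java.text.Normalizer;

// Hypothetical helper class; only Normalizer and String.replaceAll come from the JDK.
final class AsciiFold {

    // Decompose accented letters into base letter + combining mark (NFD),
    // then drop the combining marks.
    static String stripDiacritics(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Assumed fallback: anything still outside 7-bit ASCII becomes '_'.
    static String toAscii(String input) {
        return stripDiacritics(input).replaceAll("[^\\x00-\\x7F]", "_");
    }

    public static void main(String[] args) {
        // The escapes below spell the example name from the question.
        System.out.println(toAscii("abc-\u00e5\u00e4\u00f6.txt")); // prints abc-aao.txt
    }
}
```

Note that this only handles characters that Unicode decomposes into a base letter plus marks. Letters such as ø or ß have no such decomposition, so in this sketch they end up as '_'.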

relet

Maybe this would do?

Krumelur
  • Thanks for the reference, but I don't see the actual code there. Apparently this is either already part of the JRE (java.text.Normalizer or something similar) or not a lightweight solution... – Anton K Jul 28 '10 at 10:39

Looks like the problem is solved here -

[solution][howto] Convert special characters to normal chars (é to e) http://www.ramonfincken.com/permalink/topic192.html

d-live

Okay, found something more or less working in this question: PHP: Replace umlauts with closest 7-bit ASCII aequivalent in an UTF-8 string
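
The linked question is about PHP. A rough Java analogue, assuming a hand-written lookup table is acceptable, could look like the sketch below. The class name `AsciiTable`, the method `transliterate`, and the table contents are all hypothetical and deliberately tiny; this is not taken from the linked answers. It requires Java 9+ for `Map.of`.

```java
import java.util.Map;

// Hypothetical lookup-table transliterator; extend TABLE for the characters you need.
final class AsciiTable {

    private static final Map<Character, String> TABLE = Map.of(
        '\u00e5', "a",   // a with ring
        '\u00e4', "a",   // a with diaeresis
        '\u00f6', "o",   // o with diaeresis
        '\u00c5', "A",
        '\u00c4', "A",
        '\u00d6', "O",
        '\u00f8', "o",   // o with stroke
        '\u00df', "ss",  // sharp s
        '\u00e6', "ae"   // ae ligature
    );

    // Keeps ASCII as-is, maps known characters, and replaces the rest with '_'.
    // Works per UTF-16 char, so a character outside the BMP yields two '_'.
    static String transliterate(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                out.append(TABLE.getOrDefault(c, "_"));
            }
        }
        return out.toString();
    }
}
```

Unlike a normalization-based approach, a table can map one character to several (ß to "ss"), at the cost of having to maintain the table by hand.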

Community
Anton K

If you are willing to use Python, there is a pretty good package called unidecode, which produces ASCII transliterations of Unicode text.

Iching Chang