711

I've seen weirdly formatted text called Zalgo, like the sample below, on various forums. It's kind of annoying to look at, but it really bothers me because it undermines my notion of what a character is supposed to be. My understanding is that a character is supposed to move horizontally across a line and stay within a certain "container". Obviously the Zalgo text is moving vertically and doesn't seem to be restricted to any space.

Is this a bug/flaw/exploit/hack in Unicode? Are these individual characters with weird properties? "What" is happening here?


H̡̫̤̤̣͉̤ͭ̓̓̇͗̎̀ơ̯̗̱̘̮͒̄̀̈ͤ̀͡w͓̲͙͖̥͉̹͋ͬ̊ͦ̂̀̚ ͎͉͖̌ͯͅͅd̳̘̿̃̔̏ͣ͂̉̕ŏ̖̙͋ͤ̊͗̓͟͜e͈͕̯̮̙̣͓͌ͭ̍̐̃͒s͙͔̺͇̗̱̿̊̇͞ ̸̤͓̞̱̫ͩͩ͑̋̀ͮͥͦ̊Z̆̊͊҉҉̠̱̦̩͕ą̟̹͈̺̹̋̅ͯĺ̡̘̹̻̩̩͋͘g̪͚͗ͬ͒o̢̖͇̬͍͇͓̔͋͊̓ ̢͈͙͂ͣ̏̿͐͂ͯ͠t̛͓̖̻̲ͤ̈ͣ͝e͋̄ͬ̽͜҉͚̭͇ͅx͎̬̠͇̌ͤ̓̂̓͐͐́͋͡ț̗̹̝̄̌̀ͧͩ̕͢ ̮̗̩̳̱̾w͎̭̤͍͇̰̄͗ͭ̃͗ͮ̐o̢̯̻̰̼͕̾ͣͬ̽̔̍͟ͅr̢̪͙͍̠̀ͅǩ̵̶̗̮̮ͪ́?̙͉̥̬͙̟̮͕ͤ̌͗ͩ̕͡


MD XF
Mike

2 Answers

443

The text uses combining characters, also known as combining marks. See section 2.11, Combining Characters, of the Unicode Standard (PDF).

In Unicode, character rendering does not use a simple character cell model where each glyph fits into a box with a given height. Combining marks may be rendered above, below, or inside a base character.

So you can easily construct a character sequence, consisting of a base character and “combining above” marks, of any length, to reach any desired visual height, assuming that the rendering software conforms to the Unicode rendering model. Such a sequence has no meaning, of course, and even a monkey could produce it (e.g., given a keyboard with a suitable driver).

And you can mix “combining above” and “combining below” marks.
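As a minimal sketch of the construction described above, a base character can be followed by any number of combining marks. The helper name `stack` and the two marks chosen here are illustrative assumptions, not something taken from the answer:

```javascript
// Hypothetical helper: append `count` copies of a combining mark to a base string.
function stack(base, mark, count) {
  return base + mark.repeat(count);
}

const ABOVE = "\u0306"; // COMBINING BREVE, rendered above the base
const BELOW = "\u0330"; // COMBINING TILDE BELOW, rendered below the base

// Mix "above" and "below" marks on the same base character.
const tall = stack(stack("y", ABOVE, 5), BELOW, 3);

console.log(tall);        // one visible "character" with stacked marks
console.log(tall.length); // 9 code units: 1 base + 5 above + 3 below
```

How tall the result actually looks depends on the font and the rendering engine, so the visual effect can differ between environments.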

The sample text in the question starts with:

Matas Vaitkevicius
Jukka K. Korpela
  • 38
    Unicode can do this because it deliberately conforms to nothing but the "real world usage of characters" - software is then expected to conform to Unicode. And this is why we have e.g., `U+1F4A9`. – Camilo Martin Sep 22 '14 at 01:00
  • 2
    Just to add to this, here's a list of combining characters used above, below, or through the text to generate "Zalgo text": http://www.zalgotextgenerator.com/unicode – VKK Mar 22 '16 at 15:22
280

Zalgo text works because of combining characters. These are special characters that modify the character that comes before them.


y + ̆ = y̆ which actually is

y + &#774; = y&#774;
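To see that the combined glyph really is two separate characters, one option is to list the code points of the string. This is only a sketch; the helper name `codePoints` is an assumption made for illustration:

```javascript
// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePoints(text) {
  // Array.from iterates by code point, so astral characters stay in one piece.
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const combined = "y" + "\u0306"; // base letter followed by COMBINING BREVE

console.log(codePoints(combined)); // [ 'U+0079', 'U+0306' ]
console.log(combined.length);      // 2
```

The renderer draws one glyph, but the string still holds the base letter and the mark as distinct code points.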

Since you can stack them one atop the other you can produce the following:


y̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆

which actually is:

y̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆

The same goes for putting stuff underneath:


y̰̰̰̰̰̰̰̰̰̰̰̰̰̰̰̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆



that in fact is:

y̰̰̰̰̰̰̰̰̰̰̰̰̰̰̰̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆

In Unicode, the main block of combining diacritics for European languages and the International Phonetic Alphabet is U+0300–U+036F.

More about it here
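Because that block is one contiguous range, a regular expression character class can match every mark in it. The following is a rough sketch of stripping those marks from a string, for example to clean up Zalgo text before display; the function name `stripCombiningDiacritics` is hypothetical:

```javascript
// Hypothetical helper: remove every code point in U+0300–U+036F.
function stripCombiningDiacritics(text) {
  return text.replace(/[\u0300-\u036F]/g, "");
}

// A base letter with 18 marks above and 15 below, built programmatically.
const zalgo = "y" + "\u0306".repeat(18) + "\u0330".repeat(15);

console.log(zalgo.length);                    // 34
console.log(stripCombiningDiacritics(zalgo)); // y
```

This only covers the one block named above. Combining marks also live in other blocks, so a broader filter would likely need the Unicode property escape `\p{M}` with the `u` flag instead of a fixed range.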

To produce a list of combining diacritical marks you can use the following script (since links keep on dying)

for(var i=768; i<=879; i++){console.log(new DOMParser().parseFromString("&#"+i+";", "text/html").documentElement.textContent +"  "+"&#"+i+";");}
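The script above relies on `DOMParser`, which is available in browsers. As a sketch of the same idea for environments without a DOM (Node.js, for instance), `String.fromCodePoint` can build each mark directly. The function name `combiningMarkRows` and the dotted-circle placeholder base are assumptions added here for illustration:

```javascript
// Build "glyph  entity" rows for U+0300 through U+036F (768 to 879 inclusive).
function combiningMarkRows(first = 0x0300, last = 0x036f) {
  const rows = [];
  for (let cp = first; cp <= last; cp++) {
    // U+25CC DOTTED CIRCLE is a conventional placeholder base for showing a mark.
    rows.push("\u25CC" + String.fromCodePoint(cp) + "  &#" + cp + ";");
  }
  return rows;
}

const rows = combiningMarkRows();
console.log(rows.length); // 112
rows.forEach((row) => console.log(row));
```

Attaching each mark to a placeholder base keeps it from combining with the preceding whitespace in the console output.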

Also check em out



Mͣͭͣ̾ Vͣͥͭ͛ͤͮͥͨͥͧ̾

Matas Vaitkevicius
  • 2
    how would you type that? – Aequitas Oct 14 '16 at 02:54
  • 7
    @Aequitas If you are asking about `ALT` codes, then you cannot do that; you would simply paste `y̆̆` where it gets into 'pure' HTML and the browser would do its magic... – Matas Vaitkevicius Oct 14 '16 at 08:03
  • 2
    @barbsan Hi, thanks for letting me know, I have replaced it with a script that generates them. – Matas Vaitkevicius Nov 16 '18 at 05:24
  • I wonder why you chose this particular example of Y with a tilde. It actually has some meaning in Russian, not sure if you are familiar with that. – SergeyA Jun 11 '19 at 18:42
  • @SergeyA I think he uses this example because it is the very same example the linked wikipedia page (https://en.wikipedia.org/wiki/Combining_character) is using. – Mischa Jun 12 '19 at 10:50
  • @Mischa They don't use underscore tilde (`y̰` y̰ ) in wiki example. I am fluent in Russian but have no clue about it. What does it mean? – Matas Vaitkevicius Jun 14 '19 at 08:29
  • @MatasVaitkevicius Oh, my bad. I thought he meant the first example (`y̆`) – Mischa Jun 15 '19 at 14:06
  • @SergeyA what does underscore tilde (`y̰` y̰ ) mean? – Matas Vaitkevicius Jun 21 '19 at 19:00
  • @MatasVaitkevicius it's hard to explain to someone who is not native Russian without violating SO's rule for profanity and obscenity. Basically, there is a (most) obscene word in Russian which is 3 characters. If they were all to be drawn one over another you'd come up with a glyph which would very much resemble the one posted. Any native Russian speaker will immediately recognize this writing as obfuscated obscene word (in funny way, not insulting one). I think, this particular glyph was invented by Russian designer some decades back. – SergeyA Jun 21 '19 at 19:26
  • @MatasVaitkevicius note that the tilde is over the Y character, not under it. It makes a difference. – SergeyA Jun 21 '19 at 19:28
  • @SergeyA Best I could come up with is this y͓ͥ there is no Cyrillic `И` over nor under, I am pretty sure if you would post the correct one (for educational purposes, we are all adults and one might want to filter it on the web) no one will mind... I am fluent in Russian and can talk by only using profanities (like average Russian ;) ) – Matas Vaitkevicius Jun 22 '19 at 08:19
  • @MatasVaitkevicius I will spell it in reverse with spaces and god help me :) The word I am referring to is Й У Х. Draw all those letter one over another and you'll see something very similar to the thing. – SergeyA Jun 24 '19 at 14:46
  • It's just the nickname for Richard. – mplungjan Sep 15 '19 at 13:17