What is the difference between UTF-32 and UCS-4? Isn't UTF-32 supposed to be a fixed-width encoding?
- What is it about [the Wikipedia page](https://en.wikipedia.org/wiki/UTF-32) that is unclear? If there are ambiguities on that page, it would be useful to discuss them. – Norman Gray May 12 '15 at 09:29
- What 'hate'? The question is completely answered by the Wikipedia page, so it's not a useful addition to this site. If there's something on that page that isn't clear (and much about Unicode is perplexing), then a more detailed question – which says for example 'This explanation seems to imply X, but this other part implies Y, which contradicts; so what's the resolution?' – would be a useful and instructive question. A question which doesn't display research, or other attempts by the questioner to answer it themself, is ... less so. – Norman Gray May 12 '15 at 12:29
2 Answers
The Unicode Standard Version 8.0, Appendix C states:
UCS-4 stands for “Universal Character Set coded in 4 octets.” It is now treated simply as a synonym for UTF-32, and is considered the canonical form for representation of characters in ISO 10646 (Universal Coded Character Set).
UTF-32 started as a subset of UCS-4. Now it is identical except that the UTF-32 standard has additional Unicode semantics. See the details on Wikipedia:
The original ISO 10646 standard defines a 31-bit encoding form called UCS-4, in which each encoded character in the Universal Character Set (UCS) is represented by a 32-bit friendly code value in the code space of integers between 0 and hexadecimal 7FFFFFFF.
Because only 17 planes are actually in use, all current code points are between 0 and 0x10FFFF. UTF-32 is a subset of UCS-4 that uses only this range. Since the Principles and Procedures document of JTC1/SC2/WG2 states that all future assignments of characters will be constrained to the BMP or the first 14 supplementary planes, UTF-32 will be able to represent all Unicode characters. Accordingly, UCS-4 and UTF-32 are now identical except that the UTF-32 standard has additional Unicode semantics.
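The range distinction in the quoted passage can be sketched in a few lines of Python. This is only an illustration, not part of either standard: the helper names (`is_ucs4_value`, `is_utf32_code_point`, `encode_utf32_be`) are made up for this example, and the check covers only the upper bounds discussed here, not every rule a conforming UTF-32 encoder has to follow.

```python
# Upper bounds taken from the quoted passage.
UCS4_MAX = 0x7FFFFFFF   # original 31-bit UCS-4 code space
UTF32_MAX = 0x10FFFF    # 17 planes of 0x10000 code points each


def is_ucs4_value(value: int) -> bool:
    """True if value fits the original UCS-4 code space."""
    return 0 <= value <= UCS4_MAX


def is_utf32_code_point(value: int) -> bool:
    """True if value lies in the range that UTF-32 is restricted to."""
    return 0 <= value <= UTF32_MAX


def encode_utf32_be(value: int) -> bytes:
    """Encode one code point as a single big-endian 32-bit code unit.

    Hypothetical helper: it only enforces the upper bound discussed above.
    """
    if not is_utf32_code_point(value):
        raise ValueError(f"0x{value:X} is outside the UTF-32 range")
    return value.to_bytes(4, "big")


if __name__ == "__main__":
    for cp in (0x41, 0x20AC, 0x1F600):
        # Every code point occupies exactly four bytes (fixed width).
        print(f"U+{cp:04X} -> {encode_utf32_be(cp).hex()}")
    # 0x110000 fits in old UCS-4 but is rejected under the UTF-32 restriction.
    print(is_ucs4_value(0x110000), is_utf32_code_point(0x110000))
```

Every accepted value comes out as exactly four bytes, which is the fixed-width property the question asks about; the two predicates differ only for values above 0x10FFFF.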
However, I am not exactly sure what "additional Unicode semantics" means. Maybe someone can provide a better answer.
- I personally don't know, @一二三. Maybe we need a better answer with more information about this. – Christian Gollhardt Apr 20 '16 at 02:48
- Sounds to me like UCS-4 = [0, 0x7FFFFFFF] while UTF-32 = [0, 0x10FFFF]. Both are represented as 32 bits, but UTF-32 further restricts the range of legal values. – Bill Fraser Oct 28 '16 at 23:13
- UTF contains additional properties such as right to left etc. https://en.wikipedia.org/wiki/Unicode_character_property. Otherwise the two are the same. – Ian Apr 23 '19 at 06:37
- See http://www.unicode.org/faq/utf_bom.html#utf32-1: “UTF-32 is a subset of the encoding mechanism called UCS-4 in ISO 10646.” – hermannk Oct 04 '20 at 09:50