What is the difference between UTF-32 and UCS-4?

Question

What is the difference between UTF-32 and UCS-4? Isn't UTF-32 supposed to be a fixed-width encoding?

What is it about [the wikipedia page](https://en.wikipedia.org/wiki/UTF-32) that is unclear? If there are ambiguities on that page, it would be useful to discuss them. — Norman Gray, May 12 '15 at 09:29
What 'hate'? The question is completely answered by the Wikipedia page, so it's not a useful addition to this site. If there's something on that page that isn't clear (and much about Unicode is perplexing), then a more detailed question – which says for example 'This explanation seems to imply X, but this other part implies Y, which contradicts; so what's the resolution?' – would be a useful and instructive question. A question which doesn't display research, or other attempts by the questioner to answer it themself, is ... less so. — Norman Gray, May 12 '15 at 12:29

score 20 · Answer 1 · edited Feb 24 '20 at 19:04

The Unicode Standard Version 8.0, Appendix C states:

UCS-4 stands for “Universal Character Set coded in 4 octets.” It is now treated simply as a synonym for UTF-32, and is considered the canonical form for representation of characters in ISO 10646 (Universal Coded Character Set).
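To illustrate the fixed-width property the question asks about, here is a minimal Python sketch. The helper name `utf32_units` is made up for this example; it relies only on Python's built-in `utf-32-be` codec.

```python
def utf32_units(text):
    """Return the 32-bit code units of `text` encoded as big-endian UTF-32."""
    # An explicit byte order ("-be") means no byte order mark is prepended.
    data = text.encode("utf-32-be")
    # Every code point occupies exactly one 4-byte unit.
    return [int.from_bytes(data[i:i + 4], "big") for i in range(0, len(data), 4)]


sample = "A\u20ac\U0001F600"  # U+0041, U+20AC, U+1F600
print([hex(u) for u in utf32_units(sample)])
print(len(sample.encode("utf-32-be")))  # 4 bytes per code point
```

Each code unit is simply the code point's numeric value, which is why no multi-unit sequences (as in UTF-8 or UTF-16) are needed.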

edited Feb 24 '20 at 19:04

Jim U

answered Jun 09 '16 at 08:02

Jonathan Maddox


score 15 · Accepted Answer · edited Aug 06 '18 at 16:52

UTF-32 started as a subset of UCS-4. The two are now identical, except that the UTF-32 standard has additional Unicode semantics. See the details on Wikipedia:

The original ISO 10646 standard defines a 31-bit encoding form called UCS-4, in which each encoded character in the Universal Character Set (UCS) is represented by a 32-bit friendly code value in the code space of integers between 0 and hexadecimal 7FFFFFFF.

Because only 17 planes are actually in use, all current code points are between 0 and 0x10FFFF. UTF-32 is a subset of UCS-4 that uses only this range. Since the Principles and Procedures document of JTC1/SC2/WG2 states that all future assignments of characters will be constrained to the BMP or the first 14 supplementary planes, UTF-32 will be able to represent all Unicode characters. Accordingly, UCS-4 and UTF-32 are now identical except that the UTF-32 standard has additional Unicode semantics.

However, I am not exactly sure what "additional Unicode semantics" means. Maybe someone can provide a better answer.
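As a rough illustration of the range difference in the quoted passage, the two code spaces can be compared like this. The function names are hypothetical, and excluding the surrogate range 0xD800 to 0xDFFF goes beyond the quote: it reflects the Unicode rule that UTF-32 encodes only scalar values.

```python
UCS4_MAX = 0x7FFFFFFF   # original ISO 10646 31-bit code space
UTF32_MAX = 0x10FFFF    # last code point of the 17 planes


def in_original_ucs4_range(value):
    """True if `value` fits the original 31-bit UCS-4 code space."""
    return 0 <= value <= UCS4_MAX


def is_valid_utf32_unit(value):
    """True if `value` is a legal UTF-32 code unit (a Unicode scalar value)."""
    if not 0 <= value <= UTF32_MAX:
        return False
    # Assumption beyond the quoted text: surrogate code points are excluded.
    return not 0xD800 <= value <= 0xDFFF


for value in (0x41, 0x10FFFF, 0x110000, 0xD800, 0x7FFFFFFF):
    print(hex(value), in_original_ucs4_range(value), is_valid_utf32_unit(value))
```

A value such as 0x110000 fits the original UCS-4 code space but lies outside what UTF-32 allows.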

edited Aug 06 '18 at 16:52

BenMorel


answered May 12 '15 at 09:27

Christian Gollhardt


I personally don't know, @一二三. Maybe we need a better answer with more information about this. – Christian Gollhardt Apr 20 '16 at 02:48
The Wikipedia article says "[clarification needed]". – Keith Thompson Apr 20 '16 at 02:54
Sounds to me like UCS-4 = [0,0x7FFFFFFF] while UTF-32 = [0,0x10FFFF]. Both are represented as 32 bits, but UTF-32 further restricts the range of legal values. – Bill Fraser Oct 28 '16 at 23:13
UTF carries additional properties, such as right-to-left directionality; see https://en.wikipedia.org/wiki/Unicode_character_property for details. Otherwise the two are the same. – Ian Apr 23 '19 at 06:37
See http://www.unicode.org/faq/utf_bom.html#utf32-1: “UTF-32 is a subset of the encoding mechanism called UCS-4 in ISO 10646.” – hermannk Oct 04 '20 at 09:50
