5

Under Delphi 2010 (and probably under D2009 also) the default string type is UnicodeString.

However if we declare...

const
 s  :string = 'Test';
 ss :string[4] = 'Test';

... then the first string s is declared as UnicodeString, but the second one ss is declared as AnsiString!

We can check this: SizeOf(s[1]) returns 2 and SizeOf(ss[1]) returns 1.
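
The check can be put into a small console program. This is only a sketch, assuming Delphi 2009 or later, where Char is WideChar; the program name is made up for illustration, and the two extra SizeOf calls on the whole variables are an addition to the original check.

```pascal
program StringSizes;

{$APPTYPE CONSOLE}

const
  s: string = 'Test';      // UnicodeString in Delphi 2009 and later
  ss: string[4] = 'Test';  // short string with a maximum length of 4

begin
  Writeln(SizeOf(s[1]));   // 2: each element is a WideChar
  Writeln(SizeOf(ss[1]));  // 1: each element is an AnsiChar
  Writeln(SizeOf(s));      // pointer size: the characters live on the heap
  Writeln(SizeOf(ss));     // 5: one length byte plus four characters
end.
```

The last two lines show that the types differ in layout as well: s is a reference to heap data, while ss is stored inline together with its length byte.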

If I declare...

var
  s  :string;
  ss :string[4];

... then I want ss to be a UnicodeString as well.

  1. How can I tell Delphi 2010 that both strings should be of type UnicodeString?
  2. How else can I declare that ss holds four WideChars? The compiler does not accept the type declarations WideString[4] or UnicodeString[4].
  3. What is the purpose of having two different compiler interpretations of the same type name, string?
Rob Kennedy
GJ.
  • 10
    You should be aware that the default string type is **not** `WideString`; it's `UnicodeString`. They both use wide chars, but the semantics are very different. For one thing, `WideString` is not reference-counted, but `UnicodeString` is. – Mason Wheeler Jan 25 '11 at 14:05
  • 2
    @Mason This is a good point. As an aside I find the term *semantics* rather confusing. Semantics is the study of *meaning*. But what's really different about these two types is their *implementation*. The key difference is that, as well as reference counting, they use copy-on-write. This gives the types different performance characteristics, but the same *meaning* when viewed from the outside. I appreciate fully that the world of computer programmers uses the term *semantics* in this particular way, but it just always confuses the heck out of me! – David Heffernan Jan 25 '11 at 14:11
  • 2
    @Mason, since GJ's faulty assumption about the default type doesn't really change the point of the question, I hope everyone can agree that my editing it to say UnicodeString doesn't affect the validity of any answers. The question is about how to declare fixed-length Unicode strings, whatever the actual type might be. – Rob Kennedy Jan 25 '11 at 15:45
  • 2
    possible duplicate of [Delphi Unicode String Type Stored Directly at its Address (or "Unicode ShortString")](http://stackoverflow.com/questions/2806537/delphi-unicode-string-type-stored-directly-at-its-address-or-unicode-shortstrin) – Andreas Rejbrand Jan 25 '11 at 16:58

2 Answers

12

The answer lies in the fact that string[n], which is a ShortString, is now considered a legacy type. Embarcadero decided not to give ShortString Unicode support. Since the long string was introduced, if my memory serves correctly, in Delphi 2, that seems a reasonable decision to me.

If you really want a fixed-length array of WideChar, then you can simply declare array [1..n] of Char.
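
A minimal sketch of that declaration, assuming Delphi 2009 or later, where Char is an alias for WideChar. The names TChar4, Initial, and buf are hypothetical and chosen only for this example.

```pascal
program FixedWideChars;

{$APPTYPE CONSOLE}

type
  // Hypothetical name for a fixed buffer of four UTF-16 code units.
  TChar4 = array[1..4] of Char;

const
  Initial: TChar4 = ('T', 'e', 's', 't');

var
  buf: TChar4;
  s: string;

begin
  buf := Initial;          // static arrays are copied by value
  Writeln(Length(buf));    // 4 elements
  Writeln(SizeOf(buf));    // 8 bytes when Char is WideChar

  // Copy the four characters into an ordinary string when one is needed.
  SetString(s, PChar(@buf[1]), Length(buf));
  Writeln(s);
end.
```

Unlike string[4], such an array has no length byte, so it always holds exactly four characters; tracking a shorter logical length would be up to the caller.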

David Heffernan
4
  1. You can't, using string[4] as the type. Declaring it that way automatically makes it a ShortString.

  2. Declare it as an array of Char instead, which will make it an array of 4 WideChars.

  3. Because a string[4] makes it a string containing 4 characters. However, since WideChars can be more than one byte in size, this would be a) wrong, and b) confusing. ShortStrings are still around for backward compatibility, and are automatically AnsiStrings because they consist of [x] one byte chars.

Ken White
  • 1
    You said: **WideChars can be more than one byte in size**, yes, but the size of a WideChar is exactly 2 bytes, no more and no less! – GJ. Jan 25 '11 at 14:07
  • What Ken meant is that in Unicode a code point can consist of more than one code unit. So a "char" could be 4 bytes. It's the meaning of the word "character" that's a bit confusing here: what does "character" mean? A code point or a code unit? In Delphi it's a "code unit" (so 8-bit for AnsiChar and 16-bit for WideChar). – Jens Mühlenhoff Jan 25 '11 at 16:27
  • 1
    @Ken: An Ansi glyph can consist of more than one byte (think of multi-byte encoding). Windows even considers UTF-8 as an Ansi encoding as does Delphi 2009+. – Jens Mühlenhoff Jan 25 '11 at 16:38
  • @Jens: But ShortString wasn't actually ANSI, but ASCII (TP/BP days), IIRC. That's why it was a single-byte signed char. Or am I remembering wrong (it's possible - TP was eons ago, wasn't it? )? – Ken White Jan 25 '11 at 18:25
  • Sure, DOS used OEM character sets for values beyond 127 (and DOS really only had 1 byte character sets IIRC). Char was always defined as #0..#255 until Delphi.NET and Delphi 2009, so I think it wasn't signed in Turbo Pascal. – Jens Mühlenhoff Jan 25 '11 at 19:35
  • Yes. They called it extended ascii at the time. – Marco van de Voort Jan 25 '11 at 22:30
  • @Jens/@Marco: Thanks for confirming that my memory still works once in a while. :-) – Ken White Jan 26 '11 at 13:52
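
The code unit versus code point distinction raised in the comments above can be illustrated with a short sketch, assuming Delphi 2009 or later. The program and variable names are made up for the example; U+1D11E (musical symbol G clef) is used because it lies outside the Basic Multilingual Plane.

```pascal
program CodeUnits;

{$APPTYPE CONSOLE}

var
  clef: string;

begin
  // UTF-16 encodes U+1D11E as the surrogate pair $D834, $DD1E.
  clef := Char($D834) + Char($DD1E);

  Writeln(Length(clef));                 // 2 code units for one code point
  Writeln(Length(clef) * SizeOf(Char));  // 4 bytes in total
end.
```

So a WideChar is always 2 bytes, but a single code point may need two of them, which is why Length counts code units rather than characters as a reader sees them.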