I am trying to return the 2-byte WORD hex value of a string character that is not typically English, i.e. its Unicode representation, using VB.NET.

Ex:

FF5F = ((

FF06 = &

These are represented in the Unicode Standard 6.2. I do not have the ability to display some of the foreign-language characters in this set.

So I would like each character of my string to be converted to this 2-byte value. I haven't been able to find a function in .NET to do this.

The code is currently nothing more than a for loop cycling through the string characters, so no sample progress.

I have tried the AscW and ChrW functions, but they do not return the 2-byte value. ASCII does not seem to be reliable above 255.

If necessary I could isolate the possible languages being tested so that only one language is considered through the comparisons, although an English character is always possible.

Any guidance would be appreciated.

BenMorel
htm11h
  • These are full-width characters, common in East Asian typography. A font like MS Gothic can display them. It is very unclear what you are trying to do with them; using String.ToCharArray() or just indexing the string is a simple way to get the value. – Hans Passant Feb 20 '13 at 17:08
  • Not all Unicode characters fit into 2 bytes. Either you are talking about UTF-16 code units (.NET: System.Char) or your assumption about size is wrong. – Sebastian Negraszus Feb 21 '13 at 10:27
  • I am referring to the half- and full-width characters at this link, specifically katakana: http://www.unicode.org/charts/PDF/UFF00.pdf All are 2 bytes in this spec. – htm11h Feb 21 '13 at 13:23

2 Answers

0

I think you could convert your string to a byte array, which would look something like this in C#:

static byte[] GetBytes(string str)
{
    byte[] bytes = new byte[str.Length * sizeof(char)];
    System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
    return bytes;
}

From that you can just grab the first two bytes from the array, and there you go, you have them.

If you want to show them on a screen, I guess you should probably convert them to hex or some such displayable format.

I've stolen this from the question here.
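
One thing worth checking when the bytes do not match the chart is byte order: Buffer.BlockCopy copies the UTF-16 code units in the machine's native order, which on little-endian hardware puts the low byte first (06 FF rather than FF 06). The following is a minimal VB.NET sketch of that difference, not a confirmed diagnosis of the problem reported in the comments; the module name and the sample string are made up for illustration.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module ByteOrderSketch
    Sub Main()
        ' Hypothetical sample: U+FF06 FULLWIDTH AMPERSAND.
        Dim sample As String = ChrW(&HFF06)

        ' Encoding.Unicode is UTF-16 little-endian: low byte first.
        Dim littleEndian As Byte() = Encoding.Unicode.GetBytes(sample)

        ' Encoding.BigEndianUnicode is UTF-16 big-endian: high byte first,
        ' which matches the order the code charts are written in.
        Dim bigEndian As Byte() = Encoding.BigEndianUnicode.GetBytes(sample)

        Console.WriteLine(BitConverter.ToString(littleEndian)) ' 06-FF
        Console.WriteLine(BitConverter.ToString(bigEndian))    ' FF-06
    End Sub
End Module
```

If only the numeric value is needed, reading the Char directly avoids the byte-order question altogether.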

Tony The Lion
  • Thanks, I'll test this now. – htm11h Feb 20 '13 at 16:42
  • Well, the code executes ok, but I am not getting the expected values for known characters. Even after converting byte(s) to Hex. – htm11h Feb 20 '13 at 16:53
  • It appears that this function only returns the base 255 characters. It is not recognizing Unicode values above this. – htm11h Feb 20 '13 at 17:06
  • There are classes that deal with Unicode in .NET, one of them being the `Encoding` class. You may find [this article](http://msdn.microsoft.com/en-us/library/zs0350fy(v=vs.71).aspx) to be of some interest. – Tony The Lion Feb 20 '13 at 17:40
0

A colleague assisted in developing a solution. The string is converted to a character array, and each character is converted to an integer, which is then converted to hex.

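' myString is the input string to inspect.
Dim lt As String = myString
Dim sChars() As Char = lt.ToCharArray()

For Each c As Char In sChars
    ' AscW returns an Integer in the range 0 through 65535.
    Dim intVal As Integer = AscW(c)
    ' Print the character and its value in hex, e.g. FF06.
    Debug.Print(c & "=" & Hex(intVal))
Next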

Note the AscW function. Per the documentation, AscW returns the Unicode code point for the input character, which can be 0 through 65535. The returned value is independent of the culture and code page settings for the current thread. http://msdn.microsoft.com/en-us/library/zew1e4wc(v=vs.90).aspx
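
Because a Char is a single UTF-16 code unit, AscW cannot return a value above 65535; a character outside the Basic Multilingual Plane is stored as two Chars (a surrogate pair). The sketch below shows one way to get the full code point with Char.ConvertToUtf32, in case such characters can appear in the input. The module name and sample string are illustrative assumptions, and none of this is needed for the FF00 range, which fits in one Char.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module CodePointSketch
    Sub Main()
        ' Hypothetical sample: a fullwidth ampersand followed by U+1F600,
        ' which lies outside the Basic Multilingual Plane.
        Dim sample As String = ChrW(&HFF06) & Char.ConvertFromUtf32(&H1F600)

        Dim i As Integer = 0
        While i < sample.Length
            ' ConvertToUtf32 combines a surrogate pair into one code point.
            Dim codePoint As Integer = Char.ConvertToUtf32(sample, i)
            Console.WriteLine("U+" & codePoint.ToString("X4"))

            ' Advance by two Chars when this position starts a surrogate pair.
            i += If(Char.IsSurrogatePair(sample, i), 2, 1)
        End While
    End Sub
End Module
```

For this sample the loop prints U+FF06 and then U+1F600, even though the string contains three Chars.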

I then compare the resulting Hex to the spec for reporting.
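
The comparison against the spec could also be done numerically rather than on the hex string. This is a rough sketch, assuming the chart of interest is the Halfwidth and Fullwidth Forms block (U+FF00 through U+FFEF) linked in the comments; the helper name IsHalfOrFullwidthForm is hypothetical and not part of the original solution.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module FullwidthCheckSketch
    ' Halfwidth and Fullwidth Forms block, U+FF00 through U+FFEF.
    Const BlockStart As Integer = &HFF00
    Const BlockEnd As Integer = &HFFEF

    ' Hypothetical helper: True when the character falls inside that block.
    Function IsHalfOrFullwidthForm(c As Char) As Boolean
        Dim value As Integer = AscW(c)
        Return value >= BlockStart AndAlso value <= BlockEnd
    End Function

    Sub Main()
        ' Sample input: a plain Latin letter and a fullwidth ampersand.
        Dim sample As String = "A" & ChrW(&HFF06)

        For Each c As Char In sample
            Dim hexValue As String = AscW(c).ToString("X4")
            Console.WriteLine(hexValue & " " & IsHalfOrFullwidthForm(c).ToString())
        Next
    End Sub
End Module
```

With this sample the output is "0041 False" followed by "FF06 True", so English characters mixed into the string are simply reported as outside the block.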

htm11h