How to return the Unicode 2-byte value of a string character

Question

I am trying to return the 2-byte WORD hex value of a string character that is not typically English, i.e. its Unicode representation. I am using VB.NET.

Ex:

FF5F = ｟ (fullwidth left white parenthesis)

FF06 = ＆ (fullwidth ampersand)

These are defined in the Unicode Standard 6.2. I do not have the ability to display some of the foreign-language characters in this set.

So I would like each character of my string to be converted to this 2-byte value. I haven't been able to find a function in .NET to do this.

The code is currently nothing more than a for loop cycling through the string characters, so no sample progress.

I have tried the AscW and ChrW functions, but they do not return the 2-byte value. ASCII does not seem to be reliable above 255.

If necessary I could isolate the possible languages being tested so that only one language is considered through the comparisons, although an English character is always possible.

Any guidance would be appreciated.

These are full-width characters, common in East Asian typography. A font like MS Gothic can display them. It is very unclear what you are trying to do with them; using String.ToCharArray() or just indexing the string is a simple way to get the value. – Hans Passant, Feb 20 '13 at 17:08
Not all Unicode characters fit into 2 bytes. Either you are talking about UTF-16 code units (.NET: System.Char) or your assumption about size is wrong. – Sebastian Negraszus, Feb 21 '13 at 10:27
I am referring to the half-width and full-width characters at this link, specifically katakana (http://www.unicode.org/charts/PDF/UFF00.pdf). All are 2 bytes in this spec. – htm11h, Feb 21 '13 at 13:23

score 0 · Answer 1 · edited May 23 '17 at 12:18

I think you could convert your string to a byte array, which would look something like this in C#:

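static byte[] GetBytes(string str)
{
    // Each char is one UTF-16 code unit, so allocate 2 bytes per char.
    byte[] bytes = new byte[str.Length * sizeof(char)];
    // Copies the raw code units; on little-endian hardware the low byte comes first.
    System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
    return bytes;
}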

From that you can just grab the first two bytes from the array, and there you have them.

If you want to show them on a screen, I guess you should probably convert them to hex or some such displayable format.

I've stolen this from the question here.
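
One possible reason the raw byte copy above does not match the Unicode charts is byte order: on little-endian hardware the low byte of each UTF-16 code unit comes first, so U+FF06 appears as 06 FF. The following is a minimal sketch in VB.NET (the question's language), assuming the goal is the two bytes of each character with the high byte first. The module name `ByteOrderSketch` and the helper `ToBigEndianHex` are hypothetical names chosen for illustration; `Encoding.BigEndianUnicode` is the framework's big-endian UTF-16 encoding.

```vb
Imports System
Imports System.Text

Module ByteOrderSketch
    ' Hypothetical helper: returns the two UTF-16 bytes of each Char,
    ' high byte first, formatted as a four-digit hex string.
    Function ToBigEndianHex(value As String) As String()
        ' Encoding.BigEndianUnicode is UTF-16 with the most significant byte first.
        Dim bytes As Byte() = Encoding.BigEndianUnicode.GetBytes(value)
        Dim result(value.Length - 1) As String
        For i As Integer = 0 To value.Length - 1
            result(i) = bytes(2 * i).ToString("X2") & bytes(2 * i + 1).ToString("X2")
        Next
        Return result
    End Function

    Sub Main()
        ' "A", fullwidth ampersand, fullwidth left white parenthesis.
        Dim sample As String = "A" & ChrW(&HFF06) & ChrW(&HFF5F)
        For Each hexValue As String In ToBigEndianHex(sample)
            Console.WriteLine(hexValue)
        Next
    End Sub
End Module
```

With the sample string shown, this should print 0041, FF06 and FF5F, one per line.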

answered Feb 20 '13 at 16:40 by Tony The Lion

Thanks, I'll test this now. – htm11h Feb 20 '13 at 16:42
Well, the code executes ok, but I am not getting the expected values for known characters. Even after converting byte(s) to Hex. – htm11h Feb 20 '13 at 16:53
It appears that this function only returns the base 255 characters. It is not recognizing Unicode values above this. – htm11h Feb 20 '13 at 17:06
There are classes that deal with Unicode in .NET, one of them being the `Encoding` class. You may find [this article](http://msdn.microsoft.com/en-us/library/zs0350fy(v=vs.71).aspx) to be of some interest. – Tony The Lion Feb 20 '13 at 17:40

score 0 · Accepted Answer · answered Feb 21 '13 at 13:39

A colleague assisted in developing a solution. The string is converted to a character array, and each character is then converted to an integer, which is in turn converted to hex.

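' myString is the input string to inspect.
Dim lt As String = myString
Dim sChars() As Char = lt.ToCharArray()

For Each c As Char In sChars
    ' AscW returns the value of c as an Integer in the range 0 through 65535.
    Dim intVal As Integer = AscW(c)
    ' Hex formats the value as a hexadecimal string, e.g. FF06.
    Debug.Print(c & "=" & Hex(intVal))
Next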

Note the AscW function. Per the documentation, AscW returns the Unicode code point for the input character. This can be 0 through 65535. The returned value is independent of the culture and code page settings for the current thread. See http://msdn.microsoft.com/en-us/library/zew1e4wc(v=vs.90).aspx

I then compare the resulting Hex to the spec for reporting.
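
The loop above works per Char, that is, per UTF-16 code unit, which covers every character in the FF00 block. As noted in the comments on the question, characters outside the 0 through 65535 range occupy two Chars in a .NET string. The following is a minimal sketch, assuming full code points are wanted and the input is well-formed UTF-16 (`Char.ConvertToUtf32` throws on an unpaired surrogate). The module name `CodePointSketch` and the helper `ToCodePointHex` are hypothetical names for illustration.

```vb
Imports System
Imports System.Collections.Generic

Module CodePointSketch
    ' Hypothetical helper: lists each Unicode code point in a string as hex,
    ' padded to at least four digits.
    Function ToCodePointHex(value As String) As List(Of String)
        Dim result As New List(Of String)
        Dim i As Integer = 0
        While i < value.Length
            ' Char.ConvertToUtf32 combines a surrogate pair into a single code point.
            Dim codePoint As Integer = Char.ConvertToUtf32(value, i)
            result.Add(codePoint.ToString("X4"))
            ' Advance past both halves when the current position starts a surrogate pair.
            i += If(Char.IsSurrogatePair(value, i), 2, 1)
        End While
        Return result
    End Function

    Sub Main()
        ' U+FF06 fits in one Char; U+1F600 needs a surrogate pair (two Chars).
        Dim sample As String = ChrW(&HFF06) & Char.ConvertFromUtf32(&H1F600)
        For Each hexValue As String In ToCodePointHex(sample)
            Console.WriteLine(hexValue)
        Next
    End Sub
End Module
```

With the sample string shown, this should print FF06 and 1F600. For strings that contain only characters in the 0 through 65535 range, the result matches the AscW loop apart from the zero padding.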
