
I have a Client/Server architecture where messages in text-format are exchanged.

For example:

12  2013/11/11  abcd  5
^     ^          ^    ^
int  date      text  int

Everything works fine with "normal" text. This is now a Chinese project, so they also want to send Chinese characters, encoded as GB18030 or GB2312.

I read the data this way:

char[] dataIn = binaryReader.ReadChars(length);

Then I create a new string from the char array and convert it to the right data type (int, float, string, etc.).

How can I change/enable Chinese encoding, or convert the string values to Chinese? And what would be a good and easy way to test this? Thanks.

I tried using something like this

string stringData = new string(dataIn).Trim();
byte[] data = Encoding.Unicode.GetBytes(stringData);
stringData = Encoding.GetEncoding("GB18030").GetString(data);

Without success.

I also need to save some text values to MS SQL Server 2008. Is this possible, and do I need to configure anything special?

I also tried this example with storing to the database and printing to the console, but I just get ????????

string chinese = "123东北特钢大连新基地testtest"; 
byte[] utfBytes = Encoding.Unicode.GetBytes(chinese); 
byte[] chineseBytes = Encoding.Convert(Encoding.Unicode, Encoding.GetEncoding("GB18030"), utfBytes); 
string msg = Encoding.GetEncoding("GB18030").GetString(chineseBytes);

Edit: The problem was with the INSERT queries that I send to the database. I fixed it by putting N before the opening quote of the string literal.

sqlCommand = string.Format("INSERT INTO uber_chinese (columnName) VALUES(N'{0}')", myChineseString);

Also the column dataType has to be nvarchar instead of varchar.

HectorLector
  • Have you taken a look at the [`Encoding` class](http://msdn.microsoft.com/en-us/library/system.text.encoding.aspx)? – Oded Jun 03 '13 at 11:54
  • @HectorLector - Just read the data using the required Encoding. Normally this means you would also store the encoding information in the message. – Security Hound Jun 03 '13 at 11:59
  • The `BinaryReader` class offers constructors where you supply the `Encoding`. Did you try something like `new BinaryReader(inputStream, Encoding.GetEncoding("GB18030"))`? – Jeppe Stig Nielsen Jun 03 '13 at 12:06
  • Don't test this with the standard console. Simple .NET strings (without surrogate pairs or anything) like `string str1 = "123东北特钢大连新基地testtest";` or `string str2 = DateTime.Today.ToString("D", new CultureInfo("zh-CN"));` won't print well with `Console.WriteLine`. You can see the value of the string during debugging, though. Don't mix UTF-16 (or `Encoding.Unicode`) with GB 18030. When you "read" from their source, set the `BinaryReader` to the correct encoding as suggested in my latest comment. The rest of the time, do nothing special, just trust the .NET Framework and the SQL Server. – Jeppe Stig Nielsen Jun 03 '13 at 12:29
  • Thanks for the input. Do I need to change something in the MS SQL Server? Right now the collation is set to SQL_Latin1_General_CP1_CI_AS. – HectorLector Jun 03 '13 at 12:36
  • @HectorLector That depends on whether you do `ORDER BY` on columns with Chinese content, or similar. If you do, the collation could be important. If your Chinese string contains characters outside the [BMP](https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane), you are going to have problems with the SQL Server, I think. But it probably does not. You can verify with `stringData.Any(char.IsSurrogate)` (uses LINQ `Any` method). – Jeppe Stig Nielsen Jun 03 '13 at 12:45
  • The database columns in question must have a type with an initial `n`, for example use `nvarchar` instead of `varchar`, or `ntext` instead of `text`. When you compare to a constant string, use `N'123东北'` with capital `N` before the ticks, _not_ just `'123东北'`. – Jeppe Stig Nielsen Jun 03 '13 at 15:16
  • I have an SQLCommand with string.Format( Insert ... '{0}', chineseString); Do I also need the "N" here? – HectorLector Jun 04 '13 at 02:58
  • The problem was the Insert statement and the missing N'. Put your comment in an answer and I will accept it ;-) – HectorLector Jun 04 '13 at 03:16

1 Answer


This answer is "promoted" (by request from the Original Poster) from my own comments.

In the .NET Framework, strings are already Unicode strings.

(Don't test Unicode strings by writing to the console, though, since the terminal window and console typically won't display them correctly. However, since .NET version 4.5 there is some support for this.)
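One way to check a string without relying on the console font is to print its UTF-16 code units as plain ASCII. The following is only a minimal sketch; the class and method names (`UnicodeDebugSketch`, `ToCodeUnits`, `HasSurrogates`) are made up for illustration and are not part of the .NET Framework.

```csharp
using System;
using System.Linq;

static class UnicodeDebugSketch
{
    // Renders every UTF-16 code unit as U+XXXX, so the result is pure ASCII
    // and can be written to any console or log file.
    public static string ToCodeUnits(string text)
    {
        return string.Join(" ", text.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    // True if the string contains surrogates, i.e. characters outside the BMP.
    public static bool HasSurrogates(string text)
    {
        return text.Any(char.IsSurrogate);
    }
}
```

Comparing the output of `ToCodeUnits` for the string you sent with the string you received (or read back from the database) shows whether the characters survived, even if the console itself would only show question marks.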

The thing to be aware of is the Encoding when you get text from an outside source. In this case, the constructor of BinaryReader offers an overload that takes in an Encoding:

using (var binaryReader = new BinaryReader(yourStream, Encoding.GetEncoding("GB18030")))
    ...
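Note that `ReadChars(length)` counts characters, not bytes, and a Chinese character takes more than one byte in GB18030. If the `length` in your protocol is a byte count (an assumption; the question does not say), it may be safer to read the raw bytes and decode them afterwards. A rough sketch, with hypothetical names `Gb18030ReadSketch`, `ReadField`, and `RoundTrip`:

```csharp
using System;
using System.IO;
using System.Text;

static class Gb18030ReadSketch
{
    // Decodes one fixed-size text field.
    // Assumption: byteLength counts bytes on the wire, not characters.
    public static string ReadField(BinaryReader reader, int byteLength, Encoding encoding)
    {
        byte[] raw = reader.ReadBytes(byteLength);
        return encoding.GetString(raw).Trim();
    }

    // Simulates the sender with a MemoryStream so the decoding can be tested
    // without a real client/server connection.
    public static string RoundTrip(string text)
    {
        // Needed on .NET Core / .NET 5+, where code-page encodings are opt-in.
        // On the .NET Framework GB18030 is built in and this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding gb = Encoding.GetEncoding("GB18030");
        byte[] wire = gb.GetBytes(text);

        using (var stream = new MemoryStream(wire))
        using (var reader = new BinaryReader(stream, gb))
        {
            return ReadField(reader, wire.Length, gb);
        }
    }
}
```

A round trip like `RoundTrip("123东北特钢大连新基地testtest")` should give back the same string, which is an easy way to test the decoding before the real sender is available.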

On the SQL Server, be sure that any column that needs to hold Chinese strings is of type nvarchar (or nchar), not just varchar (char). Otherwise, depending on the collation, the column may not be able to hold general Unicode characters (it may be represented internally by some 8-bit Microsoft code page).

Whenever you give an nchar literal in SQL, use the format N'my text', not just 'my text', to make sure the literal is interpreted as an nchar rather than just char. For example N'Erdős' is distinct from N'Erdos' while, in many collations, 'Erdős' and 'Erdos' might be (projected onto) the same value in the underlying code page.

Similarly N'东北特钢大连新基地' will work, while '东北特钢大连新基地' might result in a lot of question marks. From the update of your question:

sqlCommand = string.Format("INSERT INTO uber_chinese (columnName) VALUES(N'{0}')", myChineseString);
                                                                         ↑

(This is prone to SQL injection, of course.)
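A parameterized command avoids both the injection risk and the need for the N prefix, because the parameter is sent as nvarchar. This is only a sketch: the table and column names are taken from your example, while `InsertSketch`, `BuildInsert`, and the parameter name `@text` are made up for illustration.

```csharp
using System.Data;
using System.Data.SqlClient;

static class InsertSketch
{
    // Builds an INSERT whose value travels as an nvarchar parameter,
    // so no string formatting and no N'...' literal is involved.
    public static SqlCommand BuildInsert(SqlConnection connection, string myChineseString)
    {
        var command = new SqlCommand(
            "INSERT INTO uber_chinese (columnName) VALUES (@text)", connection);

        // SqlDbType.NVarChar makes the value Unicode on the server side.
        command.Parameters.Add("@text", SqlDbType.NVarChar).Value = myChineseString;
        return command;
    }
}
```

With an open connection you would then call `BuildInsert(connection, myChineseString).ExecuteNonQuery()`. The column itself still has to be nvarchar, as described above.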

The default collation of your column will be that of your database (SQL_Latin1_General_CP1_CI_AS from your comment). Unless you ORDER BY that column, or similar, that will probably be fine. If you do order by this column, consider using some Chinese language collation for the column (or for the entire database).

Jeppe Stig Nielsen