Questions tagged [character-set]

A character set maps a set of characters to specific numeric values, e.g. ASCII, UTF-8 and ISO-8859-1.

Modern programming languages, editors and tools can encode and decode data between their internal representation and specific character sets. Examples include ASCII, ISO-8859-1 and UTF-8 (strictly speaking, an encoding of the Unicode character set).

Choose an appropriate character set for transmitting and persisting data, particularly text that can contain accented or other special characters (as in European languages such as French or German) or that uses a completely different script (such as Japanese). See internationalisation (also referred to as i18n).
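
As a rough illustration of why the choice matters, the following Python sketch encodes the same text with different character sets and shows what happens when bytes are decoded with the wrong one. The sample strings and the helper name `can_encode` are illustrative assumptions, not taken from any of the questions below.

```python
# Sample text with one non-ASCII character (e with acute accent).
text = "caf\u00e9"

# The same four characters occupy a different number of bytes per charset.
utf8_bytes = text.encode("utf-8")         # the accented letter takes two bytes
latin1_bytes = text.encode("iso-8859-1")  # the accented letter takes one byte


def can_encode(s: str, charset: str) -> bool:
    """Return True if every character of s is representable in charset."""
    try:
        s.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Decoding with the wrong charset does not fail here; it silently
# produces different characters (often called mojibake).
garbled = utf8_bytes.decode("iso-8859-1")

# Decoding with the charset that was used for encoding restores the text.
restored = utf8_bytes.decode("utf-8")

if __name__ == "__main__":
    print(len(utf8_bytes), len(latin1_bytes))
    print(can_encode(text, "ascii"))
    print(can_encode("\u65e5\u672c\u8a9e", "iso-8859-1"))  # Japanese text
    print(restored == text, garbled == text)
```

Text outside the repertoire of a single-byte set such as ISO-8859-1 cannot be stored in it at all, which is one reason UTF-8 is a common default for data that may contain several scripts.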

92 questions
1171 votes, 8 answers

What's the difference between utf8_general_ci and utf8_unicode_ci?

Between utf8_general_ci and utf8_unicode_ci, are there any differences in terms of performance?
KahWee Teng
557 votes, 21 answers

Best way to convert text files between character sets?

What is the fastest, easiest tool or method to convert text files between character sets? Specifically, I need to convert from UTF-8 to ISO-8859-15 and vice versa. Everything goes: one-liners in your favorite scripting language, command-line tools…
Antti Kissaniemi
324 votes, 4 answers

What does character set and collation mean exactly?

I can read the MySQL documentation and it's pretty clear. But, how does one decide which character set to use? On what data does collation have an effect? I'm asking for an explanation of the two and how to choose them.
Sander Versluys
30 votes, 2 answers

About the "Character set" option in Visual Studio

I have an inquiry about the "Character set" option in Visual Studio. The Character Set options are: Not Set, Use Unicode Character Set, Use Multi-Byte Character Set. I want to know what the difference between three options in Character…
Lion King
16 votes, 2 answers

SQL Server: set character set (not collation)

How does one set the default character set for fields when creating tables in SQL Server? In MySQL one does this: CREATE TABLE tableName ( name VARCHAR(128) CHARACTER SET utf8 ) DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci; Note…
dotancohen
7 votes, 2 answers

Does using ASCII/Latin Charset speed up the database?

It would seem that using the ASCII charset for most fields and then specify utf8 only for the fields that need it would reduce the amount of I/O the database must perform by 100%. Anyone know if this is true? Update: The above was not really my…
mbalsam
4 votes, 2 answers

Can Excel Sort Differently Than Its Default U.S. Character Set?

My question is basically the opposite of THIS ONE (which had a database-based solution I can't use here). I use SAP, which sorts characters this way: 0-9, A-Z, _ but I'm downloading data into Excel and manipulating ranges dependent on correct SAP…
wiigame
4 votes, 3 answers

vb.net character set

According to MSDN vb.net uses this extended character set. In my experience it actually uses this: What am I missing? Why does it say it uses the one and uses the other? Am I doing something wrong? Is there some sort of conversion tool to the…
Connor Albright
4 votes, 2 answers

Determining ISO-8859-1 vs US-ASCII charset

I am trying to determine whether to use PrintWriter pw = new PrintWriter(outputFilename, "ISO-8859-1"); or PrintWriter pw = new PrintWriter(outputFilename, "US-ASCII"); I was reading All about character sets to determine the character set of an…
vikingsteve
3 votes, 1 answer

Can individual tags override the Character Set in the Specific Character Set (0008,0005)

If I create a DICOM object with a basic single byte Specific Character Set like (0008,0005) = ISO_IR 100, can one of the tags use a different 2-byte Character set? For example can Patient Name (0010,0010) be encoded in Simplified Chinese (ISO 2022…
3 votes, 2 answers

Why is there a need to add a '0' to indexes in order to access array values?

I am confused with this line: sum += a[s[i] - '0']; To give some context, this is the rest of the code: #include using namespace std; int main() { int a[5]; for (int i = 1; i <= 4; i++) cin >> a[i]; string s; …
Zachary
3 votes, 1 answer

Checking CharacterSet for single UnicodeScalar yields strange behaviour

While working with CharacterSet I've come across an interesting problem. From what I have gathered so far CharacterSet is based around UnicodeScalar; you can initialise it with scalars and check if a scalar is contained within the set. Querying the…
Michael Waterfall
2 votes, 2 answers

Parsing of CSV file using Node/Express spits out weird \x001 codes

I'm using Node and Express to fetch a .CSV file from a URL that I want to parse. The process of downloading it works just fine. But when I use csv-parser to parse the file the output in the console looks like this: Just tonnes of lines of weird…
2 votes, 1 answer

Is there a way to list all categories in perluniprops?

perluniprops lists the Unicode properties of the version of Unicode it supports. For Perl 5.32.1, that's Unicode 13.0.0. You can obtain a list of the characters that match a category using Unicode::Tussle's unichars. unichars '\p{Close_Punctuation}'…
alvas
2 votes, 2 answers

Getting Arabic characters as ??? in PHP from JDE

I am trying to fetch our Arabic values from JDE Database using the following connection string: $dsn = "Driver={SQL Server};Server=10.10.10.27;Database=JDE;charset=utf8"; $username = "username"; $password = "password"; $string =…