Questions tagged [unicode]

Unicode is a standard for the encoding, representation and handling of text. It aims to support every character needed for written text, covering all writing systems as well as technical symbols and punctuation.

Unicode

Unicode assigns each character a code point to act as a unique reference:

U+0041 A
U+0042 B
U+0043 C
...
U+039B Λ
U+039C Μ
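The sketch below shows the character-to-code-point mapping in Python 3. It assumes Python; any Unicode-aware language works. It prints the code points from the list above using the built-in ord() and maps a number back with chr(). The helper name code_point_label is hypothetical.

```python
def code_point_label(ch: str) -> str:
    """Format one character as 'U+XXXX <char>', zero-padded to at least four hex digits."""
    return f"U+{ord(ch):04X} {ch}"

# The characters listed above; \u039B and \u039C are GREEK CAPITAL LETTER LAMDA and MU.
for ch in "ABC\u039B\u039C":
    print(code_point_label(ch))

# chr() is the inverse of ord(): it turns a code point number back into a character.
print(chr(0x0041), chr(0x039B))  # A Λ
```

Writing the Greek letters as \u escapes avoids confusing U+039C (Greek Mu) with the visually identical Latin M (U+004D).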

Unicode Transformation Formats

UTFs describe how to encode code points as byte representations. The most common forms are UTF-8 and UTF-16. UTF-8 encodes each code point as a sequence of one, two, three or four bytes. UTF-16 encodes each code point as one or two 16-bit code units, that is, two or four bytes.

Code Point          UTF-8           UTF-16 (big-endian)
U+0041              41              00 41
U+0042              42              00 42
U+0043              43              00 43
...
U+039B              CE 9B           03 9B
U+039C              CE 9C           03 9C
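The table can be reproduced with Python's str.encode. This is a minimal sketch assuming Python 3.9 or later; the helper utf_bytes is hypothetical. An extra row for U+1F600, an emoji outside the Basic Multilingual Plane, is added to show the four-byte case. For that character UTF-8 needs four bytes and UTF-16 needs a surrogate pair.

```python
def utf_bytes(ch: str) -> tuple[str, str]:
    """Return the UTF-8 and big-endian UTF-16 encodings of ch as spaced hex strings."""
    utf8 = ch.encode("utf-8").hex(" ").upper()
    # "utf-16-be" fixes the byte order, so no byte order mark (BOM) is emitted;
    # plain "utf-16" would prepend a BOM and use the platform's native byte order.
    utf16 = ch.encode("utf-16-be").hex(" ").upper()
    return utf8, utf16

print(f"{'Code Point':<20}{'UTF-8':<16}UTF-16 (big-endian)")
# The rows from the table above, plus U+1F600 to show the four-byte forms.
for ch in "ABC\u039B\u039C\U0001F600":
    utf8, utf16 = utf_bytes(ch)
    label = f"U+{ord(ch):04X}"
    print(f"{label:<20}{utf8:<16}{utf16}")
```

The last row prints F0 9F 98 80 for UTF-8 and D8 3D DE 00 for UTF-16. D83D and DE00 are the high and low surrogates.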

UTF FAQ, UTF-16 FAQ, UTF-8 FAQ

Specification

The Unicode Consortium also defines standards for sorting algorithms, rules for capitalization, character normalization and other locale-sensitive character operations.
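Python's standard unicodedata module implements the Unicode normalization forms, and str.upper() applies Unicode's default case mappings. Those mappings are locale-independent, so tailorings such as the Turkish dotted/dotless i are not applied. The sketch below assumes Python 3 and illustrates three points:

- two strings that look identical compare unequal until they are normalized;
- a common accent-stripping approximation works by decomposing and dropping combining marks;
- a case mapping can change a string's length.

strip_accents is a hypothetical helper, not a library function. It is lossy and only removes marks from characters that decompose canonically, so a letter such as ø is left unchanged.

```python
import unicodedata

composed = "\u00e9"       # é as one code point (LATIN SMALL LETTER E WITH ACUTE)
decomposed = "e\u0301"    # e followed by U+0301 COMBINING ACUTE ACCENT

# Both render as "é" but compare unequal until normalized to the same form.
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True

def strip_accents(text: str) -> str:
    """Decompose to NFD, then drop characters with a nonzero combining class (e.g. accents)."""
    nfd = unicodedata.normalize("NFD", text)
    return "".join(c for c in nfd if not unicodedata.combining(c))

print(strip_accents("éíóúàèìòùãõ"))  # eiouaeiouao

# Full case mapping: German sharp s (U+00DF) uppercases to two letters.
print("stra\u00dfe".upper())  # STRASSE
```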

Latest Version of the Standard

Identifying Characters

For more general information, see the Unicode article on Wikipedia.

23,459 questions

votes

1 answer

Java REGEX code to validate Indian language characters not working?

Why is the following code not working (returning false) with Indian languages? System.out.println(Charset.forName("UTF-8").encode("అనువాద") …

asked May 02 '13 at 10:09

Suren Raju

2,760
4
20
44

votes

2 answers

Unicode Encoding and decoding issues in QRCode

I am trying to generate a UTF-8 QR code so that I can encode accents and Unicode characters. To test it, I am using several decoding solutions: http://zxing.org/w/decode.jspx - The zxing project also used in…

unicode encoding character-encoding decoding qr-code

asked Oct 23 '09 at 08:23

Natim

15,199
21
80
140

votes

2 answers

How can one find the Unicode codepoints that a font has glyphs for, on a Debian-based system?

From a scripting language (Python or Ruby, say) on a Debian-based system, I would like to find either (1) all the Unicode codepoints that a particular font has glyphs for, or (2) all the fonts that have glyphs for a particular Unicode…

python ruby unicode fonts fontconfig

asked Apr 09 '13 at 08:03

Mark Longair

385,867
66
394
320

votes

1 answer

Issue about 65533 � in C# text file reading

I created a sample app to load all special characters copy-pasted from OpenOffice Writer into Notepad. Double codes differ, and when I try to load this: var lines = File.ReadAllLines("..\\ter34.txt"); This creates the problem of 65533. Issue comes…

c# unicode

asked Feb 22 '13 at 10:42

Aravind Srinivas

votes

3 answers

unicode text file output differs between XE2 and Delphi 2009?

When I try the code below there seem to be different output in XE2 compared to D2009. procedure TForm1.Button1Click(Sender: TObject); var Outfile:textfile; myByte: Byte; begin assignfile(Outfile,'test_chinese.txt'); Rewrite(Outfile); for…

delphi unicode utf-8

asked Jan 09 '13 at 10:16

Thomas

votes

4 answers

How to convert unicode accented characters to pure ascii without accents?

I'm trying to download some content from a dictionary site like http://dictionary.reference.com/browse/apple?s=t The problem I'm having is that the original paragraph has all those squiggly lines, and reverse letters, and such, so when I read the…

python unicode wget unicode-normalization

asked Jan 02 '13 at 07:28

Wolf

votes

2 answers

Japanese COBOL Code: rules for G literals and identifiers?

We are processing IBM Enterprise Japanese COBOL source code. The rules that describe exactly what is allowed in G-type literals, and what is allowed in identifiers, are unclear. The IBM manual indicates that a G'....' literal must have a SHIFT-OUT…

unicode cobol literals

asked Sep 09 '09 at 05:08

Ira Baxter

88,629
18
158
311

votes

4 answers

read/write unicode data in MySql

I am using a MySQL DB and want to be able to read and write Unicode data values, for example French, Greek or Hebrew values. My client program is C# (.NET Framework 3.5). How do I configure my DB to allow Unicode, and how do I use C# to read/write values…

c# sql mysql unicode

asked Sep 06 '09 at 15:59

user123093

2,147
3
16
16

votes

1 answer

UnicodeEncodeError: 'ascii' codec can't encode characters

I have a dict that's fed with a URL response, like: >>> d { 0: {'data': u'found "\u62c9\u67cf \u591a\u516c \u56ed"'} 1: {'data': u'some other data'} ... } While using xml.etree.ElementTree functions on these data values (d[0]['data']) I…

python unicode elementtree

asked Nov 21 '12 at 12:38

theta

21,223
35
106
149

votes

3 answers

Python 3: Demystifying encode and decode methods

Let's say I have a string in Python: >>> s = 'python' >>> len(s) 6 Now I encode this string like this: >>> b = s.encode('utf-8') >>> b16 = s.encode('utf-16') >>> b32 = s.encode('utf-32') What I get from above operations is a bytes array -- that…

python unicode encoding python-3.x

asked Nov 20 '12 at 08:57

treecoder

36,160
18
57
89

votes

3 answers

UNICODE, UTF-8 and Windows mess

I'm trying to implement text support in Windows with the intention of also moving to a Linux platform later on. It would be ideal to support international languages in a uniform way but that doesn't seem to be easily accomplished when considering…

c++ c windows unicode utf-8

asked Oct 26 '12 at 15:48

Murrgon

votes

5 answers

Python3 convert Unicode String to int representation

As we all know, a computer works with numbers. I'm typing this text right now, the server makes a number out of it and when you want to read it, you'll get text from the server. How can I do this on my own? I want to encrypt something with my own…

python string unicode python-3.x int

asked Sep 27 '12 at 16:09

user1703918

votes

1 answer

In haskell how can I uppercase a unicode character with respect to current locale

It turns out that uppercasing a character is a complicated business. If you get out of the basic ASCII character set, the rules for uppercasing a character and lowercasing a character are actually dependent on the locale in which the application is…

haskell unicode

asked Sep 21 '12 at 19:58

Savanni D'Gerinel

2,209
14
25

votes

5 answers

Shouldn't JSON.stringify escape Unicode characters?

I have a simple test page in UTF-8 where text with letters in multiple different languages gets stringified to JSON: http://jsfiddle.net/Mhgy5/ HTML:

検索 • Busca • Sök • 搜尋 • Tìm kiếm • Пошук • Cerca • Søk • Haku • Hledání •…</div>

javascript json unicode

asked Sep 04 '12 at 21:23

Ates Goral

126,894
24
129
188
    </div>
</div>
                    </div>
                
11 votes

2 answers

How to handle accented characters in file names in Git on Mac OS X converted to unicode

My Git repository has files with accented names, such as éíóúàèìòùãõ_800x600.jpg, but after cloning it I cannot pull, because the file appears as modified:
$git clone [...]
done

$git status
# On branch master
# Untracked files:
#   (use "git add…

macos git unicode

asked Aug 15 '12 at 11:03

Shankar Cabus

7,824
6
29
43

</div>
        </div>

</div>
        </div>
        
    </body>
</html>