234

What is ANSI encoding format? Is it a system default format? In what way does it differ from ASCII?

Ripon Al Wasim
  • 34,088
  • 37
  • 146
  • 165
web dunia
  • 8,634
  • 17
  • 47
  • 63

10 Answers

274

ANSI encoding is a slightly generic term used to refer to the standard code page on a system, usually Windows. It is more properly referred to as Windows-1252 on Western/U.S. systems. (It can represent certain other Windows code pages on other systems.) This is essentially an extension of the ASCII character set in that it includes all the ASCII characters with an additional 128 character codes. This difference is due to the fact that "ANSI" encoding is 8-bit rather than 7-bit as ASCII is (ASCII is almost always encoded nowadays as 8-bit bytes with the MSB set to 0). See the article for an explanation of why this encoding is usually referred to as ANSI.

The name "ANSI" is a misnomer, since it doesn't correspond to any actual ANSI standard, but the name has stuck. ANSI is not the same as UTF-8.
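
As a rough illustration of the points above, the following Python sketch compares ASCII, Windows-1252 and UTF-8. It assumes a Western system where "ANSI" means Windows-1252 (Python codec name `cp1252`); the sample strings are made up for the example.

```python
# Illustrative sketch: ASCII vs. Windows-1252 ("ANSI" on Western Windows) vs. UTF-8.
text_ascii = "Hello"
text_accented = "café"

# Pure ASCII text produces the same bytes in all three encodings.
same_bytes = (
    text_ascii.encode("ascii")
    == text_ascii.encode("cp1252")
    == text_ascii.encode("utf-8")
)

# 'é' is outside ASCII: cp1252 stores it in one byte, UTF-8 in two.
cp1252_bytes = text_accented.encode("cp1252")
utf8_bytes = text_accented.encode("utf-8")

# 7-bit ASCII cannot represent 'é' at all.
try:
    text_accented.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(same_bytes, cp1252_bytes, utf8_bytes, ascii_ok)
```

The accented character is where the three encodings diverge, which is why a file saved as "ANSI" and opened as UTF-8 (or the reverse) shows garbled text.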

Noldorin
  • 134,265
  • 53
  • 250
  • 293
  • 2
    I know ANSI as being Code Page 437, not Windows Code Page 1252. Back when ANSI referred to the graphics created for bulletin board systems, I can pretty much guarantee that is the case. – Doug Moore Aug 21 '13 at 22:51
  • @lordscarlet: ANSI hasn't standardised them, but Windows-1252 is the closest you get (at least on Windows), as it's a superset. See http://en.wikipedia.org/wiki/ANSI_codepage#ANSI for reference. – Noldorin Aug 23 '13 at 12:11
  • 4
    "ANSI" clearly does not refer to any ANSI standard; however, it is a matter of fact that you can choose "Encoding: ANSI", for example in Notepad, when you save a file. And the actual question is: "What does it mean?" This answer is by far the best one. – Wernfried Domscheit Mar 15 '18 at 10:23
  • 1
    In my case, ANSI was referring to `windows-1254`. – Ramazan Polat Oct 11 '19 at 17:46
  • 1
    The discussion here in the comments as to what it "actually" means is an excellent illustration of why this non-term is problematic; *it isn't well-defined.* – tripleee Aug 16 '20 at 08:41
  • Why do you say 8bit bytes? Byte is by definition 8 bits. – David Klempfner Feb 19 '21 at 00:53
  • 1
    Yes, though 'only' in the modern conventional definition, and even then there is occasionally some flexibility in contexts such as this one. See e.g. Wikipedia: "Historically, the byte was the number of bits used to encode a single character of text in a computer". – Noldorin Feb 19 '21 at 01:25
62

Technically, ANSI should be the same as US-ASCII. It refers to the ANSI X3.4 standard, which is simply the ANSI organisation's ratified version of ASCII. Use of the top-bit-set characters is not defined in ASCII/ANSI as it is a 7-bit character set.

However, years of misuse of the term by the DOS and subsequently Windows community have left its practical meaning as “the system codepage of whatever machine is being used”. The system codepage is also sometimes known as ‘mbcs’, since on East Asian systems that can be a multiple-byte-per-character encoding. Some code pages can even use top-bit-clear bytes as trailing bytes in a multibyte sequence, so it's not even strictly compatible with plain ASCII... but even then, it's still called “ANSI”.
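
To make the trailing-byte remark concrete, here is a small Python sketch using code page 932 (the Shift-JIS variant used on Japanese Windows) as one example of such a multibyte "ANSI" code page. The choice of character is an assumption made for illustration.

```python
# Katakana "ソ" (U+30BD) encoded in code page 932.
encoded = "ソ".encode("cp932")
lead, trail = encoded[0], encoded[1]

# The trailing byte has its top bit clear, so it falls in the ASCII range.
trail_is_ascii_range = trail < 0x80

# Read on its own, that byte is an ordinary ASCII backslash.
trail_as_ascii = bytes([trail]).decode("ascii")

print(encoded, hex(lead), hex(trail), trail_is_ascii_range, repr(trail_as_ascii))
```

Code that scans such bytes for ASCII characters like the backslash, without decoding first, can therefore split a character in half.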

On US and Western European default settings, “ANSI” maps to Windows code page 1252. This is not the same as ISO-8859-1 (although it is quite similar). On other machines it could be anything else at all. This makes “ANSI” utterly useless as an external encoding identifier.

bobince
  • 498,320
  • 101
  • 621
  • 807
38

Strictly speaking, there is no such thing as ANSI encoding. Colloquially the term ANSI is used for several different encodings:

  1. ISO 8859-1
  2. Windows CP1252
  3. Current system encoding on a Windows machine (in Win32 API terminology).
Wernfried Domscheit
  • 38,841
  • 5
  • 50
  • 81
Nemanja Trifunovic
  • 23,597
  • 3
  • 46
  • 84
  • That is wrong. The Windows codepage 1252 was created based on ISO 8859-1 but is not completely equal. The term ANSI refers to the ISO 8859-x standard. – Patrik Jan 10 '20 at 11:58
  • 1
    @Patrik No, it doesn't. There are situations where that interpretation is actually correct, but as this and several other answers here vividly illustrate, you can't really tell without additional context. – tripleee Aug 16 '20 at 08:45
21

Once upon a time Microsoft, like everyone else, used 7-bit character sets, and they invented their own when it suited them, though they kept ASCII as a core subset. Then they realised the world had moved on to 8-bit encodings and that there were international standards around, such as the ISO-8859 family. In those days, if you wanted to get hold of an international standard and you lived in the US, you bought it from the American National Standards Institute, ANSI, who republished international standards with their own branding and numbers (that's because the US government wants conformance to American standards, not international standards). So Microsoft's copy of ISO-8859 said "ANSI" on the cover. And because Microsoft weren't very used to standards in those days, they didn't realise that ANSI published lots of other standards as well. So they referred to the standards in the ISO-8859 family (and the variants that they invented, because they didn't really understand standards in those days) by the name on the cover, "ANSI", and it found its way into Microsoft user documentation and hence into the user community. That was about 30 years ago, but you still sometimes hear the name today.

Michael Kay
  • 138,236
  • 10
  • 76
  • 143
  • standards were industry stuff so programmers were new to standards since it was a new industry? – CoffeDeveloper Mar 03 '15 at 14:44
  • 1
    It wasn't a new industry by the time Microsoft was founded. – Michael Kay Mar 03 '15 at 19:57
  • Microsoft has a problematic and controversial attitude towards interoperability in general. When they decided in the late 1990s to "embrace and extend" standards instead of directly shun them, that was a remarkable change, though still not a responsible approach towards proper interoperability. (You *could* argue that progress is impossible if you only adhere to existing standards, but that's obviously not the primary reason they do it this way.) – tripleee Jun 01 '18 at 06:26
16

ASCII just defines a 7 bit code page with 128 symbols. ANSI extends this to 8 bit and there are several different code pages for the symbols 128 to 255.

The name ANSI is not correct, because it is actually the ISO/IEC 8859 standard that defines these code pages. See ISO/IEC 8859 for reference. Its parts are numbered ISO/IEC 8859-1 to ISO/IEC 8859-16; part 12 was abandoned, so 15 parts were published.

Windows-1252 is in turn based on ISO/IEC 8859-1, with some modifications, mainly in the C1 control range 128 to 159. Wikipedia states that Windows-1252 is also referred to as ISO-8859-1, with a second hyphen between ISO and 8859. (Unbelievable! Who does something like that?!?)
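
The difference in the 128 to 159 range can be checked with a short Python sketch. It relies on Python's built-in `latin-1` and `cp1252` codecs; a few byte values are unassigned in Python's `cp1252` table, which the sketch records as `None`.

```python
# Compare how ISO-8859-1 and Windows-1252 decode bytes 0x80..0x9F.
differences = {}
for value in range(0x80, 0xA0):
    raw = bytes([value])
    latin1_char = raw.decode("latin-1")  # a C1 control character in ISO-8859-1
    try:
        cp1252_char = raw.decode("cp1252")  # usually a printable character
    except UnicodeDecodeError:
        cp1252_char = None  # unassigned in Python's cp1252 codec
    differences[value] = (latin1_char, cp1252_char)

# Outside that range the two encodings decode every byte identically.
same_elsewhere = all(
    bytes([v]).decode("latin-1") == bytes([v]).decode("cp1252")
    for v in list(range(0x00, 0x80)) + list(range(0xA0, 0x100))
)

print(differences[0x80], same_elsewhere)
```

Byte 0x80 is a typical case: a control character in ISO-8859-1, but the euro sign in Windows-1252.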

Daniel Brückner
  • 56,191
  • 15
  • 92
  • 137
5

Basically "ANSI" refers to the legacy codepage on Windows. See also an article by Raymond Chen on this topic:

The source of this comes from the fact that the Windows code page 1252 was originally based on an ANSI draft, which became ISO Standard 8859-1.

The first 128 characters (codes 0 to 127) are identical to ASCII in most code pages; the upper characters vary, though.

However, ANSI does not automatically mean CP1252 or Latin 1.

All confusion notwithstanding, you should simply avoid such issues nowadays and use Unicode.

Joey
  • 316,376
  • 76
  • 642
  • 652
4

Just in case your PC is not a "Western" PC and you don't know which code page is used, you can have a look at this page: National Language Support (NLS) API Reference

[Microsoft removed this reference; take it from the web archive: National Language Support (NLS) API Reference]

Or you can query your registry:

C:\>reg query HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage /f ACP

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage
    ACP    REG_SZ    1252

End of search: 1 match(es) found.

C:\>
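
The same value can also be read programmatically. The following is a minimal Python sketch that calls the Win32 `GetACP` function through `ctypes`; the helper name `active_code_page` is made up for this example, and on non-Windows platforms it simply returns `None`.

```python
import locale
import sys


def active_code_page():
    """Return the Windows ANSI code page number, or None on other platforms."""
    if sys.platform != "win32":
        return None
    import ctypes  # imported here because windll only exists on Windows

    return ctypes.windll.kernel32.GetACP()


print("ANSI code page:", active_code_page())
# Python's own view of the default text encoding, for comparison.
print("Preferred encoding:", locale.getpreferredencoding(False))
```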
Wernfried Domscheit
  • 38,841
  • 5
  • 50
  • 81
2

When using single-byte characters, the ASCII format defines the first 128 characters (codes 0-127). The extended characters from 128-255 are defined by various ANSI code pages to allow limited support for other languages. In order to make sense of an ANSI-encoded string, you need to know which code page it uses.
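
A short Python sketch of why the code page matters: the same byte decodes to different characters depending on which code page is assumed. The three code pages chosen here are just examples.

```python
# One byte from the extended range, interpreted under three code pages.
raw = bytes([0xE9])

decoded = {
    name: raw.decode(name)
    for name in ("cp1252", "cp1251", "cp437")  # Western, Cyrillic, original IBM PC
}

for name, char in decoded.items():
    print(f"{name}: {char!r}")
```

Without knowing the code page, there is no way to tell from the byte alone which of these characters was meant.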

Eric Petroelje
  • 57,359
  • 8
  • 118
  • 174
2

I remember when "ANSI" text referred to the pseudo VT-100 escape codes usable in DOS through the ANSI.SYS driver to alter the flow of streaming text.... Probably not what you are referring to but if it is see http://en.wikipedia.org/wiki/ANSI_escape_code
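
For readers who have not seen them, here is a minimal Python sketch of such an escape sequence. The helper `colorize` is hypothetical; `31` is the SGR code for a red foreground and `0` resets the attributes.

```python
ESC = "\x1b"  # the escape character that starts every sequence


def colorize(text, sgr_code):
    """Wrap text in an SGR (Select Graphic Rendition) sequence and a reset."""
    return f"{ESC}[{sgr_code}m{text}{ESC}[0m"


red = colorize("error", 31)
print(red)  # shown in red on terminals that interpret ANSI escape codes
```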

jmucchiello
  • 17,551
  • 5
  • 37
  • 59
-4

ANSI (aka Windows-1252/WinLatin1) is a character encoding of the Latin alphabet, fairly similar to ISO-8859-1. You may want to take a look at it on Wikipedia.

moff
  • 6,177
  • 29
  • 30