
I want to know where the BOM character is located in a file and how I can view it. It would be helpful if somebody could explain what the BOM character is. What I want to do is this: I have an ANSI file and I want to convert it to UTF-8 encoding. How can I do that?

Thanks in advance

1 Answer


" I want to know the location of 'BOM' character in the file

the BOM is at the beginning of the file.

why didn't you google it or look it up in wikipedia?

" how can i view this character

ordinarily you can't, but in some situations it's displayed.

" It will be helpful if somebody explain what is BOM character

BOM was originally a byte order mark, used to make it easy to determine the endianness of UTF-16 or UTF-32 encoded text. in Windows it's also used to identify UTF-8 encoded files as such, and in particular the visual c++ compiler will misidentify the encoding if there is no BOM. the wikipedia article about BOM is unfortunately skewed towards a Unix-land fanboy point of view where UTF-8 files should be incompatible¹ with a common requirement in Windows (it helps to consider that Microsoft was a founding member of the Unicode consortium, thus there's nothing in the Unicode standard that's contrary to the convention in Windows).
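to make "at the beginning of the file" concrete, here is a minimal sketch that checks the leading bytes of a buffer against the BOM byte sequences of the UTF encodings. the names `Bom` and `detect_bom` are made up for this illustration. note that the check is only a heuristic: the bytes FF FE 00 00 could also be a UTF-16 little-endian BOM followed by a NUL character, and the sketch simply assumes UTF-32 in that case.

```cpp
#include <cstddef>
#include <string>

// Encoding suggested by a leading byte order mark, if any.
// (Hypothetical names, chosen for this sketch only.)
enum class Bom { None, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Inspect the first bytes of a buffer and report which BOM it starts with.
// The UTF-32 checks come before the UTF-16 checks because the UTF-32 LE
// signature FF FE 00 00 begins with the UTF-16 LE signature FF FE.
inline Bom detect_bom(const std::string& bytes) {
    auto starts_with = [&](const char* sig, std::size_t n) {
        return bytes.size() >= n && bytes.compare(0, n, sig, n) == 0;
    };
    if (starts_with("\xEF\xBB\xBF", 3))     return Bom::Utf8;
    if (starts_with("\xFF\xFE\x00\x00", 4)) return Bom::Utf32LE;
    if (starts_with("\x00\x00\xFE\xFF", 4)) return Bom::Utf32BE;
    if (starts_with("\xFF\xFE", 2))         return Bom::Utf16LE;
    if (starts_with("\xFE\xFF", 2))         return Bom::Utf16BE;
    return Bom::None;
}
```

to use it, read the first few bytes of the file in binary mode into a `std::string` and pass that to `detect_bom`. viewing the file in a hex editor shows the same bytes directly.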

" I want to do is that i am having ANSI file and i want to convert it into UTF-8 encoding, how can i do that

to convert accurately you need to know the exact encoding used for the file. note that "windows ansi" is a set of possible encodings, where the windows ansi encoding on a given Windows installation is the codepage reported by the GetACP API function. given knowledge of the encoding you can use either the Windows API's MultiByteToWideChar, or the C library's mbstowcs, or the C++11 standard library's codecvt.
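as a portable illustration of what such a conversion does, here is a minimal sketch that assumes the file is encoded as Windows-1252 (the usual "ansi" codepage on Western European and US installations; that is an assumption, check yours). it does not use the API functions named above. instead it maps each byte to its Unicode code point by hand and writes that code point as UTF-8. the names `append_utf8` and `cp1252_to_utf8` are made up for this sketch, and mapping the five undefined bytes to U+FFFD is a choice of the sketch, not something the codepage prescribes.

```cpp
#include <cstdint>
#include <string>

// Append one code point as UTF-8. Only the BMP (up to U+FFFF) is handled,
// which is enough for every character Windows-1252 can represent.
inline void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert Windows-1252 bytes to UTF-8, optionally prefixed with a UTF-8 BOM.
inline std::string cp1252_to_utf8(const std::string& in, bool with_bom = false) {
    // Code points for bytes 0x80..0x9F; 0 marks bytes undefined in Windows-1252.
    static const std::uint16_t high[32] = {
        0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
        0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178};

    std::string out;
    if (with_bom) out += "\xEF\xBB\xBF";
    for (unsigned char b : in) {
        std::uint32_t cp = b;  // 0x00..0x7F and 0xA0..0xFF map to the same code point
        if (b >= 0x80 && b <= 0x9F) {
            cp = high[b - 0x80];
            if (cp == 0) cp = 0xFFFD;  // undefined byte: replacement character
        }
        append_utf8(out, cp);
    }
    return out;
}
```

for example, the single byte 0xE9 (é) becomes the two bytes C3 A9. on Windows the API route is MultiByteToWideChar with the source codepage to get UTF-16, followed by WideCharToMultiByte with CP_UTF8. either way the bytes of every non-ASCII character change, which is why just putting a BOM in front of an ANSI file does not make it UTF-8.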


1) of old the g++ compiler choked on a BOM in UTF-8 source code, the opposite of visual c++, which requires a BOM. happily, modern versions of g++ accept the BOM, as they are required to do by the standards.

Cheers and hth. - Alf
  • Thanks, that is very helpful. I would also like to know why there are UTF-8, UTF-16 and UTF-32. –  Jun 04 '14 at 12:52
  • Will it work if I add the BOM character at the beginning of the ANSI file? –  Jun 04 '14 at 12:59
  • @Dipak, ANSI text is text in some particular code page. BOMs don't exist for code pages, only for Unicode encodings: UTF-8, UTF-16, UTF-32. ANSI files cannot have a BOM, and therefore you have to know the exact code page by some other means. – Dialecticus Jun 04 '14 at 13:02
  • BOM is a feature of the UTF encodings. Windows ANSI encodings do not support a BOM. I'd use the `MultiByteToWideChar` API function because it allows you to detect the required buffer size. – Cheers and hth. - Alf Jun 04 '14 at 13:02