I need to edit files that contain UTF-8 characters, and I want to use Vim for it. Before I get accused of asking something that has been asked before: I've read the Vim documentation on encoding, fileencoding[s], termencoding and more, googled the subject, and read this question among other texts.

Here is a sentence with a UTF-8 character in it that I use as a test case.

From Japanese 勝 (katsu) meaning "victory"

If I open the (UTF-8) file with Notepad, it is displayed correctly. When I open it with Vim, the best I get is a black square where the Japanese character for katsu should be. Changing any of the settings for fileencoding or encoding makes no difference.

Why is Vim giving me a black square where Notepad displays the character without problems? If I copy the text from Vim and paste it into Notepad, it is displayed correctly, which indicates that the text is not corrupted but only displayed wrong. Which setting(s) influence that?

Any suggestions are welcome.

Here is the relevant part of my _vimrc.

if has("multi_byte")
  set encoding=utf-8
  if &termencoding == ""
    let &termencoding = &encoding
  endif
  setglobal fileencoding=utf-8
  set fileencodings=ucs-bom,utf-8,latin1
endif

The actual settings when I open the file are:

encoding=utf-8
fileencoding=utf-8
termencoding=utf-8

My PC is running Windows 10; the language is English (United States).

==== Edit 4 oct 2016. ====

This is what the content of the file looks like after loading it in vim and converting it to hex.

0000000: efbb bf46 726f 6d20 4a61 7061 6e65 7365  ...From Japanese
0000010: 20e5 8b9d 2028 6b61 7473 7529 206d 6561   ... (katsu) mea
0000020: 6e69 6e67 2022 7669 6374 6f72 7922 0d0a  ning "victory"..

The first three bytes (ef bb bf) are the UTF-8 BOM that Microsoft tools write. The rest is plain ASCII, except for the second, third and fourth bytes on the second line (e5 8b 9d), which must represent the non-ASCII character somehow.
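As a sanity check on the dump above, here is a minimal Python sketch (not part of the original post) that decodes the same bytes as UTF-8. The variable names are illustrative only. It suggests that the file content itself is valid UTF-8 and that e5 8b 9d is the three-byte encoding of the kanji.

```python
import codecs
import unicodedata

# Bytes copied verbatim from the three lines of the hex dump above.
raw = bytes.fromhex(
    "efbbbf46726f6d204a6170616e657365"
    "20e58b9d20286b6174737529206d6561"
    "6e696e672022766963746f7279220d0a"
)

# The file starts with the three-byte UTF-8 BOM.
has_bom = raw.startswith(codecs.BOM_UTF8)

# "utf-8-sig" decodes UTF-8 and drops a leading BOM if one is present.
text = raw.decode("utf-8-sig")

# The three non-ASCII bytes on the second line of the dump.
kanji = bytes.fromhex("e58b9d").decode("utf-8")

print(has_bom)
print(repr(text))
print(f"U+{ord(kanji):04X}", unicodedata.name(kanji))
```

If the decode succeeds without raising `UnicodeDecodeError`, the encoding side is fine and the remaining suspect is how the character is rendered.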

PapaAtHome
  • what about the font that vim's being displayed in? just because vim knows what char it is doesn't mean that the terminal window containing vim does. – Marc B Oct 03 '16 at 21:41
  • @Marc B: In Vim, the font settings I recognise are 'guifont=Fixedsys:h9:cDEFAULT' and 'printfont=Courier_New:h10'. As a matter of fact, I had not looked into this kind of setting. – PapaAtHome Oct 03 '16 at 21:47
  • there you go. `fixedsys` is basically an ASCII-only font. fire up `charmap` and you'll see that fixedsys has essentially NOTHING in it compared to what a full unicode font like courier new does. – Marc B Oct 03 '16 at 21:49
  • Any suggestion on what to use? I tried 'Courier_New' and the black box has changed into an open box with a black border. (In other words, no change.) At least it gives me the idea to look for other fonts. – PapaAtHome Oct 03 '16 at 21:53
  • dunno what font notepad uses in win10, but that'd be a good starting point. – Marc B Oct 03 '16 at 21:55
  • I tried several fonts. My Notepad is configured to use Consolas, but in Vim it makes no difference. I assume that the UTF-8 part with all the fancy characters is a bit more complicated than what charmap can show me. Oh, by the way, Notepad works perfectly well with 'Courier New'. – PapaAtHome Oct 03 '16 at 22:08
  • Are you sure the file is in UTF-8 and not something else (e.g. UTF-16)? What do the byte values look like (use `g8` to check). – Martin Tournoij Oct 04 '16 at 11:48

1 Answer

There are two steps to making Vim successfully display a UTF-8 character:

  1. File Encoding. You've correctly identified that this is controlled by the 'encoding' and 'fileencodings' options. Once you've properly set this up (which you can verify via :setlocal fileencoding?, or the ga command on a known character, or at least by checking that each character is represented by a single cell, not its constituent byte values), there's:
  2. Character Display. That is, you need to use a font that contains glyphs for the characters in your file. Unicode is large; most fonts don't contain all glyphs. In my experience, that's less of a problem on Linux, which seems to have some automatic fallbacks built-in. But on Windows, you need to have a proper font installed and configured (GVIM: in guifont).

For example, to properly display Japanese Kanji characters, you need to install the far eastern language support in Windows, and then

:set guifont=MS_Gothic:h12:cSHIFTJIS
Ingo Karkat
  • Loading the Japanese language and using the guifont as given did the job. Now I'm stuck with just one more question. Why is notepad able to display the glyphs without loading the language? – PapaAtHome Oct 04 '16 at 17:39
  • So you probably already had some font that has Japanese glyphs, and Notepad has some fallback-logic built-in that found and used those fonts. In Vim (on Windows), this requires manual configuration, though. – Ingo Karkat Oct 05 '16 at 06:57