10

I have written a program in Delphi 7 that searches a hard drive for *.srt files and lists the path and name of each file in a memo. Now I need to convert these files from ANSI to UTF-8, but I haven't succeeded.

Kromster
    ANSI isn't really a proper character encoding name; Windows generally uses "ANSI" to mean Windows-1252. http://stackoverflow.com/questions/701882/ – Miles Apr 02 '09 at 21:12
    @Miles: Windows uses "ANSI" to mean the code page of whatever your locale is. It would be SJIS for Japanese Windows users, GB2312 for Simplified Chinese Windows users, etc. – J-16 SDiZ Jul 04 '09 at 05:29
    Would you please explain what exactly happens when you say you "haven't succeeded"? – AlexSC Jul 27 '15 at 12:13

6 Answers

9

The UTF8Encode function takes a WideString as its parameter and returns a UTF-8 encoded string.

Sample:

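procedure ConvertANSIFileToUTF8File(const AInputFileName, AOutputFileName: TFileName);
var
  Strings: TStrings;
begin
  Strings := TStringList.Create;
  try
    // Load the file as ANSI text (Delphi 7 strings are AnsiString)
    Strings.LoadFromFile(AInputFileName);
    // Implicitly widen to WideString, then encode as UTF-8 bytes
    Strings.Text := UTF8Encode(Strings.Text);
    // Delphi 7 writes the bytes unchanged, so the file is UTF-8 without a BOM
    Strings.SaveToFile(AOutputFileName);
  finally
    Strings.Free;
  end;
end;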
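Some programs (for example certain subtitle players) may only recognise a file as UTF-8 when it starts with a byte order mark (the bytes EF BB BF). A variant of the procedure above could prefix that BOM. This is a minimal Delphi 7 sketch; the names `UTF8BOM`, `AnsiToUtf8WithBOM` and `ConvertANSIFileToUTF8FileWithBOM` are hypothetical, and, like the procedure above, it assumes the input files use the system ANSI code page.

```delphi
uses
  Classes, SysUtils;

const
  // UTF-8 byte order mark: bytes EF BB BF
  UTF8BOM = #$EF#$BB#$BF;

// Hypothetical helper: converts ANSI text (system code page) to UTF-8
// and puts the BOM in front of it
function AnsiToUtf8WithBOM(const S: string): UTF8String;
begin
  Result := UTF8BOM + UTF8Encode(S);
end;

// Hypothetical variant of ConvertANSIFileToUTF8File that writes a BOM
procedure ConvertANSIFileToUTF8FileWithBOM(const AInputFileName, AOutputFileName: TFileName);
var
  Strings: TStringList;
begin
  Strings := TStringList.Create;
  try
    Strings.LoadFromFile(AInputFileName);
    // In Delphi 7, SaveToFile writes the string bytes unchanged, so the
    // BOM becomes the first three bytes of the output file
    Strings.Text := AnsiToUtf8WithBOM(Strings.Text);
    Strings.SaveToFile(AOutputFileName);
  finally
    Strings.Free;
  end;
end;
```

Building the BOM into the string keeps the TStringList-based approach; in Delphi 2009 and later this trick does not apply, because strings there are UTF-16 and SaveToFile takes a TEncoding instead.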
Jens Mühlenhoff
mjn
    The OP tagged the question as delphi-7. In Delphi 7, strings are ANSI by default, so the strings in `TStringList` are also ANSI. Are you sure this will work? – AlexSC Jul 27 '15 at 12:11
  • @AlexSC yes (I assume that the files have been created using the same default ANSI code page which is used by the Delphi program) – mjn Jul 27 '15 at 13:08
1

Take a look at GpTextStream, which appears to work with Delphi 7. It can read and write Unicode files in older versions of Delphi (it also works with Delphi 2009) and should help with your conversion.

skamradt
0
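// TEncoding requires Delphi 2009 or later; it does not exist in Delphi 7.
procedure ConvertLatin1FileToUTF8File(const AInputFileName, AOutputFileName: string);
var
  Latin1Encoding: TEncoding;
  Lines: TStringList;
begin
  Latin1Encoding := TEncoding.GetEncoding(28591); // ISO-8859-1 (Latin-1)
  try
    Lines := TStringList.Create;
    try
      Lines.LoadFromFile(AInputFileName, Latin1Encoding); // decode Latin-1 input
      Lines.SaveToFile(AOutputFileName, TEncoding.UTF8);  // write UTF-8 (with BOM)
    finally
      Lines.Free;
    end;
  finally
    Latin1Encoding.Free; // GetEncoding returns an instance the caller must free
  end;
end;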
pedrofernandes
0

Please read the whole answer before you start coding.


The proper answer to the question, and it is not an easy one, basically consists of three steps:

  1. Determine the ANSI code page used on your computer with the GetACP() function from the Windows API. (Important: retrieve the code page as soon as possible after retrieving the file names, because the user can change it.)
  2. Convert your ANSI string to Unicode by calling the MultiByteToWideChar() Windows API function with the correct CodePage parameter (retrieved in the previous step). After this step you have a UTF-16 string (practically a WideString) containing the file name list.
  3. Convert the Unicode string to UTF-8 using UTF8Encode() or the WideCharToMultiByte() Windows API function with CP_UTF8 as the code page. This returns the UTF-8 string you need.
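The three steps above might be sketched in Delphi 7 as follows. The function name `AnsiToUtf8Explicit` is hypothetical; step 3 uses WideCharToMultiByte with code page 65001 (CP_UTF8) rather than UTF8Encode, and the code assumes the input really is in the ANSI code page returned by GetACP.

```delphi
uses
  Windows, SysUtils;

const
  Utf8CodePage = 65001; // CP_UTF8

// Hypothetical helper: ANSI (system code page) -> UTF-16 -> UTF-8
function AnsiToUtf8Explicit(const S: AnsiString): UTF8String;
var
  CodePage: UINT;
  WideLen, Utf8Len: Integer;
  W: WideString;
begin
  Result := '';
  if S = '' then
    Exit;
  // Step 1: the ANSI code page currently used by Windows
  CodePage := GetACP;
  // Step 2: ANSI -> UTF-16; the first call only asks for the needed length
  WideLen := MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S), nil, 0);
  if WideLen = 0 then
    RaiseLastOSError;
  SetLength(W, WideLen);
  MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S), PWideChar(W), WideLen);
  // Step 3: UTF-16 -> UTF-8, again measuring the output size first
  Utf8Len := WideCharToMultiByte(Utf8CodePage, 0, PWideChar(W), WideLen,
    nil, 0, nil, nil);
  if Utf8Len = 0 then
    RaiseLastOSError;
  SetLength(Result, Utf8Len);
  WideCharToMultiByte(Utf8CodePage, 0, PWideChar(W), WideLen,
    PAnsiChar(Result), Utf8Len, nil, nil);
end;
```

Calling each API twice (once with a nil buffer to get the size, then with a buffer of that size) avoids guessing buffer lengths; for CP_UTF8 the last two WideCharToMultiByte parameters must be nil.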

However, although this solution returns a UTF-8 string containing the input ANSI string, it is probably not the best way to solve your problem: the file names may already be corrupted when the ANSI functions return them, so correct file names are not guaranteed.


The proper solution to your problem is much more complicated:

If you want to be sure that your file name list is completely clean, you have to make sure it is never converted to ANSI at all. You can do this by explicitly using the "W" (wide) versions of the file-handling APIs, such as FindFirstFileW and CreateFileW. In this case you of course cannot use TFileStream and the other ANSI file-handling classes; you have to call the Windows API directly.
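As an illustration of the "W" APIs, this Delphi 7 sketch lists the *.srt files in a single directory with FindFirstFileW/FindNextFileW and keeps the names as WideStrings, so they are never converted to ANSI. The function name `FindSrtFilesW` and the `TWideStringArray` type are hypothetical, recursion into subdirectories is left out, and the results should not be put into a Delphi 7 TMemo, which is ANSI-only.

```delphi
uses
  Windows;

type
  TWideStringArray = array of WideString;

// Hypothetical helper: returns the full names of the *.srt files in ADir
// (no recursion). The names stay UTF-16 and are never converted to ANSI.
function FindSrtFilesW(const ADir: WideString): TWideStringArray;
var
  FindData: TWin32FindDataW;
  Handle: THandle;
  Count: Integer;
begin
  SetLength(Result, 0);
  Count := 0;
  Handle := FindFirstFileW(PWideChar(ADir + '\*.srt'), FindData);
  if Handle = INVALID_HANDLE_VALUE then
    Exit; // nothing found, or the directory does not exist
  try
    repeat
      // skip directories whose name happens to match the pattern
      if (FindData.dwFileAttributes and FILE_ATTRIBUTE_DIRECTORY) = 0 then
      begin
        SetLength(Result, Count + 1);
        Result[Count] := ADir + '\' + WideString(PWideChar(@FindData.cFileName[0]));
        Inc(Count);
      end;
    until not FindNextFileW(Handle, FindData);
  finally
    Windows.FindClose(Handle);
  end;
end;
```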

It is not that hard, but if you already have a complex framework built on, e.g., TFileStream, it could be a bit of a pain in the @ss. In that case the best solution is to create a TStream descendant that uses the appropriate APIs.
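A TStream descendant of the kind described above might look like this Delphi 7 sketch: it opens the file with CreateFileW and hands the handle to the RTL's THandleStream, so the rest of the code can keep using the TStream interface. The class name `TWideFileStream` and its constructor signature are hypothetical, and only two modes are covered (create/overwrite, and read-only open).

```delphi
uses
  Windows, SysUtils, Classes;

type
  // Hypothetical class: a file stream whose name is a WideString and
  // which is opened with CreateFileW instead of the ANSI API
  TWideFileStream = class(THandleStream)
  public
    constructor Create(const AFileName: WideString; ACreate: Boolean);
    destructor Destroy; override;
  end;

constructor TWideFileStream.Create(const AFileName: WideString; ACreate: Boolean);
var
  H: THandle;
begin
  if ACreate then
    // create a new file (or overwrite an existing one) for reading and writing
    H := CreateFileW(PWideChar(AFileName), GENERIC_READ or GENERIC_WRITE, 0,
      nil, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0)
  else
    // open an existing file read-only, allowing other readers
    H := CreateFileW(PWideChar(AFileName), GENERIC_READ, FILE_SHARE_READ,
      nil, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if H = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  inherited Create(H);
end;

destructor TWideFileStream.Destroy;
begin
  // Handle is 0 if the constructor failed before inherited Create ran
  if Handle <> 0 then
    CloseHandle(Handle);
  inherited Destroy;
end;
```

Because reading, writing and seeking are inherited from THandleStream, only opening and closing the handle need wide-aware code.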

I hope my answer helps you or anyone who has to deal with the same problem. (I had to not so long ago.)

mg30rg
  • The question is about converting file content from ANSI to UTF-8; the file names (in the memo field) are a different question, iiuc. – mjn Jul 27 '15 at 13:05
  • @mjn - No. In the question, Yilmaz Ekici wrote about a file list in the memo ("This program lists the path and name of these files in a memo."), not about file content. (S)he might have wanted to ask about file content conversion, but (s)he did not. – mg30rg Jul 27 '15 at 13:48
  • 1) the question title begins with `How can a text file be converted ...` 2) after mentioning the file list, the question continues with `I need convert these files`. – mjn Jul 28 '15 at 05:55
0

I did only this:

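procedure TForm1.Button3Click(Sender: TObject);
var
  Strings: TStringList;
begin
  Strings := TStringList.Create;
  try
    // Memo1.Text is ANSI in Delphi 7; UTF8Encode widens it to a WideString
    // and returns the UTF-8 encoded bytes
    Strings.Text := UTF8Encode(Memo1.Text);
    // SaveToFile writes those bytes unchanged, i.e. UTF-8 without a BOM
    Strings.SaveToFile('new.txt');
  finally
    Strings.Free;
  end;
end;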

Verified with Notepad++, which reports the file as UTF-8 without BOM.

-1

Did you mean ASCII?

UTF-8 is backward compatible with ASCII: every ASCII file is also a valid UTF-8 file. http://en.wikipedia.org/wiki/UTF-8
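To illustrate the point, this small Delphi 7 console program (a sketch; the program name is arbitrary) encodes a pure ASCII string and the single character U+00E9 with UTF8Encode. The ASCII text comes out byte-for-byte unchanged, while U+00E9 needs two bytes; in a single-byte ANSI code page such as Windows-1252 that character is one byte, which is why ANSI text with accented characters is not automatically valid UTF-8.

```delphi
program AsciiVersusUtf8;

{$APPTYPE CONSOLE}

var
  Ascii, Accented: WideString;
begin
  Ascii := 'Hello, world';      // only characters below #128
  Accented := WideChar($00E9);  // U+00E9 LATIN SMALL LETTER E WITH ACUTE
  // ASCII text is unchanged by UTF-8 encoding: prints TRUE
  WriteLn(UTF8Encode(Ascii) = 'Hello, world');
  // U+00E9 becomes the two bytes $C3 $A9: prints 2
  WriteLn(Length(UTF8Encode(Accented)));
end.
```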

jason saldo
  • No, I mean ANSI. Open a .txt file in Notepad and choose File → Save As → Encoding → ANSI or UTF-8 or ... → Save. I hope this helps explain what I am trying to do. –  Apr 02 '09 at 19:15