I'm trying to do something basic: read a UTF-8 encoded text file and display it in the console. Every time I run the script, my output is the following:

[screenshot of the console output]

The file I'm trying to read is here: https://www.gutenberg.org/cache/epub/49724/pg49724.txt

I have no idea why I'm getting this output. I'm sure it's something incredibly stupid that I'm overlooking, but I've dumbed my code down to the following to try to identify the problem.

    using System;
    using System.IO;
    using System.Text;

    class Program
    {
        static void Main(string[] args)
        {
            DateTime end;
            DateTime start = DateTime.Now;

            Console.WriteLine("### Overall Start Time: " + start.ToLongTimeString());
            Console.WriteLine();

            ReadFile();

            end = DateTime.Now;
            Console.WriteLine();
            Console.WriteLine("### Overall End Time: " + end.ToLongTimeString());
            Console.WriteLine("### Overall Run Time: " + (end - start));

            Console.WriteLine();
            Console.WriteLine("Hit Enter to Exit");
            Console.ReadLine();
        }

        static void ReadFile()
        {
            // Relative path: resolved against the working directory (bin\Debug when run from Visual Studio).
            string fileName = "snow-white.txt";

            try
            {
                // Stream the file line by line, decoding it as UTF-8.
                foreach (string line in File.ReadLines(fileName, Encoding.UTF8))
                {
                    Console.WriteLine("-- {0}", line);
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.ToString());
            }
        }
    }

Any help would be greatly appreciated.

Thanks,

damaniel
  • What do you see when you open the file with notepad? – EZI Aug 18 '15 at 19:12
  • It's just plaintext. Here is the file I'm trying to read: https://www.gutenberg.org/cache/epub/49724/pg49724.txt – damaniel Aug 18 '15 at 19:12
  • https://www.gutenberg.org/cache/epub/49724/pg49724.txt – damaniel Aug 18 '15 at 19:13
  • That is not your local file. Do you see the same when you open the local file? – EZI Aug 18 '15 at 19:17
  • Yes. I've copied and pasted the text and added it into a text file that's in my project. – damaniel Aug 18 '15 at 19:18
  • If you pasted the content it may no longer be UTF-8 encoding – Jasen Aug 18 '15 at 19:23
  • Not sure how to verify that. It's just plaintext when opened as a textfile in visual studio, notepad and any other text file viewer. – damaniel Aug 18 '15 at 19:25
  • Same output as screenshot. – damaniel Aug 18 '15 at 19:28
  • In visual studio: **File > Open File, open with... binary editor**. Then look at the byte order and compare to [this answer](http://stackoverflow.com/questions/2223882/whats-different-between-utf-8-and-utf-8-without-bom). – Jasen Aug 18 '15 at 19:29
  • I see 5 empty GUIDs in every line. I don't think this formatting is random. I am not sure you are reading the same file in your code. What is the content of `fileName`? – EZI Aug 18 '15 at 19:32
  • Also, if you paste in Notepad you can choose the encoding when you save as... then there won't be any question. – Jasen Aug 18 '15 at 19:33
  • @Jasen - first bytes are EF BB BF – damaniel Aug 18 '15 at 19:34
  • Have you tried other files? Writing something in a text editor yourself? – 31eee384 Aug 18 '15 at 19:40
  • So I re-saved the file with UTF-8 encoding in the Sublime Text editor and I'm still getting the same issue. I also verified that the first bytes changed and are no longer EF BB BF. Now I'm back to square 1. Is anyone else able to reproduce my issue? – damaniel Aug 18 '15 at 19:42
  • @damaniel I don't think you use the correct file in your code. See this code. It works **`new WebClient().DownloadFile("https://www.gutenberg.org/cache/epub/49724/pg49724.txt", "snow-white.txt"); foreach (string line in File.ReadLines("snow-white.txt", Encoding.UTF8)) { Console.WriteLine("-- {0}", line); }`** – EZI Aug 18 '15 at 19:48
  • All I did was copy and paste into a textfile. I just wrote the same script in python and it works fine. – damaniel Aug 18 '15 at 19:50
  • @damaniel Am I not clear? Which text file? Your code reads the file in your bin/debug directory. Did you paste it there? – EZI Aug 18 '15 at 19:52
  • Yes, it's the text file in my bin/debug directory. You're clear, but I want to know why I'm getting this problem in .NET, specifically C#. I can do this 1000 different ways, but that doesn't help me understand why I'm getting 0's when I do it using the above code. – damaniel Aug 18 '15 at 19:53
  • OK, EZI, you pointed me in the right direction. For some reason my file was being published to my bin\debug directory containing the 0's that look like GUIDs, so the script was working fine. The data in the published file was being converted to all 0's, I'm guessing because the encoding was all out of whack from the copy and paste. (Hand on face.) Thanks for all the help, everyone. – damaniel Aug 18 '15 at 19:58
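
The byte check discussed in the comments above (looking for EF BB BF in a binary editor) can also be done from code. The following is only a minimal sketch: `BomCheck` and `HasUtf8Bom` are hypothetical names, and it assumes `snow-white.txt` sits in the program's working directory. Note that a missing BOM does not mean the file is not UTF-8; `File.ReadLines(fileName, Encoding.UTF8)` reads BOM-less UTF-8 as well.

```csharp
using System;
using System.IO;

static class BomCheck
{
    // Hypothetical helper: returns true when the file starts with
    // the UTF-8 byte order mark EF BB BF.
    public static bool HasUtf8Bom(string path)
    {
        byte[] head = new byte[3];
        using (FileStream stream = File.OpenRead(path))
        {
            // Read up to three bytes; shorter files cannot contain a BOM.
            int read = stream.Read(head, 0, head.Length);
            return read == 3
                && head[0] == 0xEF
                && head[1] == 0xBB
                && head[2] == 0xBF;
        }
    }

    static void Main()
    {
        // Same relative file name as in the question (assumed to exist).
        string fileName = "snow-white.txt";
        Console.WriteLine("UTF-8 BOM present: " + HasUtf8Bom(fileName));
    }
}
```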

1 Answer

As EZI pointed out, check the contents of the file that gets published to the bin\debug directory and verify that it was published in the right format.

My problem was that the file's contents were NOT being published correctly. I needed to make sure the file looked the same as the original by going straight to the source.
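
One way to run that check from inside the program is to print which file is actually being opened and what its first bytes and lines look like. This is a rough sketch rather than a definitive diagnostic: the class name `FileSanityCheck` is made up for illustration, and it assumes the working directory is the build output folder (typically bin\Debug when started from Visual Studio).

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

static class FileSanityCheck
{
    static void Main()
    {
        // Same relative name as in the question; resolved against the
        // current working directory.
        string fileName = "snow-white.txt";
        string fullPath = Path.GetFullPath(fileName);
        Console.WriteLine("Resolved path: " + fullPath);

        if (!File.Exists(fullPath))
        {
            Console.WriteLine("File not found at that location.");
            return;
        }

        Console.WriteLine("Size in bytes: " + new FileInfo(fullPath).Length);

        // Dump the first 16 bytes as hex, e.g. to spot a BOM or a run of zeros.
        byte[] head = File.ReadAllBytes(fullPath).Take(16).ToArray();
        Console.WriteLine("First bytes:   " + BitConverter.ToString(head));

        // Preview the first few lines exactly as the question's code decodes them.
        foreach (string line in File.ReadLines(fullPath, Encoding.UTF8).Take(3))
        {
            Console.WriteLine("-- {0}", line);
        }
    }
}
```

If the resolved path or the first bytes are not what you expect, the problem is in the file that was copied to the output directory, not in the reading code.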

Not my finest moment :)

damaniel