I wrote a trivial piece of code, in both C++ and Python, that outputs the sum of the ASCII values of all characters in a given input file.

The C++ piece seems to exclude '\n', while the Python piece includes '\n' in its calculation, for the same input text file.

I was wondering if there is any step in my code that I overlooked.

Code pieces:

    import sys

    try:
        f = open(sys.argv[1]).read()
    except:
        print " file not found \n"
        sys.exit()
    sum = 0
    for line in f:
        for character in line:
            try:
                if character == '\n':
                    pass
                else:
                    print character
                    sum += ord(character)
            except:
                print "failed \n"
                pass

    print "The sum is %d \n" % sum

And the C++ piece is:

    #include <iostream>
    #include <fstream>
    #include <string>
    int k;
    int main(int argc, char *argv[])
    {
        int sum = 0;
        std::string line;
        std::ifstream myfile(argv[1]);
        if (myfile.is_open())
        {
            while (myfile.good())
            {
                getline(myfile, line);
                for (k = 0; k < (line.length()); k++)
                {
                    sum = sum + int(line[k]);
                }
            }
            std::cout << " The total sum is : " << sum << std::endl;
        }
        else std::cout << "Unable to open file ";
        return 0;
    }

metric-space
  • They are different languages with different specification and different libraries, why would you expect them to work the same? – Some programmer dude Apr 17 '13 at 18:12
  • Your C++ code will count the last line of a text file twice. – Joseph Mansfield Apr 17 '13 at 18:14
  • *Aside*: Never use `.eof()` or `.good()` as a loop construct. It almost always produces buggy code, as it does in your example. – Robᵩ Apr 17 '13 at 18:15
  • @Rob: Why? Can you give some reasons? – aldeb Apr 17 '13 at 18:17
  • @segfolt The reason is my comment. `good` doesn't give a very good indication of whether the next extraction will succeed. In fact, in this case, the last line will be read and the final `\n` extracted but no error bits will be set. The next iteration will then attempt to extract another line, but there isn't one. – Joseph Mansfield Apr 17 '13 at 18:18
  • `.eof()` is set only _after_ there was an unsuccessful read at the end of file, not _before_ – unkulunkulu Apr 17 '13 at 18:23
  • @stfrabbit: So what to use instead? – aldeb Apr 17 '13 at 18:23
  • I think you'll find that in the Python code, `line` is not what you expect. Not that it makes any difference for the purposes of your question. – Mark Ransom Apr 17 '13 at 18:28
  • Since the answer is supposedly that C++ excludes the '\n' character by design, where does the bug caused by `.good()` come from? – metric-space Apr 17 '13 at 18:28
  • @JoachimPileborg Agreed, I should not expect both languages to behave the same, but then do tell why should I not expect them to behave the same? – metric-space Apr 17 '13 at 18:33
  • @segfolt The idiomatic way is `while (std::getline(myfile, line)) ...` – Angew is no longer proud of SO Apr 17 '13 at 18:37
  • @nerorevenge - The `.good()` problem is not directly related to your question, it is an aside. For more info see http://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-considered-wrong – Robᵩ Apr 17 '13 at 18:53
  • If you wanted to compare the taste of a pear to the taste of an orange, you would not expect them to be the same. It's the same in the programming language world. Two different languages behave differently, just like different fruits taste different. – Some programmer dude Apr 17 '13 at 18:56
  • Analogies don't always apply, and isn't there a bit of irony at play, comparing a comparison of fruit tastes to a comparison of language implementations? No reason to believe they should be the same, right? – metric-space Apr 17 '13 at 19:03
  • @nerorevenge If you had used `myfile.read()` instead of `std::getline()`, you'd have gotten the newlines in C++ as well. – Angew is no longer proud of SO Apr 17 '13 at 19:37

3 Answers

By default, std::getline (the function your code calls) uses '\n' as the end-of-line delimiter. It extracts the delimiter from the stream but discards it instead of storing it in the string, so what you are seeing is the expected behavior.

Ashwini Chaudhary
parkydr

As per the specification of std::getline() [string.io]§7:

extracts characters from is and appends them to str ... until any of the following occurs:

  • ...
  • traits::eq(c, delim) for the next available input character c (in which case, *c is extracted but not appended*)

(Emphasis mine).

This means that when getline encounters the delimiter (\n by default), it removes it from the stream, but does not store it in the string.

And to directly answer your "why" question: because it's designed that way.

Angew is no longer proud of SO

Your Python code isn't doing what you think it's doing. You're reading the entire file into the string f, then iterating through it, producing single character strings instead of lines.
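
A small sketch of that difference, written for Python 3 (the question's code is Python 2) and using an in-memory `io.StringIO` as a stand-in for the file. The sample text is made up for illustration:

```python
import io

sample = "ab\ncd\n"  # hypothetical file contents

# Iterating over the string returned by read() yields single characters.
chars = list(io.StringIO(sample).read())

# Iterating over the file object itself yields lines, newline included.
lines = list(io.StringIO(sample))

print(chars)  # ['a', 'b', '\n', 'c', 'd', '\n']
print(lines)  # ['ab\n', 'cd\n']
```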

However, even if you did it the proper way and used readlines or iterated the file object directly, you'd still have the same problem with trailing newlines being retained. The reason comes from this statement in the Input and Output Tutorial:

f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline. This makes the return value unambiguous; if f.readline() returns an empty string, the end of the file has been reached, while a blank line is represented by '\n', a string containing only a single newline.

C++ uses a different method of signalling the end of file, so it doesn't need to retain this distinction and is free to lose the trailing newline. This is usually more convenient.
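
To make the effect on the sum concrete, here is a minimal sketch in Python 3. `io.StringIO` stands in for a real file, and the sample text and the helper name `ascii_sum` are made up for illustration. Keeping the newline that line iteration leaves on each line adds 10, the value of `ord('\n')`, per line. Stripping it gives the total you get when the delimiter is discarded, as `std::getline` does.

```python
import io

def ascii_sum(text, keep_newlines):
    """Sum ord() of every character, reading the text line by line."""
    total = 0
    for line in io.StringIO(text):      # each line keeps its trailing '\n'
        if not keep_newlines:
            line = line.rstrip('\n')    # drop the delimiter
        total += sum(ord(ch) for ch in line)
    return total

sample = "ab\ncd\n"  # hypothetical file contents
with_newlines = ascii_sum(sample, keep_newlines=True)
without_newlines = ascii_sum(sample, keep_newlines=False)
print(with_newlines - without_newlines)  # 20: two newlines, ord('\n') == 10
```

If the two programs disagree by exactly ten times the number of lines in the input, newline handling is the likely cause.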

Mark Ransom