
I am implementing a shell-like program in C++. It has a loop that reads from cin, forks, and waits for the child.

This works fine if the input is interactive or if it's piped from another program. However, when the input is a bash heredoc, the program rereads parts of the input (sometimes indefinitely).

I understand that the child process inherits the parent's file descriptors, including the shared file offset. However, the child in this example does not read anything from cin, so I don't think it should touch the offset. I'm stumped about why this is happening.
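To make the "shared file offset" point concrete, here is a minimal sketch, assuming a POSIX system. The helper name `offset_seen_by_parent_after_child_seek` is made up for illustration. It shows that a seek performed by a forked child on an inherited descriptor is visible to the parent, because both descriptors refer to the same open file description.

```cpp
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not from the original program): returns the file
// offset the parent observes after a forked child has moved it, or -1 on error.
off_t offset_seen_by_parent_after_child_seek(off_t child_target)
{
    FILE *fp = tmpfile();                  // scratch file, removed automatically
    if (!fp) return -1;
    int fd = fileno(fp);

    // Write through the descriptor directly so no stdio buffering is involved.
    if (write(fd, "hello world\n", 12) != 12) { fclose(fp); return -1; }

    pid_t pid = fork();
    if (pid < 0) { fclose(fp); return -1; }
    if (pid == 0) {                        // child: move the shared offset
        lseek(fd, child_target, SEEK_SET);
        _exit(0);                          // leave without any stdio cleanup
    }

    waitpid(pid, nullptr, 0);              // parent: wait, then query the offset
    off_t seen = lseek(fd, 0, SEEK_CUR);
    fclose(fp);
    return seen;
}
```

Calling `offset_seen_by_parent_after_child_seek(5)` from `main` should return 5 rather than 12: the child's `lseek` changed the offset the parent sees.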


test.cpp:

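#include <cstdio>     // perror
#include <iostream>
#include <string>     // std::string, std::getline
#include <unistd.h>   // fork, getpid
#include <sys/wait.h> // waitpid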

int main(int argc, char **argv)
{
    std::string line;
    while (std::getline(std::cin, line)) {
        pid_t pid = fork();
        if (pid == 0) { // child
            break; // exit immediately
        }
        else if (pid > 0) { // parent
            waitpid(pid, nullptr, 0);
        }
        else { // error
            perror("fork");
        }

        std::cout << getpid() << ": " << line << "\n";
    }
    return 0;
}

I compile it as follows:

g++ test.cpp -std=c++11

Then I run it with:

./a.out <<EOF
hello world
goodbye world
EOF

Output:

7754: hello world
7754: goodbye world
7754: goodbye world

If I add a third line, `foo bar`, to the heredoc, the program gets stuck in an infinite loop:

13080: hello world
13080: goodbye world
13080: foo bar
13080: o world
13080: goodbye world
13080: foo bar
13080: o world
[...]

Versions:

  • Linux kernel: 4.4.0-51-generic
  • Ubuntu: 16.04.1 LTS (xenial)
  • bash: GNU bash, version 4.3.46(1)-release (x86_64-pc-linux-gnu)
  • gcc: g++ (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Kevin Chen
  • What happens if you do `std::ios::sync_with_stdio(false);` at the beginning and explicitly flush after your write to stdout? (e.g. change `'\n'` to `std::endl`) –  Dec 05 '16 at 01:09
  • By stracing the child process, I can see exactly what's going on. The child process lseeks backwards on file descriptor 0 before exiting, which affects the parent process. Unfortunately, I don't know why the C library is doing that, so I'm not going to post the details as an answer. This also happens with an explicit `exit(0);` but not `_exit(0);`. – Sam Varshavchik Dec 05 '16 at 01:10
  • the answer here might be useful http://stackoverflow.com/questions/33899548/file-pointers-after-returning-from-a-forked-child-process – Erix Dec 05 '16 at 01:15
  • @Hurkyl yes, that fixes it! I guess because the C library is seeking on stdin (per @SamVarshavchik), turning off synchronization changes the child process's effect. – Kevin Chen Dec 05 '16 at 01:19
  • @Erix Thanks, that's really helpful. So maybe the "right" way to fix this is calling `fclose(stdin)` etc at the beginning of the child, to prevent exit from seeking in the first place. – Kevin Chen Dec 05 '16 at 01:20
  • @Kevin: Interesting! I wonder if the problem, then, is in the C library, or the interaction between the C and C++ libraries? What if you close `cin` after forking (I don't remember if you can do that; if not, you could set its badbit or something)? What if you rewrite the program to use the C io routines? –  Dec 05 '16 at 01:21
  • ... also, it's probably worth testing with a larger heredoc size; enough to overflow up a whole `cin` buffer, maybe twice. –  Dec 05 '16 at 01:24
  • I couldn't reproduce this, but you absolutely should be `_exit`'ing or `quick_exit`'ting from the forked child if the child doesn't `exec`. The parent builds up `cout` buffer state and the children inherit it. If the children exit regularly, they will attempt to flush their copy of the `cout` buffer which should be getting flushed in the parent. If this happens, you will get duplicates in your output. – PSkocik Dec 05 '16 at 01:31
  • @PSkocik you just should flush buffers before a `fork` if this is a concern. – n. 'pronouns' m. Dec 05 '16 at 01:40
  • @SamVarshavchik AFAIU failure to rewind stdin was considered a glibc bug back when. – n. 'pronouns' m. Dec 05 '16 at 01:58
  • @Hurkyl the behavior persists when I add `std::cin.setstate(std::ios::failbit);` to the beginning of the child. It also happens when I rewrite the program to use getline and printf (compiled with both c++11 and c11). And the problem goes away when I make the child sleep instead of calling exit. So it seems like exit is the culprit here. – Kevin Chen Dec 05 '16 at 02:22
  • don't use nullptr for C lib. nullptr is C++, use NULL when you use C function. – Stargateur Dec 05 '16 at 03:38
  • @KevinChen this has nothing to do with C++ streams. You need to call `close(0)` before exiting. – n. 'pronouns' m. Dec 05 '16 at 11:57
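
The `exit` versus `_exit` difference reported in the comments can be checked in isolation. The following is only a sketch, assuming a POSIX system and C stdio. The struct `Offsets` and the function `offsets_around_child_exit` are names invented here. It reads one line through a buffered stream, forks, lets the child leave with either `exit(0)` or `_exit(0)`, and reports the kernel file offset before and after.

```cpp
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical result type: kernel file offset before the fork and after
// the child has exited (-1 means the step failed).
struct Offsets {
    off_t before;
    off_t after;
};

// Hypothetical helper: the child leaves via exit(0) if child_calls_exit is
// true, otherwise via _exit(0).
Offsets offsets_around_child_exit(bool child_calls_exit)
{
    Offsets result = {-1, -1};
    FILE *fp = tmpfile();                   // seekable scratch file
    if (!fp) return result;

    fputs("hello world\ngoodbye world\n", fp);
    rewind(fp);                             // flush the data, seek to the start

    char line[64];
    if (!fgets(line, sizeof line, fp)) {    // consume only the first line;
        fclose(fp);                         // stdio typically reads ahead
        return result;
    }

    int fd = fileno(fp);
    result.before = lseek(fd, 0, SEEK_CUR); // kernel offset, not stdio position

    pid_t pid = fork();
    if (pid == 0) {
        if (child_calls_exit) exit(0);      // runs stdio cleanup on inherited streams
        _exit(0);                           // skips stdio cleanup entirely
    }
    if (pid > 0) {
        waitpid(pid, nullptr, 0);
        result.after = lseek(fd, 0, SEEK_CUR);
    }
    fclose(fp);
    return result;
}
```

With `_exit` in the child, `after` should equal `before`. With `exit`, a C library that syncs input streams at exit (as glibc appears to do, per the strace observation above) may leave `after` smaller than `before`, which is the rewind that makes the parent reread input.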

1 Answer


I was able to reproduce this problem, not only using the heredoc but also using a standard file redirection.

Here is the test script that I used. In both the file-redirection and heredoc cases, the second line of input was duplicated.

./a.out < Input.txt
echo

cat Input.txt | ./a.out
echo

./a.out <<EOF
hello world
goodbye world
EOF

Closing stdin in the child before it exits seems to eliminate the problem in both cases.

#include <cstdio>     // perror
#include <iostream>
#include <string>     // std::string, std::getline
#include <unistd.h>   // fork, close, getpid, STDIN_FILENO
#include <sys/wait.h> // waitpid

int main(int argc, char **argv)
{
    std::string line;
    while (std::getline(std::cin, line)) {
        pid_t pid = fork();
        if (pid == 0) { // child
            close(STDIN_FILENO);
            break; // exit after first closing stdin
        }
        else if (pid > 0) { // parent
            waitpid(pid, nullptr, 0);
        }
        else { // error
            perror("fork");
        }

        std::cout << getpid() << ": " << line << "\n";
    }
    return 0;
}
merlin2011