2

This fascinating post:

How is this command legal ? “> file1 < file2 cat”

highlights surprising behavior in a seemingly malformed cat call to "the shell" (a Linux shell, presumably BASH). Basically, shells seem to be able to pick out the executable from ambiguous positions within a series of strings and then apply I/O redirection to streams / file descriptors.

The basic process per my understanding is:

  1. Look for redirection patterns and connect them to the appropriate streams / file descriptors (example: 1> for stdout). This occurs before the executable process in the command (e.g. the cat call) is started!
  2. Find an executable in the list of strings.
  3. Start that executable process.
  4. Pause for process completion or continue (as needed) based on the kinds of output detected in step 1.

This leads to some surprising logic. For instance in a new directory after executing echo "dog" > cat:

  • <cat cat >dog : writes "dog" from file cat to dog using shell tool cat

  • <cat cat> cat cat : overwrites first command, leaving a blank cat file (not sure what happens in the middle of the second command).

  • <cat cat> cat cat >dog 2>more : creates empty files dog and more, overwrites cat file with empty file.

  • <cat >dog cat cat <dog >cat (creates empty file dog, overwrites cat w/ empty file)

  • <cat cat >dog 2>much 1>more : overwrites cat w/ an empty file; creates files dog/more each of which contain the string "dog", creates empty much

(The above list behavior was tested on BASH (v4.3.46).)

Now at some point the poor shell decides it's had enough. For example, when faced with:

<cat dog> cat cat >dog >cat

It complains:

bash: dog: command not found

But there's an extra surprise -- the command actually partially completed. As in most of the above examples it's overwritten the file cat with a blank file and also created a blank file dog.
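A minimal sketch for reproducing this partial completion in a scratch directory. It assumes a POSIX-style shell such as bash and that no program named dog is installed on the PATH; the mktemp directory is an addition for safety and not part of the original test.

```shell
# Work in a throwaway directory so no real files are touched.
cd "$(mktemp -d)" || exit 1
echo "dog" > cat

# The redirections run first, left to right: open cat for reading,
# truncate cat, create dog, truncate cat again. Only afterwards does the
# shell look up the command word "dog", which is where it fails.
<cat dog> cat cat >dog >cat || echo "command failed with status $?"

# Both files exist and are empty even though no command ever ran.
ls -l cat dog
```

The error message appears only after the files have already been opened, which is why the side effects survive the failure.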

To better understand complex I/O redirection handling in the "most popular Linux shells" and CMD (the standard Windows shell):

  1. BASH (Linux)
  2. TCSH (Linux)
  3. KSH (Linux)
  4. ZSH (Linux)
  5. CMD (Windows)

... is this kind of order-ambiguous I/O redirection parsing...

  1. Supported by all of them? (I've only had time to test BASH (Linux) and cmd (Windows).)
  2. Is it supported for all executables or just core shell utilities?
  3. What are the rules used by these shells to handle sanitization/ordering of the streams/descriptors, particularly when parsing commands where the redirections appear ambiguous based upon the choice of substring (ex. stuff.dat>1test.dat<2test.dat where 1test.dat and 2test.dat are files)?
  4. To what degree are their parsing rules consistent between shells?
  5. What determines failure of a command with complex I/O redirection patterns in these shells?
Jason R. Mick
  • Concerning Windows `cmd`: [cmd.exe redirection operators order and position](http://stackoverflow.com/q/25559389) – aschipfl Apr 06 '17 at 18:28

2 Answers

8

For POSIX shells -- that is, shells which attempt to implement the POSIX standard -- the parsing algorithm is actually reasonably simple, and also documented in that standard. That includes bash, ksh and zsh from your list (as well as others, such as dash) but not Windows cmd. tcsh is similar but not POSIX.

Redirections are not "order ambiguous". They are parsed and executed left-to-right. The only possibly odd part is that they may be arbitrarily interleaved with the command and its arguments, but since every redirection is preceded by a redirection operator, no ambiguity results.
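As a rough illustration of that interleaving, the sketch below runs the same command with the same two redirections in three different positions and compares the results. The file names in.txt and out1.txt to out3.txt are made up for the example.

```shell
# Work in a throwaway directory so no real files are touched.
cd "$(mktemp -d)" || exit 1
printf 'dog\n' > in.txt

# Same command, same two redirections, three different placements.
cat < in.txt > out1.txt
< in.txt cat > out2.txt
> out3.txt < in.txt cat

# cmp -s is silent and returns 0 when the two files are identical.
cmp -s out1.txt out2.txt && cmp -s out2.txt out3.txt && echo "all three outputs match"
```

The placement relative to the command words makes no difference here; only the order of the redirections relative to each other can matter, and in this sketch they touch different descriptors.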

For simple commands, the procedure is roughly:

  1. The command is split into words. Words preceded by redirection operators are redirections; those are removed from the command and saved for later processing.

    Note that the redirection operators are self-delimiting, so there is no difference whatsoever between a> b, a >b, and a>b. All of those are the word a, the redirection operator >, and the word b, and >b will be treated as a redirection. So the syntax <a> b might be confusing to a human reader (and should therefore be avoided) but it does not confuse the shell, which treats it as though it had been written in the more normal fashion as <a >b.

  2. Leading words starting ID= are assignments (where ID is anything which looks like a variable name). These are also removed for later processing. Unlike redirections, these are only recognised until the first word, if any, which is not an assignment.

  3. The remaining words, if any, are expanded according to the expansion rules, which might involve splitting expanded words. The first word after expansion, if any, is the command, and the remaining words are command arguments.

  4. The redirections are executed, left to right. Output redirections (>foo) create the named file if it does not exist and truncate it if it does; append redirections (>>foo) create the file if needed but never truncate it.

  5. The assignments are expanded and applied. If there is a command, assignments are applied to the subshell environment the command will run in; otherwise, they are applied to the current shell environment.

  6. If there is a command, it is executed with the command argument words being passed to it as the argc/argv arguments.
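Steps 4 and 5 can be observed with a small sketch like the one below. It assumes bash or a plain sh (zsh in its default mode handles a bare redirection with no command differently), and that the hypothetical variable DEMO_GREETING is not already set in the environment. All file names are invented for the example.

```shell
# Work in a throwaway directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Step 4 with no command at all: the redirection alone creates the file.
> empty.txt

# > truncates before the command runs; >> keeps what is already there.
echo "first"  > log.txt
echo "second" >> log.txt    # log.txt now holds two lines
echo "third"  > log.txt     # truncated first, so only one line remains

# Step 5: an assignment in front of a command is visible to that command
# only; the current shell never sees it.
DEMO_GREETING=hello sh -c 'echo "child sees: $DEMO_GREETING"' > child.txt
echo "parent sees: ${DEMO_GREETING:-nothing}" > parent.txt

cat log.txt child.txt parent.txt
```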

For example, the line <cat cat> cat cat, which seems to have confused you, is parsed left-to-right as:

  • <cat, an input redirection
  • cat, a command
  • >cat, an output redirection
  • cat, an argument

which results in the redirects <cat and >cat being executed before invoking the command cat with argument cat. The first redirect (<cat) will fail if the file cat did not exist in the current directory before executing that line so the second redirect (>cat) will only be executed if the file did exist; it will immediately truncate (empty) the file. Unless the current directory is in the PATH, the command cat will be executed from the file /bin/cat, which is a different file. Since an argument is supplied to the cat command, it will not use its standard input, so the <cat redirection will have no effect other than to cause the whole command to fail unless the file cat already existed. Since the file cat will be truncated before executing the command cat cat, nothing will be written to standard output, and the file cat will remain empty.
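The walk-through above can be checked with a sketch along these lines, assuming a POSIX-style shell and a scratch directory created with mktemp. The names missing and out.txt are hypothetical. Depending on the cat implementation, the first command may also print a complaint that its input file is its output file; the file ends up empty either way.

```shell
# Work in a throwaway directory so no real files are touched.
cd "$(mktemp -d)" || exit 1
echo "dog" > cat

# <cat opens the file for reading, >cat truncates it, and only then does
# the cat utility run with the argument "cat", which is by now empty.
<cat cat> cat cat || true
wc -c < cat     # prints the byte count of the file named cat

# If the input file does not exist, <missing fails first, so >out.txt is
# never processed and out.txt is not created.
<missing cat >out.txt || echo "redirection failed, status $?"
ls out.txt 2>/dev/null || echo "out.txt was not created"
```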

With respect to your last questions:

  • These rules apply equally to all simple commands, whether built-in or not, aside from some details about error handling.

  • The 2 in >2foo is not special, so 2foo is a filename. FD duplication is indicated with the >& redirection operator; >&2foo is treated as an attempt to duplicate 2foo, which is invalid because 2foo is not an integer. POSIX regards this as unspecified behaviour, so actual shells might do pretty well anything. See Section 2.7.5 of the POSIX shell specification for details (or at least the official line).

  • A redirection could fail because of the non-existence of a file or because a file's permissions don't allow the action. As noted above, the redirections are executed left to right, which might have an effect in "complex" cases.
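To make the point about digits concrete, here is a small sketch contrasting a digit after the operator with a digit directly before it. The file names 2foo, foo and bar are only for illustration, and a POSIX-style shell is assumed.

```shell
# Work in a throwaway directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# The digit comes after the operator: 2foo is just a file name.
echo "to stdout" >2foo

# The digit sits directly before the operator: descriptor 2 (stderr)
# is redirected to the file foo.
sh -c 'echo "to stderr" >&2' 2>foo

# With a space between the digit and the operator, 2 is an ordinary
# argument to echo and stdout goes to bar.
echo "arg" 2 >bar

cat 2foo foo bar
```

Applied to the example from the question, stuff.dat>1test.dat<2test.dat runs stuff.dat with stdout going to the file 1test.dat and stdin coming from the file 2test.dat.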

rici
  • In terms of your parsing of `<cat cat> cat cat`, prior to all those commands I did `echo "dog" > cat`, so the file did exist & `cat cat` prints `dog` as expected. So why is it being truncated in `BASH`? Per my reading of your explanation, shouldn't it be reading in file `cat` to `stdin`, reading file `cat` to `stdout`? Why is it overwriting the file `cat` as empty, as the first command should read in as non-empty? Overall great explanation though, just the kind of info I was looking for, to supplement the examples in the Adv. BASH Scripting Guide. – Jason R. Mick Apr 06 '17 at 13:53
  • @jason: it's being truncated when it is opened for output by the redirect. See my point 4. As I also mentioned, `cat` does not read from stdin if you give it arguments. See `man cat` for more info. – rici Apr 06 '17 at 14:01
  • @jason: perhaps the point of confusion is that you have a model in which the command is evaluated in input order somehow, so that it has some responsibility for redirects, etc (as happens with the MS-DOS shell builtins). Unix shells do not work this way. The entire command line is parsed and an execution environment created before handing control to the command. That execution environment includes the indicated redirections, so they are not textually visible to the utility being executed; the utility simply reads and writes from standard streams. – rici Apr 06 '17 at 17:39
  • AHHH I see it... I misread `cat>` as `>cat`. Makes sense now. By the time the control program executes the file is wiped out... – Jason R. Mick Apr 07 '17 at 14:52
  • @jason: `<cat >cat cat cat` would have identical effect. The redirections are removed from the command line and then first the redirections and then the remaining command words are processed left-to-right. So the precise interleaving doesn't matter; only the order of each set. – rici Apr 07 '17 at 15:00
2

Sorry, Linux is not my area, but cmd is. This answer is limited to cmd; you will have to combine it with information from elsewhere.

Supported by all of them?

Basic redirection operators (<, >, >>, |) were included in MS-DOS 2.0 (still command.com) and have been available in all versions since then.

Since Windows 95 (just from memory) the handle duplication operators (>&, <&) have also been available.

More exotic/non standard operators present in other shells are not available.

Does it support all executables supported or just core shell utilities?

In cmd you can request the redirection of whatever executable or internal command you want, but the result will depend on the executable's ability (whether in console mode or not) to interact with stdin/stdout/stderr.

Ex.

  • timeout.exe, a console subsystem executable, does not allow input redirection

  • mshta.exe, a graphic subsystem executable, allows you to use the FileSystemObject to grab a reference to StdOut and write to it

What are the rules used ...?

cmd parsing rules are simple: left to right. If the final parsed command makes sense (not unbalanced or explicitly wrong) it is executed; otherwise you get a syntax error.

i = stdin input redirection
o = stdout output redirection
e = stderr output redirection
c = command to execute
a = arguments to the command

> file1 < file2 cat
^o      ^i      ^c   

<cat cat >dog
^i   ^c  ^o

<cat cat> cat cat
^i   ^c ^o    ^a

<cat cat> cat cat >dog 2>more
^i   ^c ^o    ^a  ^o   ^e       Second output cancels & replaces first one

<cat >dog cat cat <dog >cat
^i   ^o   ^c  ^a  ^i   ^o       Second i/o set cancels & replaces first one

<cat cat >dog 2>much 1>more     Second output cancels & replaces first one
^i   ^c  ^o   ^e     ^o

<cat dog> cat cat >dog >cat     Multiple output replacement
^i   ^c ^o    ^a  ^o   ^o

stuff.dat>1test.dat<2test.dat
^c       ^o        ^i

After the command is parsed and it is determined that there is no syntax error, the redirections have to be created before starting the command. If there is no problem (input files exist, output files can be written) then the appropriate handles are assigned and the program/command is started (if it exists).

To what degree are their parsing rules consistent between shells?

cmd.exe rules are consistent between Windows versions, and backward compatible with the syntax used in old versions of command.com.

Just an opinion, but why should the shells have any kind of consistency between them? If all were consistent, why have more than one?

What determines failure of a command with complex I/O redirection patterns in these shells?

How do you determine failure? How do you determine success? The shell will try to do what you asked, not what you wanted. And even clear commands can behave in ways you would not expect a priori.

cmd parses commands and converts them into an internal representation. In this representation the data relative to the command requested is separated from the redirection information. Both the "command part" and the "redirection part" must be syntactically correct (from the parser point of view) before even starting to execute anything.

When the command is about to be executed, the redirection request is processed, acquiring the required files/handles. If everything can be established then the command is executed inside the created context.

So, failure can be

  • A syntax problem (while parsing)
  • A resource acquisition problem (while creating the redirection context, before starting command execution)
  • A rights/hardware problem (during execution of the command)
MC ND
  • On consistency between shells I don't have a strong opinion -- I just was interested in the extent of the consistency given that in my tests, for the couple of examples I tried in `CMD` above it was working like `BASH`. As far as "failure" I was quantifying it as when the command produced an error that caused the line to terminate without reading / acting on all arguments, not "failure" from the perspective of producing an unexpected result (as I expected the unexpected). I get what you mean, though, the term is ambiguous. Hopefully that clarifies. – Jason R. Mick Apr 06 '17 at 13:59
  • BTW this is an incredible answer. I wish I could mark two correct answers. Thank you... very enlightening, and I think people will come across this and learn a lot. – Jason R. Mick Apr 07 '17 at 14:54