
I tried the command sed 's/$/\r/g' linux.txt > linux2win.txt to convert a text file from Linux to Windows line endings.

And it works! All \n are converted to \r\n.

For example, hello, world \n is converted to hello, world \r\n.

What confuses me is what exactly $ refers to. Is it \n, or an empty character before \n? I don't even know what I replaced.

PYL
  • This might help: [The Stack Overflow Regular Expressions FAQ](http://stackoverflow.com/a/22944075/3776858) – Cyrus Jan 04 '17 at 13:51
  • That command does the opposite of what you're saying. – melpomene Jan 04 '17 at 13:52
  • Possible duplicate of [Dollar sign in regular expression and new line character](http://stackoverflow.com/questions/13912373/dollar-sign-in-regular-expression-and-new-line-character) – Cyrus Jan 04 '17 at 13:53
  • See [here](https://www.gnu.org/software/sed/manual/sed.html#BRE-vs-ERE), [here](https://www.gnu.org/software/sed/manual/sed.html#regexp-extensions) and last paragraph of [here](https://www.gnu.org/software/sed/manual/sed.html#The-_0022s_0022-Command). The last two references only for GNU sed. – potong Jan 04 '17 at 15:09

3 Answers


The answers/comments so far stating that $ matches the end of line are misleading. In a regexp, $ matches end of string, that is all. It appears to match end of line in sed because by default sed reads one line at a time, so in that context (but not in others) each string it operates on does end at the end of a line.

So $ matches end-of-string. If your string ends at the end of a line, then $ matches at the end of the line. But if your string contains multiple lines (e.g. in sed you can build a multi-line string in a buffer), then $ does not match at the end of each line; it simply and consistently matches at the end of the string.

Similarly ^ matches start-of-string, btw, not start-of-line as you may hear people claim.
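
A quick way to see this for yourself, as a minimal sketch assuming a POSIX-style sed: the N command appends the next input line to the buffer, so the string sed operates on holds two lines joined by \n. The marker X and the two-line input are arbitrary, chosen only for illustration.

```shell
# Join two input lines into one buffer with N, then append X where "$" matches.
# If "$" meant end-of-line, X would show up after both "a" and "b".
out=$(printf 'a\nb\n' | sed -e 'N' -e 's/$/X/')
printf '%s\n' "$out"
# prints:
# a
# bX
```

Only the end of the whole two-line string gets the X, not the end of the first line.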

wrt your comment:

my original line is hello, world \n$ and $ is invisible , and $ is replaced by \r, now my line is hello, world\n\r$ .

No, that is not what is happening. Your original line is:

hello, world\n

and sed reads one \n-separated line at a time, so what is read into sed's buffer is the string:

hello, world

Now $ is a regexp metacharacter that matches the end-of-string so given the above string $ will match after the d (and ^ would match before h) so when you do

s/$/\r/

It changes the above string to:

hello, world\r

and then when sed prints it out it adds back the newline (because a string with no terminating newline is not a text line per POSIX) and outputs:

hello, world\r\n

Note that $ is never part of the string, it's just a metacharacter that when used in a regexp matches the end of the string so you can test for characters appearing just at the end of a string or do other operations (like the above) after the end of the string.
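
The steps above can be checked by dumping the output bytes. This is a sketch, assuming a POSIX shell with od and tr available. It inserts a literal carriage return through a shell variable (the name cr is just illustrative) instead of the \r escape from the question, since POSIX does not specify that escape in the replacement text.

```shell
# A literal carriage return, so the example does not depend on sed
# understanding "\r" in the replacement.
cr=$(printf '\r')

# sed sees "hello, world" (newline already stripped), appends CR where "$"
# matches, then adds the newline back when printing.
hex=$(printf 'hello, world\n' | sed "s/\$/$cr/" | od -An -tx1 | tr -d ' \n')
printf '%s\n' "$hex"
# The hex dump ends in 0d0a, i.e. \r followed by \n.
```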

Ed Morton
  • Thanks! This answer really helps me! I didn't know that sed would consume one `\n`. Without your answer maybe I will misunderstand `$` in regexp for a long time. – PYL Jan 04 '17 at 15:05

$ matches the end of line, so the command:

sed 's/$/\r/g'

simply adds \r to the end of line, which is not what you say. If the input is "hello, world \r\n", the output would be "hello, world \r\n".

Maroun
  • @PYL The command simply "appends" `\r` to the end of line (it actually replaces the "last place" with a `\r`). – Maroun Jan 04 '17 at 14:01
  • I think of it in this way: my original line is hello, world \n$ and $ is invisible , and $ is replaced by \r, now my line is hello, world\n\r$ . It is weird ,isn't it? – PYL Jan 04 '17 at 14:15
  • wrt `If the input is "hello, world \r\n", the output would be "hello, world \r\n"` - that depends on the environment you're running in and whether or not the underlying C primitives allow the `\r` from the input to get through to `sed`. If you ran on cygwin with sed in binary mode, for example, then the output would be `"hello, world \r\r\n"` – Ed Morton Jan 04 '17 at 15:02

The premise of your question is flawed. The sed command you present converts Linux-style line terminators (newline alone) to Windows-style (carriage-return / newline), not the other way around.

It works like this:

  • the $ is a regex metacharacter that matches the zero-width end of the line (i.e. just prior to the line terminator, if any).
  • the substitution string is a carriage return character (expressed as \r); it replaces the zero-width character sequence matched by the regex, in effect inserting the carriage return immediately before the newline

The trailing g in the sed command specifies that all matches in each line should be replaced; it is superfluous because there cannot be more than one match per line.

Note also that this can be slightly quirky: if the input file does not end with a newline, then the output will end with just \r, because the end of the file is then the end of the last line.
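
The missing-newline case can be tried with a short sketch. It assumes a POSIX shell with od and tr available, and it inserts a literal carriage return through a variable (cr, an illustrative name) rather than the \r escape used in the question. Whether sed adds a final newline of its own may vary between implementations, so the comments describe both outcomes instead of promising one.

```shell
cr=$(printf '\r')   # literal carriage return

# The input "hi" has no terminating newline.
hex=$(printf 'hi' | sed "s/\$/$cr/" | od -An -tx1 | tr -d ' \n')
printf '%s\n' "$hex"
# 68690d   -> the output ends with just \r, as described above
# 68690d0a -> this sed added a newline of its own after the \r
```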

John Bollinger
  • I think of it in this way: my original line is hello, world \n$ and $ is invisible , and $ is replaced by \r, now my line is hello, world\n\r$ . It is weird ,isn't it? – PYL Jan 04 '17 at 14:14
  • No, @PYL, it is not weird at all. The newlines in the input file are not considered *part of* the lines by `sed`. They are line terminators -- the line ends just before, and the next line starts just after. `sed` consumes line terminators on input, and (by default) introduces new ones on output. You can get newlines into `sed`'s pattern space by other means, but you do not get them from simply reading a line into the pattern space. – John Bollinger Jan 04 '17 at 14:57
  • Thanks! Helps me a lot! – PYL Jan 04 '17 at 15:09