Hello the smart community,

I have done some research on this issue but couldn't find an answer to my exact problem.

I am facing a strange compile-time issue with Java String literals that contain unicode escape codes.

Here is the code snippet under consideration:

    String text = textArea.getText().trim();
    String unicodeReturn = "\u000A";
    text = 
            "\"" + 
            text
            .replace(" ", "%s")
            .replace("\\", "\\\\")
            .replace("\"", "\\\"")
            .replace("\n", "\u000A") 
            + 
            "\"";

I get the compile-time error "String literal is not properly closed by a double-quote" for the line

    String unicodeReturn = "\u000A";

Strangely, the line

    .replace("\n", "\u000A") 

where the same unicode literal exists, doesn't seem to cause any issues. I have been using unicode notation syntax for quite some time now. If my memory is not failing me the format is \uXXXX, where X is a hex digit.

My environment is

  • JDK 1.8.0_66
  • Mac OS X El Capitan
  • Eclipse Mars.1

My questions are:

  1. Has anyone come across the same issue?
  2. Is this a known JDK 1.8 compiler bug?
  3. Is there a solution or a workaround to this?
  4. Does anyone know if I am doing something wrong?

(It is quite frustrating and prevents me from compiling my code)

Nar Gar
  • http://stackoverflow.com/questions/30727515/why-is-executing-java-code-in-comments-with-certain-unicode-characters-allowed – ZhongYu Feb 10 '16 at 02:54

1 Answer

Unicode escapes of the form \uXXXX are translated before the compiler tokenizes the source text, so

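    String unicodeReturn = "\u000A";

is the same as

    String unicodeReturn = "
    ";

A string literal cannot span a line break, which is why the compiler reports that it is not properly closed.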
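The translation step can be imitated with a small decoder. This is only a rough sketch of the idea, not the compiler's actual algorithm: the class name `UnicodeEscapeDecoder` and the method `decode` are made up for illustration, and the sketch ignores details such as repeated `u` characters and backslash counting. The escapes are passed in with a doubled backslash so that this example itself compiles.

```java
class UnicodeEscapeDecoder {

    // Hypothetical helper: replaces each backslash-u-plus-four-hex-digits
    // sequence in the input with the character it denotes.
    static String decode(String source) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < source.length()) {
            if (source.startsWith("\\u", i) && i + 6 <= source.length()) {
                // Parse the four hex digits that follow the backslash and the 'u'.
                int codeUnit = Integer.parseInt(source.substring(i + 2, i + 6), 16);
                out.append((char) codeUnit);
                i += 6;
            } else {
                out.append(source.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Eight characters: quote, backslash, u, 0, 0, 0, A, quote.
        String literal = "\"\\u000A\"";
        String decoded = decode(literal);

        // After decoding, a real line feed sits between the two quotes,
        // so the "literal" would span two source lines.
        System.out.println(decoded.length());
        System.out.println(decoded.contains("\n"));
    }
}
```

Running the same decoder over the escape-only program quoted in this answer would turn its first six escapes into the word `public`.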

Here is an example of a Hello World program consisting entirely of \u codes (https://stackoverflow.com/a/30727799/57695):

    \u0070\u0075\u0062\u006c\u0069\u0063\u0020\u0020\u0020\u0020
    \u0063\u006c\u0061\u0073\u0073\u0020\u0055\u0067\u006c\u0079
    \u007b\u0070\u0075\u0062\u006c\u0069\u0063\u0020\u0020\u0020
    \u0020\u0020\u0020\u0020\u0073\u0074\u0061\u0074\u0069\u0063
    \u0076\u006f\u0069\u0064\u0020\u006d\u0061\u0069\u006e\u0028
    \u0053\u0074\u0072\u0069\u006e\u0067\u005b\u005d\u0020\u0020
    \u0020\u0020\u0020\u0020\u0061\u0072\u0067\u0073\u0029\u007b
    \u0053\u0079\u0073\u0074\u0065\u006d\u002e\u006f\u0075\u0074
    \u002e\u0070\u0072\u0069\u006e\u0074\u006c\u006e\u0028\u0020
    \u0022\u0048\u0065\u006c\u006c\u006f\u0020\u0077\u0022\u002b
    \u0022\u006f\u0072\u006c\u0064\u0022\u0029\u003b\u007d\u007d

Instead you could do

    .replace("\n", "\\u000A")

or just

    .replace("\n", "\\n")
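Put together with the code from the question, the second suggestion might look like the sketch below. The class name `AdbTextQuoter` and the method `quote` are hypothetical, and the replacement order is simply copied from the question. The sketch only shows a version that compiles and emits the two characters backslash and `n` in place of each line feed; whether the receiving shell interprets that sequence as a line break is a separate matter that this code does not settle.

```java
class AdbTextQuoter {

    // Hypothetical helper based on the question's snippet: wraps the text in
    // double quotes and escapes spaces, backslashes, quotes, and line feeds.
    static String quote(String text) {
        return "\""
                + text
                    .replace(" ", "%s")        // space marker, as in the question
                    .replace("\\", "\\\\")     // one backslash becomes two
                    .replace("\"", "\\\"")     // quote becomes backslash + quote
                    .replace("\n", "\\n")      // line feed becomes backslash + n
                + "\"";
    }

    public static void main(String[] args) {
        String input = "first line\nsecond \"line\"";
        // The result contains no raw line feed, only the two-character escape.
        System.out.println(quote(input));
    }
}
```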
Peter Lawrey
  • Thanks @PeterLawrey, it seems that \u000A does have some weird effect on the source code that effectively prevents the programmer from using the unicode new line. Unfortunately, \\u000A is not an option, as it will translate into the "\u000A" character sequence rather than the carriage return symbol. An interesting aspect of this issue is that my second code line did not cause a compiler error, only the first line. Would you know why? – Nar Gar Feb 10 '16 at 03:04
  • @NarGar You already have the carriage return symbol: `\n`. I did explain why you can't use `\u000a` inside a string, so it's not weird at all. Your second line causes an error for me as well; I suggest your compiler just doesn't show you that error because of the previous one. – Peter Lawrey Feb 10 '16 at 03:08
  • My use case is different: the string I am constructing is meant to be a shell command for adb shell, where I cannot have \n. I'm not quite sure \u000A is the solution either, but I am doing some trial and error. If you know of a good way to represent a carriage return in an adb shell command, I'd be glad to learn that approach. – Nar Gar Feb 10 '16 at 03:37
  • @NarGar it is not clear to me that you need to do anything. adb handles \n AFAIK. – Peter Lawrey Feb 10 '16 at 12:04
  • The adb shell input text command does not handle \n for me. Have tried a variety of things. I am using Android SDK 23 – Nar Gar Feb 10 '16 at 22:58
  • @NarGar so you need to know what to translate `\n` into as `\u000A` is much the same thing. – Peter Lawrey Feb 10 '16 at 23:48
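
The point made in the comments, that `\n` and \u000A denote the same character, can be checked with a short sketch. The class and method names here are invented for illustration, and the line feed is built from its numeric code rather than from a unicode escape so that the example compiles.

```java
class LineFeedEquivalence {

    // The line feed written with the ordinary string escape.
    static String viaEscape() {
        return "\n";
    }

    // The same character built from its code unit value, hex 0A (decimal 10).
    static String viaCodePoint() {
        return String.valueOf((char) 0x0A);
    }

    public static void main(String[] args) {
        // Both strings hold exactly one character with the same value.
        System.out.println(viaEscape().equals(viaCodePoint()));
        System.out.println((int) viaEscape().charAt(0));
    }
}
```

So swapping one notation for the other cannot change what the receiving program sees; only a different target sequence would.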