
I have looked through the suggested "already answered" questions for this. Mostly they want simply to discard such "non-printable" input. I want to use it.

I am getting a UTF8 String returned from keyboard input using

    BufferedReader br = new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
    String response = br.readLine();

and I am interested in identifying whether the user has input, for example, up-arrow or down-arrow as one of their keystrokes.

Iterating through the chars in this String I find that down-arrow translates to (int value for char) 27, 91, 66, i.e. 3 chars. The first value corresponds to Escape. It seems therefore that this is not a matter of identifying a single Character and finding out whether it is non-printable.

Also I'm not clear why this control character can't be printed out as a single UTF8 character, but instead prints out as the 3 component parts of the UTF8 character: does this mean that when you iterate through a String you are in fact getting its contents byte-by-byte?

I just wonder if there is any documented or clever way of doing this (finding and identifying control characters) in a given UTF8 String. Perhaps Apache Commons. Or perhaps in Groovy (which I am in fact using, rather than Java)?

mike rodent
  • up-arrow is not a control character, it's just a keyboard key. The effect of pressing it depends on whatever console you happen to be using, the most common being no effect whatsoever. It seems you happen to be testing with a console that produces a sequence of characters starting with an escape control char when you press an arrow. This behavior comes from your console of choice and is entirely unrelated to Java. I suggest you experiment with what your console does. – kumesana Mar 21 '19 at 12:39
  • OK thanks. I'm using a Cygwin BASH console on a Windows 10 machine. I didn't realise this was console-specific. Given that Cygwin is meant to emulate Linux I wonder if I'd get the same sequence on a Linux Terminal. At the moment I have no access to a Linux OS but will do some experiments. – mike rodent Mar 21 '19 at 12:51
  • It seems to me that what you're observing is consistent with what you obtain in a typical Linux console. I don't have one at the ready either. – kumesana Mar 21 '19 at 12:52
  • It's really up to the user what terminal they choose to use and what line-editing capabilities it supports. Generally, they should be able to enter whatever keystrokes they want and then press the enter key when they are satisfied that the screen is displaying what they want to enter as the next line. Mistakes are between them and their terminal. – Tom Blodget Mar 21 '19 at 22:39
  • @TomB Thanks. Yes, this was actually with a view to emulating some of the stuff you get with a *nix prompt: e.g. up-arrow -> type out the previous command. In fact there's a guy who's developed a good way of "grabbing" each character before Enter is pressed, see here: https://stackoverflow.com/a/30008252/595305 ... unfortunately, although it works with W10 and presumably with Linux, it doesn't appear to work with Cygwin. – mike rodent Mar 22 '19 at 17:55

1 Answer


You can test for a real control character using the Character::isISOControl methods (javadoc).
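As a minimal sketch of that check (the class and method names here are made up for illustration), the following counts ISO control characters in a string, using the down-arrow sequence 27, 91, 66 reported in the question as sample input. It also shows that iterating a `String` yields characters, not bytes: the sequence is three separate characters, of which only the first is a control character.

```java
public class ControlCharScan {
    // Counts code points in s for which Character.isISOControl returns true
    // (U+0000 to U+001F and U+007F to U+009F).
    static int countIsoControls(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isISOControl(cp)) {
                count++;
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return count;
    }

    public static void main(String[] args) {
        // ESC '[' 'B' : what the asker's terminal sent for down-arrow
        String downArrow = "\u001B[B";
        System.out.println(downArrow.length());          // 3 chars
        System.out.println(countIsoControls(downArrow)); // only ESC counts: 1
    }
}
```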

However, as noted in the comments, up-arrow and down-arrow are keystrokes rather than characters. What they actually produce in the input stream is platform dependent. For example, if you are using an ANSI-compliant terminal or terminal emulator, an up-arrow will be mapped to the sequence ESC [ A. If you simply filter out the ISO control characters, you will remove the ESC only.
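If you want to recognise the arrow keys rather than discard them, something like the following might do, assuming the terminal sends the ANSI sequences ESC [ A, B, C and D for up, down, right and left. The `ArrowKeyDetector` class and its `findArrows` method are hypothetical names for this sketch, and a terminal in a different mode may send other sequences.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArrowKeyDetector {
    // ESC '[' followed by A, B, C or D (assumption: ANSI-style cursor keys).
    private static final Pattern ARROW = Pattern.compile("\u001B\\[([ABCD])");

    // Returns the arrow keys found in the line, in the order they were typed.
    static List<String> findArrows(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = ARROW.matcher(line);
        while (m.find()) {
            switch (m.group(1).charAt(0)) {
                case 'A': found.add("UP"); break;
                case 'B': found.add("DOWN"); break;
                case 'C': found.add("RIGHT"); break;
                case 'D': found.add("LEFT"); break;
                default: break; // unreachable: the pattern only matches A-D
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findArrows("abc\u001B[B\u001B[A")); // [DOWN, UP]
    }
}
```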

I don't think there is a reliable platform independent way to filter out the junk that results from a user mistakenly typing arrow keys. For a platform specific solution, you need to understand what specific sequences are produced by the user's input device. Then you detect and remove the sequences.
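For the ANSI case specifically, the "detect and remove" step could be sketched as below. This assumes the input only contains CSI sequences (ESC [, optional parameter and intermediate bytes, then a final byte in the range @ to ~), which covers arrow keys and colour codes but not every escape sequence a terminal can emit. `AnsiStripper` and `stripCsi` are illustrative names, not part of any library.

```java
import java.util.regex.Pattern;

public class AnsiStripper {
    // ESC '[' + parameter bytes (0x30-0x3F) + intermediate bytes (0x20-0x2F)
    // + one final byte (0x40-0x7E). Assumption: only CSI sequences need removing.
    private static final Pattern CSI =
            Pattern.compile("\u001B\\[[0-?]*[ -/]*[@-~]");

    // Removes every CSI sequence from the line and leaves other text untouched.
    static String stripCsi(String line) {
        return CSI.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripCsi("ls\u001B[A -l"));          // ls -l
        System.out.println(stripCsi("\u001B[32mgreen\u001B[0m")); // green
    }
}
```

Removing whole sequences, rather than filtering control characters one at a time, avoids leaving the `[A` tail behind in the line.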

Stephen C
  • Thanks. I don't suppose a program can somehow interrogate a terminal to find out what sort of platform it is? i.e. ANSI-compliant or something else? – mike rodent Mar 21 '19 at 14:03
  • There might be a heuristic way. AFAIK, there is no standard way. – Stephen C Mar 21 '19 at 22:06
  • I suppose `uname` might help, which can help you distinguish between Linux, Mac and Cygwin (https://stackoverflow.com/questions/3466166/how-to-check-if-running-in-cygwin-mac-or-linux)... oddly, typing `uname` in a W10 DOS console gives "Cygwin" in my machine. Hmm. – mike rodent Mar 22 '19 at 17:59
  • Actually, it doesn't help a lot. Identifying the local operating system doesn't tell you the characteristics of either a terminal attached to an RS232 line, or a terminal emulator running on a remote machine over (say) an SSH connection. Indeed, even a local "terminal emulator" application may have options to switch between different styles of terminal emulation, emitting different sequences for arrow keys. – Stephen C Mar 23 '19 at 05:03
  • Got it. Oh dear. Oh well. Strange in a way: i.e. that your program simply can't find out how mapping of keystrokes to byte/char sequences happens for a given I/O channel. I've got a program which uses colour to highlight words... it works in *nix but in W10 DOS you get strange typing like "{escape}[032m" (should be "switch to green text"). – mike rodent Mar 23 '19 at 20:27
  • To get a Windows console to handle VT100-like sequences (like a typical Linux console) you need to have the `ENABLE_VIRTUAL_TERMINAL_PROCESSING` console mode flag set; see https://docs.microsoft.com/en-us/windows/console/getconsolemode – Stephen C May 07 '19 at 03:53