7

What's the recommended way to parse a shell-like command line in Java? By that I don't mean processing the options when they are already in array form (e.g. handling "-x" and such); there are loads of questions and answers about that already.

No, I mean the splitting of a full command string into "tokens". I need to convert a string such as:

user 123712378 suspend "They are \"bad guys\"" Or\ are\ they?

...to the list/array:

user
123712378
suspend
They are "bad guys"
Or are they?

I'm currently just doing a split on whitespace, but that obviously can't handle the quotes and escaped spaces.

(Quote handling is most important. Escaped spaces would be nice-to-have)

Note: My command string is the input from a shell-like web interface. It's not built from main(String[] args).

Bart van Heukelom
  • Wouldn't most of those command line libraries have solved this problem? You could just look at their source. – Daniel Kaplan May 23 '13 at 19:34
  • @tieTYT As far as I know those libraries only deal with the arguments once they have been separated by the shell. They are for building commands, not shells. – Bart van Heukelom May 23 '13 at 19:47
  • How would you split the following?: `This is"an example"`. I.e. how would you treat an opening quotation mark preceded by a non-white space character? – Lone nebula May 23 '13 at 19:49
  • @Lonenebula "This","is","an example" – Bart van Heukelom May 23 '13 at 19:54
  • Assuming you want the same rules as the shell uses, **this is"an example"** would parse into **this** and **isan example** – Edward Falk Nov 13 '13 at 19:39
  • I'd also need some handy lib for this, something like what 'shellwords' is for Ruby (http://www.ruby-doc.org/stdlib-1.9.3/libdoc/shellwords/rdoc/Shellwords.html), packaged in the default distribution. – inger Oct 27 '14 at 23:49
  • Possible duplicate of [Split a string containing command-line parameters into a String\[\] in Java](https://stackoverflow.com/questions/3259143/split-a-string-containing-command-line-parameters-into-a-string-in-java) – tsh Apr 13 '18 at 07:58

3 Answers

0

What you need is to implement a finite automaton: read the string character by character and choose the next state based on the current character and the one before it.
For example, a " marks the start of a quoted string, but if it is preceded by a \ it leaves the current state unchanged, and reading continues until the next character that triggers a state transition.
In your example you would essentially have:

read string -> read number   
      ^  -    -   -  |  

You would of course need to define all the states and the special characters that do or do not affect the state.
To be honest, I am not sure why you would want to provide such functionality to the end user.
Traditionally, CLI programs accept input in a standard format: -x, --x, --x=s, etc.
This format is well known to a typical user and is simple to implement and test.
If we are required to provide more "flexible" input to the user, it is usually better to build a GUI. That is what I would suggest.
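As a rough illustration of the automaton idea, here is a minimal sketch with two states (outside and inside double quotes). The class name ShellSplitter and the method split are made up for this example. It assumes that a backslash makes the next character literal in both states, and that a quoted span directly adjacent to other characters is glued onto the same token, as a shell would do. Single quotes, variable expansion and the like are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from any library.
public class ShellSplitter {
    private enum State { NORMAL, IN_QUOTES }

    public static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        State state = State.NORMAL;
        boolean inToken = false; // true once a token has started, so "" yields an empty token

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()) {
                // Assumption: a backslash escapes the next character in either state.
                current.append(input.charAt(++i));
                inToken = true;
            } else if (c == '"') {
                // A quote toggles the state and is not part of the token itself.
                state = (state == State.NORMAL) ? State.IN_QUOTES : State.NORMAL;
                inToken = true;
            } else if (state == State.NORMAL && Character.isWhitespace(c)) {
                // Unquoted whitespace ends the current token, if there is one.
                if (inToken) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    inToken = false;
                }
            } else {
                current.append(c);
                inToken = true;
            }
        }
        if (state == State.IN_QUOTES) {
            throw new IllegalArgumentException("Unterminated quote");
        }
        if (inToken) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "user 123712378 suspend \"They are \\\"bad guys\\\"\" Or\\ are\\ they?";
        // prints [user, 123712378, suspend, They are "bad guys", Or are they?]
        System.out.println(split(line));
    }
}
```

With these rules, this is"an example" splits into this and isan example. If an opening quote should instead start a new token, the quote branch would also need to flush the current token first.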

Cratylus
  • Well, that's pretty easy to build (and I have in the past, for other situations), but I was hoping some library had already solved it. Regarding the reason: I'm not just building the commands themselves (that's where -x and such come in, and they might well do in my application), but I'm first building the shell itself, where power users enter command strings via a web interface. – Bart van Heukelom May 23 '13 at 19:49
  • Perhaps there is a library for what you want, but I am not aware of one to suggest. But if you are going to build such a shell and the implementation is as easy for you as you say, I would recommend building it yourself rather than depending on another library, since you would be able to add things as needed (more/less functionality, debugging etc.) – Cratylus May 23 '13 at 20:12
0

ArgumentTokenizer from DrJava parses a command line the way the Bourne shell and its derivatives do.

It properly supports escapes, so bash -c 'echo "\"escaped '\''single'\'' quote\""' gets tokenized into [bash, -c, echo "\"escaped 'single' quote\""].

nvamelichev
-1

Build the args[] back into a string, then tokenize using regexp:

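import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name is arbitrary.
public class Tokenize {
    public static void main(String[] args) {
        // Rebuild the command line from the already-split arguments.
        String commandline = String.join(" ", args);
        System.out.println(commandline);

        // Each token is either a run of non-whitespace starting with a non-quote
        // character, or a double-quoted span. Backslash escapes are not handled.
        List<String> list = new ArrayList<>();
        Matcher m = Pattern.compile("([^\"]\\S*|\".+?\")\\s*").matcher(commandline);
        while (m.find()) {
            list.add(m.group(1)); // Add .replace("\"", "") to remove surrounding quotes.
        }

        System.out.println(list);
    }
}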

The latter part I took from here.

ashatch