I found a project from a few years ago (found here) that does some simple command-line parsing. While I really like its functionality, it does not support parsing special characters such as <, >, &, etc. I tried to add support for these characters by adding the same kind of conditions the existing code uses to look for whitespace, escape characters, and quotes:

bool _isQuote(char c) {
    if (c == '\"')
            return true;
    else if (c == '\'')
            return true;

    return false;
}

bool _isEscape(char c) {
    if (c == '\\')
        return true;

    return false;
}

bool _isWhitespace(char c) {
    if (c == ' ')
        return true;
    else if(c == '\t')
        return true;

    return false;
}
.
.
.

What I added:

bool _isLeftCarrot(char c) {
    if (c == '<')
        return true;

    return false;
}

bool _isRightCarrot(char c) {
    if (c == '>')
        return true;

    return false;
}

and so on for the rest of the special characters.

I also tried the same approach as the existing code in the parse method:

std::list<std::string> parse(const std::string& args) {

    std::stringstream ain(args);            // iterates over the input string
    ain >> std::noskipws;                   // ensures not to skip whitespace
    std::list<std::string> oargs;           // list of strings where we will store the tokens

    std::stringstream currentArg("");
    currentArg >> std::noskipws;

    // current state
    enum State {
            InArg,          // scanning the string currently
            InArgQuote,     // scanning the string that started with a quote currently 
            OutOfArg        // not scanning the string currently
    };
    State currentState = OutOfArg;

    char currentQuoteChar = '\0';   // used to differentiate between ' and "
                                    // ex. "sample'text" 

    char c;
    std::stringstream ss;
    std::string s;
    // iterate character by character through input string
    while(!ain.eof() && (ain >> c)) {

            // if current character is a quote
            if(_isQuote(c)) {
                    switch(currentState) {
                            case OutOfArg:
                                    currentArg.str(std::string());
                            case InArg:
                                    currentState = InArgQuote;
                                    currentQuoteChar = c;
                                    break;
                            case InArgQuote:
                                    if (c == currentQuoteChar)
                                            currentState = InArg;
                                    else
                                            currentArg << c;
                                    break;
                    }
            }
            // if current character is whitespace
            else if (_isWhitespace(c)) {
                        switch(currentState) {
                            case InArg:
                                    oargs.push_back(currentArg.str());
                                    currentState = OutOfArg;
                                    break;
                            case InArgQuote:
                                    currentArg << c;
                                    break;
                            case OutOfArg:
                                    // nothing
                                    break;
                    }
            }
            // if current character is escape character
            else if (_isEscape(c)) {
                    switch(currentState) {
                            case OutOfArg:
                                    currentArg.str(std::string());
                                    currentState = InArg;
                            case InArg:
                            case InArgQuote:
                                    if (ain.eof())
                                    {
                                            currentArg << c;
                                            throw(std::runtime_error("Found Escape Character at end of file."));
                                    }
                                    else {
                                            char c1 = c;
                                            ain >> c;
                                            if (c != '\"')
                                                    currentArg << c1;
                                            ain.unget();
                                            ain >> c;
                                            currentArg << c;
                                    }
                                    break;
                    }
            }

What I added in the parse method:

            // if current character is left carrot (<)
            else if(_isLeftCarrot(c)) {
                    // convert from char to string and push onto list
                    ss << c;
                    ss >> s;
                    oargs.push_back(s);
            }
            // if current character is right carrot (>)
            else if(_isRightCarrot(c)) {
                    ss << c;
                    ss >> s;
                    oargs.push_back(s);
            }
.
.
.
            else {
                    switch(currentState) {
                            case InArg:
                            case InArgQuote:
                                    currentArg << c;
                                    break;
                            case OutOfArg:
                                    currentArg.str(std::string());
                                    currentArg << c;
                                    currentState = InArg;
                                    break;
                    }
            }
    }

    if (currentState == InArg) {
            oargs.push_back(currentArg.str());
            s.clear();
    }
    else if (currentState == InArgQuote)
            throw(std::runtime_error("Starting quote has no ending quote."));

    return oargs;
}

parse returns the tokens as a list of strings.

However, I am running into a problem with one specific test case: when a special character is attached directly to the end of an argument. For example, the input

foo-bar&

returns the list [{&},{foo-bar}] instead of what I want, [{foo-bar},{&}].

I'm struggling to fix this issue. I am new to C++, so any advice along with some explanation would be a great help.

Lightness Races in Orbit
M. Twain
  • Unrelated: *carrot* -> *caret*. Strictly speaking caret only goes up. What you have are greater than and less than – user4581301 Jan 29 '19 at 01:01
  • Interesting fun fact: `while(!ain.eof() && (ain >> c))` could be `while(ain >> c)`. The state of the iostream will be just as failed if `>>` reads after EOF is found as it will be if anything else prevents conversion to a `char`. Not that much else prevents conversion to a `char`. In general, testing for EOF before reading (and possibly finding EOF) is such a bad idea that it has [its own page with several hundred upvotes.](https://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-considered-wrong) – user4581301 Jan 29 '19 at 01:05
  • @user4581301 Actually it's more likely that _"carot"_ was intended (a valid name for these symbols). Greater-than and less-than are the _semantics_ of _operators_ in the language and in mathematics but arguably not names for the glyphs themselves. – Lightness Races in Orbit Jan 29 '19 at 01:09
  • Also keep an eye out for [What are the rules about using an underscore in a C++ identifier?](https://stackoverflow.com/questions/228783/what-are-the-rules-about-using-an-underscore-in-a-c-identifier). If `_isQuote` is at global scope you may some day find yourself with a very, very nasty surprise and a few pages of utterly inscrutable error messages. – user4581301 Jan 29 '19 at 01:10
  • @LightnessRacesinOrbit Good point, but when I went for a quick look to see what their names really are, I found nothing better than angle bracket and went with the easiest target. They have to have a name, we hairless apes name everything, I just can't find it and I wanted to race back and scoop up all the sweet adulation you get from making unrelated comments. – user4581301 Jan 29 '19 at 01:13
  • Ah. There we go: Chevrons. Or maybe not. Chevrons seem to have a different angle. – user4581301 Jan 29 '19 at 01:14
  • @user4581301 They are guillemots, or carots, or angle brackets. My point was that although you pointed out a caret is a different thing, that's one of the reasons the OP probably _didn't_ mean caret ;) – Lightness Races in Orbit Jan 29 '19 at 10:56
  • @lightness: guillemets. Guillemots are [birds](https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcSg1-aeES4UZNBeknoU4jAeCwJPcnh5X7l_TG2PWYuMA_hEAmUJ5vNirusK). In French, guillemets means quotation marks, and in French typography they're doubled so you'd have to explicitly say "single" if you meant just one. I think "chevron" is probably the best. – rici Jan 29 '19 at 17:19
  • @rici [Single guillemets exist too](https://en.wikipedia.org/wiki/Guillemet). Angle bracket or carot is the best term though. [Chevrons are not (quite) the same](https://en.wikipedia.org/wiki/Bracket#Angle_brackets). – Lightness Races in Orbit Jan 29 '19 at 17:28
  • @lightness: I'm fine with "angle bracket". The only place I can find "carot" is the Wikipedia entry on guillemets where it lists it as an "informal" synonym, but I suspect it's a typo or possibly a misspelling on its way to becoming a neologism. – rici Jan 29 '19 at 17:41
  • @rici: Granted; there's a historical explanation of "carot" at the end of the Wikipedia article but, while it puts forth a believable etymology, it's poorly written and completely unsourced. Furthermore it suggests a wide recognition in the software community, which I cannot corroborate (and, it seems, neither can you!) :P – Lightness Races in Orbit Jan 29 '19 at 17:46

1 Answer

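When you handle one of your special characters, you need to do the same sort of thing the original code does when it encounters whitespace. Look at currentState: if you are in the middle of an argument, save the current argument first and reset the state to OutOfArg (since you are no longer in one), and only then push the special character as its own token. As written, `&` is pushed immediately while `foo-bar` is still sitting in currentArg and is only pushed after the loop ends, which is why the order comes out reversed.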
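
As a rough illustration of that idea, here is a minimal, simplified sketch rather than a drop-in patch for the original project. It assumes a hypothetical helper `isSpecial` and a hypothetical function `parseSketch`, uses a plain `std::string` instead of a `std::stringstream` for the current argument, and leaves out escape handling to stay short. It also builds the one-character token with `std::string(1, c)`; the `ss << c; ss >> s;` approach is fragile because the first extraction leaves the stream at EOF, and that state has to be cleared before the stream can be reused.

```cpp
#include <list>
#include <stdexcept>
#include <string>

// Hypothetical helper (not in the original project): one predicate
// covering all the special characters that should become their own token.
bool isSpecial(char c) {
    return c == '<' || c == '>' || c == '&' || c == '|';
}

// Simplified tokenizer sketch: same three states as the original code,
// but no escape-character handling.
std::list<std::string> parseSketch(const std::string& args) {
    enum State { InArg, InArgQuote, OutOfArg };
    State state = OutOfArg;
    char quoteChar = '\0';          // which quote opened the current quoted run
    std::string current;            // argument being built
    std::list<std::string> out;

    for (char c : args) {
        if (state == InArgQuote) {
            if (c == quoteChar)
                state = InArg;      // closing quote
            else
                current += c;       // everything else is literal inside quotes
        } else if (c == '"' || c == '\'') {
            state = InArgQuote;     // current is already empty when OutOfArg
            quoteChar = c;
        } else if (c == ' ' || c == '\t') {
            if (state == InArg) {   // whitespace ends the current argument
                out.push_back(current);
                current.clear();
                state = OutOfArg;
            }
        } else if (isSpecial(c)) {
            if (state == InArg) {   // flush the pending argument FIRST
                out.push_back(current);
                current.clear();
                state = OutOfArg;
            }
            out.push_back(std::string(1, c));  // then the special char itself
        } else {
            current += c;           // ordinary character
            state = InArg;
        }
    }

    if (state == InArgQuote)
        throw std::runtime_error("Starting quote has no ending quote.");
    if (state == InArg)
        out.push_back(current);
    return out;
}
```

With this ordering, `parseSketch("foo-bar&")` yields `foo-bar` followed by `&`, and a special character inside quotes (for example `'x>y'`) stays part of the quoted argument.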

1201ProgramAlarm