Since Han mentioned in his comments that he would like to wait for further ideas, I will show an additional solution.
And, like everybody before me, I think it is the most appropriate solution :-)
Additionally, I will unpack the "big hammer" and talk about "languages", "grammars" and, uh oh, the Chomsky hierarchy.
First, a very simple answer: pure regular expressions cannot count. So they cannot check for matching braces to an arbitrary depth, for example that n opening braces are followed by exactly n closing braces for any n.
Formally, they are equivalent to finite automata (FA, also called finite state automata, FSA), for example a DFA (deterministic finite automaton). One property that matters here is that such an automaton knows only its current state. It cannot "remember" the states it went through before. Apart from its fixed, finite set of states, it has no memory.
The languages that they can describe are the so-called "regular languages". In the Chomsky hierarchy, the grammar that generates such a regular language is of Type 3, and regular expressions describe exactly these languages.
However, there are extensions to regular expressions (for example recursive patterns in PCRE) that can also match balanced braces. See here: Regular expression to match balanced parentheses
But these are no longer regular expressions in the original, formal sense.
What we really need is a Chomsky Type-2 grammar, a so-called context-free grammar. Such a grammar is usually recognized with a pushdown automaton, where a stack stores additional state. This stack is exactly the "memory" that regular expressions do not have.
So, if we want to check the syntax of a given expression, in your case the input for a std::map, we can define an ultra-simple grammar and parse the input string with a standard, classical approach: a shift/reduce parser.
Several steps are necessary. First, the input stream is split into lexemes or tokens. This is usually done by a so-called lexer or scanner, and you will almost always find a function like getNextToken or similar. Then the tokens are shifted onto the stack. The top of the stack is matched against the productions of the grammar. If it matches the right side of a production, those elements on the stack are replaced by the non-terminal on the left side of that production (a "reduction"). This is repeated until the input is consumed and the stack holds only the start symbol of the grammar (meaning everything was OK), or until a syntax error is found.
Regarding your question:
How to parse a string into std::map and validate its format?
I would split it into 2 tasks:
- Parse the string to validate the format
- If the string is valid, put the data into a map
Task 2 is simple and typically a one-liner using a std::istream_iterator.
Task 1 unfortunately needs a real parser, here a shift-reduce parser. This is a bit more complex.
In the code below, I show one possible solution. Please note: this can of course be optimized by using tokens with attributes. The attributes would be an integer value and the type of the brace, and these attributed tokens would be stored on the parse stack. With that, we would no longer need separate productions for every kind of brace, and we could fill the map directly in the parser, namely in the reduction of a production like "{Token::Pair, { Token::B1open, Token::Integer, Token::Comma, Token::Integer, Token::B1close} }".
Please see the code below:
#include <iostream>
#include <iterator>
#include <sstream>
#include <map>
#include <vector>
#include <algorithm>
#include <cctype>   // std::isspace, std::isdigit
#include <string>
// Tokens: Terminals and Non-Terminals
enum class Token { Pair, PairList, End, OK, Integer, Comma, B1open, B1close, B2open, B2close, B3open, B3close };
// Production type for Grammar
struct Production { Token nonTerminal; std::vector<Token> rightSide; };
// The Context Free Grammar CFG
std::vector<Production> grammar
{
    {Token::OK, { Token::B1open, Token::PairList, Token::B1close } },
    {Token::OK, { Token::B2open, Token::PairList, Token::B2close } },
    {Token::OK, { Token::B3open, Token::PairList, Token::B3close } },
    // Order matters: matchAndReduce() uses the first matching production,
    // so "PairList Comma Pair" must be tried before "Pair" alone
    {Token::PairList, { Token::PairList, Token::Comma, Token::Pair} },
    {Token::PairList, { Token::Pair } },
    {Token::Pair, { Token::B1open, Token::Integer, Token::Comma, Token::Integer, Token::B1close} },
    {Token::Pair, { Token::B2open, Token::Integer, Token::Comma, Token::Integer, Token::B2close} },
    {Token::Pair, { Token::B3open, Token::Integer, Token::Comma, Token::Integer, Token::B3close} }
};
// Helper for translating brace characters to Tokens
std::map<const char, Token> braceToToken{
    {'(',Token::B1open},{'[',Token::B2open},{'{',Token::B3open},{')',Token::B1close},{']',Token::B2close},{'}',Token::B3close},
};
// A classical SHIFT - REDUCE Parser
class Parser
{
public:
    Parser() : parseString(), parseStringPos(parseString.begin()) {}
    bool parse(const std::string& inputString);
protected:
    // String to be parsed
    std::string parseString{};
    // Iterator for input string
    std::string::iterator parseStringPos{};
    // The parse stack for the Shift Reduce Parser
    std::vector<Token> parseStack{};
    // Parser Step 1: LEXER (lexical analysis / scanner)
    Token getNextToken();
    // Parser Step 2: SHIFT
    void shift(Token token) { parseStack.push_back(token); }
    // Parser Step 3: MATCH / REDUCE
    bool matchAndReduce();
};
bool Parser::parse(const std::string& inputString)
{
    parseString = inputString; parseStringPos = parseString.begin(); parseStack.clear();
    Token token{ Token::End };
    do // Read tokens until end of string
    {
        token = getNextToken();     // Parser Step 1: LEXER (lexical analysis / scanner)
        shift(token);               // Parser Step 2: SHIFT
        while (matchAndReduce())    // Parser Step 3: MATCH / REDUCE
            ;                       // Empty body
    } while (token != Token::End);  // Do until end of string reached
    // Valid only if the whole input was consumed and the stack holds exactly "OK End".
    // (Checking only parseStack[0] would also accept trailing input like "((1,2)) (3,4)".)
    return (parseStringPos == parseString.end()) && (parseStack.size() == 2) && (parseStack[0] == Token::OK);
}
Token Parser::getNextToken()
{
    // Eat all white spaces
    while ((parseStringPos != parseString.end()) && std::isspace(static_cast<unsigned char>(*parseStringPos))) {
        ++parseStringPos;
    }
    // Check for end of string
    if (parseStringPos == parseString.end()) {
        return Token::End;
    }
    Token token{ Token::End };
    // Handle digits: advance to the last digit of the number
    if (std::isdigit(static_cast<unsigned char>(*parseStringPos))) {
        while (((parseStringPos + 1) != parseString.end()) && std::isdigit(static_cast<unsigned char>(*(parseStringPos + 1)))) ++parseStringPos;
        token = Token::Integer;
    }
    // Detect a comma
    else if (*parseStringPos == ',') {
        token = Token::Comma;
    }
    // Else search for all kind of braces
    else {
        const auto foundBrace = braceToToken.find(*parseStringPos);
        // Unknown character: return End WITHOUT consuming it. Parsing stops here,
        // and parse() rejects the input because it was not read completely.
        if (foundBrace == braceToToken.end()) return Token::End;
        token = foundBrace->second;
    }
    // In next function invocation the next string element will be checked
    ++parseStringPos;
    return token;
}
bool Parser::matchAndReduce()
{
    bool result{ false };
    // Iterate over all productions in the grammar
    for (const Production& production : grammar) {
        // If enough elements on the stack, match the top of the stack with a production
        if (production.rightSide.size() <= parseStack.size()) {
            if (std::equal(production.rightSide.begin(), production.rightSide.end(), parseStack.end() - production.rightSide.size())) {
                // Found production: Reduce
                parseStack.resize(parseStack.size() - production.rightSide.size());
                // Replace right side of production with left side
                parseStack.push_back(production.nonTerminal);
                result = true;
                break;
            }
        }
    }
    return result;
}
using IntMap = std::map<int, int>;
using IntPair = std::pair<int, int>;
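namespace std {
    // Note: strictly speaking, the C++ standard does not allow adding overloads to namespace std.
    // They are placed here so that std::istream_iterator and std::ostream_iterator find them via
    // argument-dependent lookup, because std::pair<int, int> is itself a type from namespace std.
    istream& operator >> (istream& is, IntPair& intPair) {
        return is >> intPair.first >> intPair.second;
    }
    ostream& operator << (ostream& os, const IntPair& intPair) {
        return os << intPair.first << " --> " << intPair.second;
    }
}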
int main()
{
    // Test Data. Test Vector with different strings to test
    std::vector<std::string> testVector{
        "({10, 1 1}, (2, 3) , [5 ,6])",
        "({10, 1}, (2, 3) , [5 ,6])",
        "({10, 1})",
        "{10,1}"
    };
    // Define the Parser
    Parser parser{};
    for (std::string& test : testVector)
    {
        // Give some nice info to the user
        std::cout << "\nChecking '" << test << "'\n";
        // Parse the test string and test, if it is valid
        const bool inputStringIsValid = parser.parse(test);
        if (inputStringIsValid) {
            // String is valid. Replace everything but digits with spaces
            std::replace_if(test.begin(), test.end(), [](const char c) { return !std::isdigit(static_cast<unsigned char>(c)); }, ' ');
            // Copy the string with the digits into an istringstream, so that we can read it with an istream_iterator
            std::istringstream iss(test);
            IntMap intMap{ std::istream_iterator<IntPair>(iss), std::istream_iterator<IntPair>() };
            // Present the resulting data in the map to the user
            std::copy(intMap.begin(), intMap.end(), std::ostream_iterator<IntPair>(std::cout, "\n"));
        }
        else {
            std::cerr << "***** Invalid input data\n";
        }
    }
    return 0;
}
I hope this is not too complex. But it is the "mathematically" correct solution. Have fun . . .