
I am writing a small scripting engine (as part of a game engine), part of which requires parsing a string to get a certain object. I already have regexes to identify what the string in question represents (string, variable, map or array/list), but string concatenation is where I'm stuck. It's not strictly required, since I could write a concat() function, but I would rather the parse method recognize string + string.

This would be simple if all arguments were strings or variables (e.g. 'i like ' + 'pie' or variable + 'string' or variable + variable); however, I also want to provide data types like lists/arrays and maps. Ideally, the parse method would be able to split any of the following lines by ignoring certain plus signs:

print parse("{var='value ' + ''}")
print parse("['value', x, x + y] + ' a string'")
print parse("function('arg ', 'another arg ' + 'concatenated')")

In the first print, the + is ignored completely because it is between unquoted curly braces.

In the second print, the first + is ignored because it is between unquoted square brackets, but the second is not.

In the third print, similar to the first, the + is ignored because it is inside unquoted parentheses.
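
To make the rule behind these three examples concrete, here is a minimal sketch of the intended splitting behaviour, written as a plain character scanner rather than a regex. The names `TopLevelSplitter` and `splitTopLevel` are hypothetical, and the sketch makes two assumptions that are not stated above: backslash escapes are honoured inside quotes, and unbalanced brackets are not reported as errors.

```java
import java.util.ArrayList;
import java.util.List;

final class TopLevelSplitter {

    // Hypothetical helper: splits on '+' only when the '+' is outside quotes
    // and outside any (), [] or {} pair. Pieces are trimmed of outer whitespace.
    static List<String> splitTopLevel(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;   // number of currently open (, [ or {
        char quote = 0;  // the open quote character, or 0 when not inside a string

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (quote != 0) {
                // Inside a string literal: copy everything verbatim.
                current.append(c);
                if (c == '\\' && i + 1 < input.length()) {
                    // Assumption: a backslash escapes the next character.
                    current.append(input.charAt(++i));
                } else if (c == quote) {
                    quote = 0; // closing quote
                }
            } else if (c == '\'' || c == '"') {
                quote = c;
                current.append(c);
            } else if (c == '(' || c == '[' || c == '{') {
                depth++;
                current.append(c);
            } else if (c == ')' || c == ']' || c == '}') {
                if (depth > 0) {
                    depth--; // unbalanced closers are tolerated, not validated
                }
                current.append(c);
            } else if (c == '+' && depth == 0) {
                // A top-level '+': finish the current piece and start a new one.
                parts.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString().trim());
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitTopLevel("{var='value ' + ''}"));
        System.out.println(splitTopLevel("['value', x, x + y] + ' a string'"));
        System.out.println(splitTopLevel("function('arg ', 'another arg ' + 'concatenated')"));
    }
}
```

With this sketch the first and third inputs come back as a single piece, while the second is split into `['value', x, x + y]` and `' a string'`. Each piece could then be handed to the existing regexes, and the contents of a bracketed piece split again the same way if needed.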

EDIT: In case I wasn't really clear, the regex is only being used to split the text into a string array.


I know that this answer is somewhat similar, and is a great start. But I've been unable to alter it to also ignore +s within (unquoted) parentheses and brackets ([] or {}).

In case it is important, as I know some flavors are different in each language, I am using Java.

KILL3RTACO
  • Well, can you have brackets within brackets? functions within functions? – RealSkeptic Oct 02 '15 at 10:59
  • Nested arrays/maps no, I don't have a need for it, however the result of a function as an argument of another function (e.g. "func('string' + func(''))") yes. – KILL3RTACO Oct 02 '15 at 11:03
  • So you [can't use a regular expression for this](http://stackoverflow.com/a/546457/4125191). You may want to use `StreamTokenizer` to tokenize and process the input, or use a full-fledged library like Antlr. – RealSkeptic Oct 02 '15 at 11:06
  • For that specific example, the regex "\(.*\)" would've matched exactly what the OP wanted (everything within the brackets but not outside of it). I copied the original text into regexr.com, used the regex I mentioned, and the OP's expected output was highlighted. I don't want the regex to take care of the recursion, I'm doing that myself, all I need is for certain +s to be ignored so the arguments can be parsed correctly. – KILL3RTACO Oct 02 '15 at 11:18
  • Even though you only want to do one simple thing, you need to fully parse the expression to do that simple thing properly. While the "don't use regexes to parse things" objection is perhaps overdone, in this case, you really *really* are doing parsing (your function is even called parse!), and using a real parsing library will be hugely beneficial. –  Oct 02 '15 at 11:41
