1

I need to define a bunch of vector sequences, which are all a series of L,D,R,U for left, down, right, up or x for break. There are optional parts, and either/or parts. I have been using my own invented system for noting it down, but I want to document this for other, potentially non-programmers to read.

I now want to use a subset (I don't plan on using any wildcards, or infinite repetition for example) of regex to define the vector sequence and a script to produce all possible matching strings...

/LDR/ produces ['LDR']
/LDU?R/ produces ['LDR','LDUR']
/R(LD|DR)U/ produces ['RLDU','RDRU']
/DxR[DL]U?RDRU?/ produces ['DxRDRDR','DxRDRDRU','DxRDURDR','DxRDURDRU','DxRLRDR','DxRLRDRU','DxRLURDR','DxRLURDRU']

Is there an existing library I can use to generate all matches?

EDIT

I realised I will only be needing or statements, as optional things can be specified by thing or nothing maybe a, or b, both optional could be (a|b|). Is there another language I could use to define what I am trying to do?

Billy Moon
  • 52,018
  • 22
  • 123
  • 222
  • You can write a parser for your own regex, then traverse through all the possibility. But the link Dukeling gave has an existing library to do the generation. – nhahtdh Feb 10 '13 at 14:04
  • @Dukeling: He has regex, and he wants to generate all strings that matches it. – nhahtdh Feb 10 '13 at 14:04
  • @billy ...just so you know, I updated my answer below -- my homegrown parser for the "?" now seems to generate the matches... Interesting challenge. – גלעד ברקן Feb 11 '13 at 06:50

3 Answers3

2

By translating the java code form the link provided by @Dukeling into javascript, I think I have solved my problem...

var Node = function(str){
    this.bracket = false;
    this.children = [];
    this.s = str;
    this.next = null;
    this.addChild = function(child){
        this.children.push(child);
    }
}

var printTree = function(root,prefix){
  prefix = prefix.replace(/\./g, "");
  for(i in root.children){
    var child = root.children[i]
    printTree(child, prefix + root.s);
  }
  if(root.children.length < 1){
    console.log(prefix + root.s);
  }
}

var Stack = function(){
    this.arr = []
    this.push = function(item){
        this.arr.push(item)
    }
    this.pop = function(){
        return this.arr.pop()
    }
    this.peek = function(){
        return this.arr[this.arr.length-1]
    }
}

var createTree = function(s){

    // this line was causing errors for `a(((b|c)d)e)f` because the `(((` was only
    // replacing the forst two brackets.
    // var s = s.replace(/(\(|\||\))(\(|\||\))/g, "$1.$2");
    // this line fixes it
    var s = s.replace(/[(|)]+/g, function(x){ return x.split('').join('.') });

    var str = s.split('');
    var stack = new Stack();
    var root = new Node("");
    stack.push(root); // start node
    var justFinishedBrackets = false;
    for(i in str){
        var c = str[i]
        if(c == '('){
            stack.peek().next = new Node("Y"); // node after brackets
            stack.peek().bracket = true; // node before brackets
        } else if (c == '|' || c == ')'){
            var last = stack.peek(); // for (ab|cd)e, remember b / d so we can add child e to it
            while (!stack.peek().bracket){ // while not node before brackets
                stack.pop();
            }
            last.addChild(stack.peek().next); // for (b|c)d, add d as child to b / c
        } else {
            if (justFinishedBrackets){
                var next = stack.pop().next;
                next.s = "" + c;
                stack.push(next);
            } else {
                var n = new Node(""+c);
                stack.peek().addChild(n);
                stack.push(n);
            }
        }
        justFinishedBrackets = (c == ')');
    }
    return root;
}

// Test it out
var str = "a(c|mo(r|l))e";
var root = createTree(str);
printTree(root, "");
// Prints: ace / amore / amole

I only changed one line, to allow more than two consecutive brackets to be handled, and left the original translation in the comments

I also added a function to return an array of results, instead of printing them...

var getTree = function(root,prefix){
  this.out = this.out || []
  prefix = prefix.replace(/\./g, "");
  for(i in root.children){
    var child = root.children[i]
    getTree(child, prefix + root.s, out);
  }
  if(root.children.length < 1){
    this.out.push(prefix + root.s);
  }
  if(!prefix && !root.s){
    var out = this.out;
    this.out = null
    return out;
  }
}

// Test it
var str = "a(b|c)d";
var root = createTree(str);
console.log(getTree(root, ""));
// logs ["abd","acd"]

The last part, to allow for empty strings too, so... (ab|c|) means ab or c or nothing, and a convenience shortcut so that ab?c is translated into a(b|)c.

var getMatches = function(str){
    str = str.replace(/(.)\?/g,"($1|)")
    // replace all instances of `(???|)` with `(???|µ)`
    // the µ will be stripped out later
    str = str.replace(/\|\)/g,"|µ)")
    // fix issues where last character is `)` by inserting token `µ`
    // which will be stripped out later
    str = str+"µ"
    var root = createTree(str);
    var res = getTree(root, "");
    // strip out token µ
    for(i in res){
        res[i] = res[i].replace(/µ/g,"")
    }
    // return the array of results
    return res
}

getMatches("a(bc|de?)?f");
// Returns: ["abcf","adef","adf","af"]

The last part is a little hack-ish as it relies on µ not being in the string (not an issue for me) and solves one bug, where a ) at the end on the input string was causing incorrect output, by inserting a µ at the end of each string, and then stripping it from the results. I would be happy for someone to suggest a better way to handle these issues, so it can work as a more general solution.

This code as it stands does everything I need. Thanks for all your help!

Billy Moon
  • 52,018
  • 22
  • 123
  • 222
  • I went to use this code again for something else, and I am starting to see some limitations to this implementation. I still don't like using the `µ` token. Should probably build array, and use strings for values and numbers for tokens. Also, after a quick googling, found some useful, related things. http://regldg.com/docs/index.php, https://github.com/bluezio/xeger, http://stackoverflow.com/questions/20080789/given-a-regular-expression-how-would-i-generate-all-strings-that-match-it, http://stackoverflow.com/questions/15950113/generate-all-valid-values-for-a-regular-expression – Billy Moon Feb 21 '14 at 01:54
1

I'd imagine what you're trying is quite easy with a tree (as long as it's only or-statements).

Parse a(b|c)d (or any or-statement) into a tree as follows: a has children b and c, b and c have a mutual child d. b and c can both consist of 0 or more nodes (as in c could be g(e|f)h in which case (part of) the tree would be a -> g -> e/f (2 nodes) -> h -> d or c could be empty, in which case (part of) the tree would be a -> d, but an actual physical empty node may simplify things, which you should see when trying to write the code).

Generation of the tree shouldn't be too difficult with either recursion or a stack.

Once you have a tree, it's trivial to recursively iterate through the whole thing and generate all strings.

Also, here is a link to a similar question, providing a library or two.

EDIT:

"shouldn't be too difficult" - okay, maybe not

Here is a somewhat complicated example (Java) that may require some advanced knowledge about stacks.

Here is a slightly simpler version (Java) thanks to inserting a special character between each ((, )), |(, etc.

Note that neither of these are particularly efficient, the point is just to get the idea across.

Community
  • 1
  • 1
Bernhard Barker
  • 50,899
  • 13
  • 85
  • 122
  • I would prefer javascript if possible, I am pretty handy with javascript, but I don't know how to write a parser (I would love to learn though). This approach seems to be the answer I am looking for - do you have any pointers how I would go about doing your steps one and two (create parse tree, traverse the tree). I am a little confused for example, how to create a tree with mutual children, and then how to traverse a tree with mutual children too. – Billy Moon Feb 10 '13 at 14:38
  • This looks great - I am sure I will be able to translate this no problem, - seems to be exactly what I need. Thanks, I will let you know how I get on. – Billy Moon Feb 10 '13 at 21:18
  • I translated it, added a couple of methods, and it is all working great - thanks! http://stackoverflow.com/a/14810830/665261 – Billy Moon Feb 11 '13 at 12:39
1

Here is a JavaScript example that addresses parsing the (a|b) and (a|b|) possibilities, creates an array of possible substrings, and composes the matches based on this answer.

var regex = /\([RLUD]*\|[RLUD]*\|?\)/, 
    str = "R(LD|DR)U(R|L|)",
    substrings = [], matches = [], str_tmp = str, find

while (find = regex.exec(str_tmp)){
  var index = find.index

  finds = find[0].split(/\|/)
  substrings.push(str_tmp.substr(0, index))

  if (find[0].match(/\|/g).length == 1) 
    substrings.push([finds[0].substr(1), finds[1].replace(/.$/, '')])
  else if (find[0].match(/\|/g).length == 2){
    substrings.push([finds[0].substr(1), ""])
    substrings.push([finds[1], ""])
  }

  str_tmp = str_tmp.substr(index + find[0].length)
}
if (str_tmp) substrings.push([str_tmp])
console.log(substrings) //>>["R", ["LD", "DR"], "U", ["R", ""], ["L", ""]]

//compose matches
function printBin(tree, soFar, iterations) {
  if (iterations == tree.length) matches.push(soFar)
  else if (tree[iterations].length == 2){
      printBin(tree, soFar + tree[iterations][0], iterations + 1)
      printBin(tree, soFar + tree[iterations][1], iterations + 1)
  }
  else printBin(tree, soFar + tree[iterations], iterations + 1)
}
printBin(substrings, "", 0)
console.log(matches) //>>["RLDURL", "RLDUR", "RLDUL", "RLDU", "RDRURL", "RDRUR", "RDRUL", "RDRU"]
Community
  • 1
  • 1
גלעד ברקן
  • 21,095
  • 3
  • 19
  • 57
  • This is pretty interesting. Seems to have a bug when you add a third question mark to the input string. Also, in my case, I need a way to express `this` or `that` so instead of `LDD?R` I could write `LD(D|)R` and use the same syntax to write `L(DL|RD)U` (expands to `LDLU` and `LRDU`). Thanks for your help. I will post the code I finally use here also. – Billy Moon Feb 11 '13 at 09:17
  • I added an answer, you might be interested... http://stackoverflow.com/a/14810830/665261 – Billy Moon Feb 11 '13 at 12:38
  • @ billy ... looks interesting, I'll learn from it. I found that small bug, but my index-tree doesn't really work for more than two "?". It was a creative effort, though, maybe I'll try to make it work. – גלעד ברקן Feb 11 '13 at 16:22
  • @BillyMoon ...I fixed it! Now the array of indexes seems to properly list all combinations for the choice nodes and, mapped to the substrings, can create the correct combinations. Duplicates can happen when different substrings combine to similar strings. Maybe i'll get to the binary tree one day... – גלעד ברקן Feb 11 '13 at 18:40
  • @BillyMoon ...Hi Billy, I promise not to bug you about this again, but please check out the function that composes the matches at the end of my updated answer. It's eight lines long and is based on eboix's brilliant answer about binary permutations. – גלעד ברקן Feb 12 '13 at 12:13
  • well, it is very concise, and interesting code, but it is failing with nested brackets like `L((D|R)|U)`. – Billy Moon Feb 12 '13 at 12:41
  • thanks for looking. I was thrilled with the simplicity of printing the binary tree, which I think your printTree follows a similar recursion. My parser would need to be reconfigured to address the nesting...I'll leave that alone for now. Cheers. – גלעד ברקן Feb 12 '13 at 19:31