
I have a CLI calculator and I am adding a square root function. I have this regex that parses the user's input:

string.scan(/\d*\.?\d+\^?|[-+\/*%()]|sqrt\(\d*\.?\d+\)/)

It works with these inputs as expected:

calc -o "sqrt(9)" #=> ["sqrt(9)"]
calc -o "sqrt(9) + sqrt(9)" #=> ["sqrt(9)", "+", "sqrt(9)"]

However, my regex does not account for expressions nested inside sqrt(). With this input:

calc -o "sqrt(6+3)"

I desire the output:

["sqrt(6+3)"]

because when the program finds a sqrt token, it recursively applies scan with the same regex to its contents until it reaches the most deeply nested expression, and then works its way back out. Instead I get:

["(", "6", "+", "3", ")"]
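To see why, here is a minimal reproduction in plain Ruby, assuming the calculator passes the raw input string straight to scan (`TOKEN_RE` is just a name given here to the question's regex). The sqrt alternative only accepts a single number between the brackets, so on "sqrt(6+3)" it fails at the "+". No alternative matches the letters s, q, r, t either, and scan silently skips text that doesn't match, so it resumes at the "(" and tokenizes the rest piece by piece.

```ruby
# The tokenizer regex from the question, unchanged.
TOKEN_RE = /\d*\.?\d+\^?|[-+\/*%()]|sqrt\(\d*\.?\d+\)/

p "2 * (3 + 5)".scan(TOKEN_RE)        # => ["2", "*", "(", "3", "+", "5", ")"]
p "sqrt(9) + sqrt(9)".scan(TOKEN_RE)  # => ["sqrt(9)", "+", "sqrt(9)"]
# The sqrt alternative fails at "+", and "sqrt" itself is skipped:
p "sqrt(6+3)".scan(TOKEN_RE)          # => ["(", "6", "+", "3", ")"]
```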

I have tried capturing everything inside the sqrt brackets, but that also captures everything inside every other pair of brackets. So I am having trouble capturing both sqrt(9) and sqrt(6+3) without one interfering with the other.

Any guidance is much appreciated.

UPDATE: Following on from the answer provided, perhaps I need to explain my program in more detail so you can see what is going on.

Say I have the input 2 * (3 + 5). This is tokenized into the following array:

["2", "*", "(", "3", "+", "5", ")"]

The program follows PEDMAS, so it first looks for parentheses; in this case it finds them. The loop that finds them looks roughly like this:

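def find_brackets(tokens)
  start_i = nil
  tokens.each_with_index do |token, index|
    if token == "("
      start_i = index          # remember the most recent "("
    elsif token == ")"
      return [start_i, index]  # the first ")" closes the innermost "(" (end of nest)
    end
  end
  nil                          # no brackets left
end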

I can then pass the start and end indices to a function that evaluates the operation between them, working through each nested level in turn. So the above handles this just fine:

calc -o "2 * (6 + (2 * 2))"

#=> ["2", "*", "(", "6", "+", "(", "2", "*", "2", ")", ")"]
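Here is a rough sketch of that bracket-by-bracket evaluation, meant as a guess at how the pieces could fit together rather than the actual program. `evaluate_flat` and `evaluate` are hypothetical names, and the sketch assumes tokens alternate number/operator, with no unary minus and no ^ handling. It relies on the fact that the first ")" in the array always closes the nearest "(" to its left, so the span between them contains no brackets and can be collapsed to a single number.

```ruby
# Hypothetical: evaluates a bracket-free token list.
# First pass applies * / % immediately; second pass applies + and - left to right.
def evaluate_flat(tokens)
  nums = [tokens.first.to_f]
  ops = []
  tokens.drop(1).each_slice(2) do |op, num|
    n = num.to_f
    case op
    when "*" then nums[-1] *= n
    when "/" then nums[-1] /= n
    when "%" then nums[-1] %= n
    else                      # "+" or "-": defer to the second pass
      ops << op
      nums << n
    end
  end
  result = nums.first
  ops.each_with_index do |op, i|
    result = op == "+" ? result + nums[i + 1] : result - nums[i + 1]
  end
  result
end

# Hypothetical: repeatedly collapses the innermost "( ... )" pair into its value.
def evaluate(tokens)
  tokens = tokens.dup
  while (close_i = tokens.index(")"))
    open_i = tokens[0...close_i].rindex("(") or raise ArgumentError, "unbalanced brackets"
    tokens[open_i..close_i] = [evaluate_flat(tokens[(open_i + 1)...close_i]).to_s]
  end
  raise ArgumentError, "unbalanced brackets" if tokens.include?("(")
  evaluate_flat(tokens)
end

p evaluate(["2", "*", "(", "6", "+", "(", "2", "*", "2", ")", ")"]) # => 20.0
```

Collapsing the innermost pair first means the evaluator never has to track nesting depth explicitly.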

My idea is that when it comes across a sqrt token, it simply reuses the same regex on the contents, creates a whole new array and does the above on it. Once that is done, I take the element at index 0 (the result) and put it where the sqrt used to be.

EDIT: I didn't actually mention this, but I basically need to capture the entirety of a sqrt: anything and everything inside something like sqrt(5+5*(6/2+sqrt(9)))
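For the record, Ruby's regex engine (Onigmo) goes beyond classic regular expressions here: a named group can call itself with `\g<name>`, which lets a pattern match a balanced `(...)` run. The sketch below relies on that feature and uses hypothetical names (`TOKEN_RE`, `tokenize`, `resolve_sqrt`). It captures a whole sqrt(...) call however deeply it is nested, then resolves sqrt tokens recursively by rescanning their contents with the same regex, as described above. The `left_to_right` lambda is only a stand-in with no operator precedence and no bracket support; the calculator's real PEDMAS evaluator would go in its place.

```ruby
# Everything sits inside the named group `tok`, so scan returns [tok, paren]
# pairs; `paren` calls itself via \g<paren> to match balanced brackets.
TOKEN_RE = /
  (?<tok>
    sqrt(?<paren>\((?:[^()]|\g<paren>)*\))   # sqrt( ... ) with balanced nesting
    | \d*\.?\d+\^?                           # numbers, optionally followed by ^
    | [-+\/*%()^]                            # operators and brackets
    | pi
  )
/x

def tokenize(str)
  str.scan(TOKEN_RE).map(&:first)
end

# Hypothetical: replaces each "sqrt(...)" token with the square root of its
# contents. Contents are re-tokenized and resolved first, so the innermost
# sqrt is computed before its parent.
def resolve_sqrt(tokens, evaluate)
  tokens.map do |tok|
    next tok unless tok.start_with?("sqrt(")
    inner = resolve_sqrt(tokenize(tok[5...-1]), evaluate) # strip "sqrt(" and ")"
    Math.sqrt(evaluate.call(inner)).to_s
  end
end

# Stand-in evaluator: strictly left to right, no precedence, no brackets.
left_to_right = lambda do |toks|
  toks.drop(1).each_slice(2).reduce(toks.first.to_f) do |acc, (op, num)|
    acc.public_send(op, num.to_f)
  end
end

p tokenize("sqrt(5+5*(6/2+sqrt(9)))")                           # => ["sqrt(5+5*(6/2+sqrt(9)))"]
p resolve_sqrt(tokenize("2 * sqrt(7+sqrt(4))"), left_to_right)  # => ["2", "*", "3.0"]
```

The regex only guarantees that each sqrt token is balanced; everything else (precedence, brackets inside the contents) is left to whatever evaluator is passed in.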

UPDATE: I think I have found a solution.

I did some more reading to learn how *, + and ? work, and I think (at least so far) this is working:

string.scan(/\d*\.?\d+\^?|[-+\/*%()^]|sqrt\(.+?\)+|pi/)

calc -o "sqrt(9)" #=> ["sqrt(9)"] 
calc -o "sqrt(3+6)" #=> ["sqrt(3+6)"]
calc -o "sqrt(9) + sqrt(9)" #=> ["sqrt(9)", "+", "sqrt(9)"]
calc -o "sqrt(9) + 2" #=> ["sqrt(9)", "+", "2"]
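One way to probe the lazy `sqrt\(.+?\)+` alternative is to feed it brackets that don't line up neatly with the sqrt. The lazy `.+?` stops at the first ")", and the greedy `\)+` then swallows every ")" in that run, so it only works when all the closing brackets of the sqrt come together at its end and nothing else closes right after it. The last two inputs below are test cases made up for illustration, not ones from the question.

```ruby
# The updated tokenizer regex from the question.
TOKEN_RE = /\d*\.?\d+\^?|[-+\/*%()^]|sqrt\(.+?\)+|pi/

p "sqrt(5+5*(6/2+sqrt(9)))".scan(TOKEN_RE) # => ["sqrt(5+5*(6/2+sqrt(9)))"]
# Bracket closes before the end of the sqrt: the token is cut short.
p "sqrt((1+2)*3)".scan(TOKEN_RE)           # => ["sqrt((1+2)", "*", "3", ")"]
# sqrt inside brackets: the outer ")" is swallowed into the sqrt token.
p "(1 + sqrt(9))".scan(TOKEN_RE)           # => ["(", "1", "+", "sqrt(9))"]
```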

Will update in a bit

Gibbo

1 Answer


There are a few issues getting in your way. First, classic regular expressions can't match recursively, so a pattern like yours won't be able to find matching (balanced) parentheses. If you want to accept parenthetical expressions inside sqrt(), you're going to need to attack it from a different angle (the answer there points to this algorithm).

If you only expect to match simple expressions inside sqrt(), the next problem is that your sqrt sub-expression only allows a single number: it optionally matches a literal period \.? between digits, but it doesn't allow any operators. You can fix this directly by adding a match for an operator and an optional second number to that sub-expression. In the following example, I wrapped the addition in a non-capturing group (?:_expression_) and used * to match it zero or more times.

sqrt\(\d*\.?\d+\) becomes sqrt\(\d*\.?\d+(?: *?[-+\/*%]? *?\d*\.?\d*)*\)
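Plugged into the question's full tokenizer regex, the suggested sqrt alternative can be checked like this (a sketch; the inputs are illustrative). It now accepts operators inside sqrt(), but any bracket inside sqrt() still makes the whole alternative fail, and scan falls back to individual tokens.

```ruby
# The question's tokenizer with the sqrt alternative replaced as suggested.
TOKEN_RE = /\d*\.?\d+\^?|[-+\/*%()]|sqrt\(\d*\.?\d+(?: *?[-+\/*%]? *?\d*\.?\d*)*\)/

p "sqrt(6+3)".scan(TOKEN_RE)       # => ["sqrt(6+3)"]
p "sqrt(2.5*4) + 1".scan(TOKEN_RE) # => ["sqrt(2.5*4)", "+", "1"]
# A bracket inside sqrt() still breaks the match:
p "sqrt(6+(3))".scan(TOKEN_RE)     # => ["(", "6", "+", "(", "3", ")", ")"]
```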

Last, you will most likely want to evaluate the contents of sqrt() before evaluating the sqrt() itself. To do this, you'll want to use capture groups. There are a few ways to approach this; one is to wrap the entire expression in unescaped parentheses (capture group 1), and also wrap the contents of sqrt() in unescaped parentheses (capture group 2).

/(\d*\.?\d+\^?|[-+\/*%()]|sqrt\((\d*\.?\d+(?: *?[-+\/*%]? *?\d*\.?\d*)*)\))/

The results from your scan will be an array of capture-group arrays. Running it against "sqrt(9) + sqrt(9)" returns [["sqrt(9)", "9"], ["+", nil], ["sqrt(9)", "9"]], so any time capture group 2 is not nil, it contains the contents of a sqrt().
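A short sketch of working with those pairs (the input string is illustrative): the first element of each pair is the token itself, and the non-nil second elements are the sqrt() contents to evaluate first.

```ruby
# The capture-group version of the regex from this answer.
TOKEN_RE = /(\d*\.?\d+\^?|[-+\/*%()]|sqrt\((\d*\.?\d+(?: *?[-+\/*%]? *?\d*\.?\d*)*)\))/

pairs = "sqrt(6+3) * 2".scan(TOKEN_RE)
p pairs                      # => [["sqrt(6+3)", "6+3"], ["*", nil], ["2", nil]]
p pairs.map(&:first)         # => ["sqrt(6+3)", "*", "2"]   the token list
p pairs.map(&:last).compact  # => ["6+3"]                   sqrt contents
```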

You can see this regex in action at Regexr

joequincy
  • Interesting, and absolutely confusing as hell. I understand that regex does not search recursively, but what if I don't want it to? I just need the entire string, regardless of its contents between the two brackets. So getting ["sqrt(9+sqrt(9)/2*(5+5))"] is totally fine; the program already knows how to deal with PEDMAS. In this situation it would simply test each array element against /sqrt/, and if it finds one, enter a recursion that keeps performing calculations until the array has a single element. – Gibbo Feb 14 '19 at 11:22
  • The problem is that there's no way for a standard regex to _balance_ parentheses. So anything that would match `sqrt(9+sqrt(9)/2*(5+5))` would also match `sqrt(9+sqrt(9)/2*(5+5)) + sqrt(9)`... unless you know in advance how deep the parentheses nesting will go. For a complex nesting scenario like that example, you'd need to handle parentheses algorithmically (see second link in answer), and then solve each tier and pass its solution up to its parent. Regex just isn't the right tool for that kind of problem. – joequincy Feb 14 '19 at 22:02
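Following the suggestion to handle parentheses algorithmically, here is one possible sketch of a tokenizer that does it without any regex recursion: it uses Ruby's standard `StringScanner` and, after each "sqrt(", counts bracket depth until the matching ")" brings it back to zero. The method name and error messages are illustrative, not from either post; unlike scan, it raises on characters it doesn't recognise instead of silently skipping them.

```ruby
require "strscan"

# Hypothetical tokenizer: keeps a whole sqrt(...) call as one token by
# counting bracket depth instead of relying on the regex to balance them.
def tokenize(input)
  ss = StringScanner.new(input)
  tokens = []
  until ss.eos?
    next if ss.skip(/\s+/)                      # ignore whitespace

    if ss.scan(/sqrt\(/)
      token = "sqrt(".dup
      depth = 1                                 # we are inside one "("
      until depth.zero?
        ch = ss.getch or raise ArgumentError, "unbalanced brackets in #{input.inspect}"
        depth += 1 if ch == "("
        depth -= 1 if ch == ")"
        token << ch
      end
      tokens << token
    elsif (tok = ss.scan(/\d*\.?\d+\^?|[-+\/*%()^]|pi/))
      tokens << tok                             # number, operator, bracket or pi
    else
      raise ArgumentError, "unexpected character #{ss.peek(1).inspect}"
    end
  end
  tokens
end

p tokenize("sqrt(9+sqrt(9)/2*(5+5)) + sqrt(9)")
# => ["sqrt(9+sqrt(9)/2*(5+5))", "+", "sqrt(9)"]
```

Each sqrt token can then be stripped of its "sqrt(" and ")" and passed back through the same tokenizer, which is the tier-by-tier approach described in the comment.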