5

I'm importing Excel files and extracting the formulas in these files to perform various tests and calculations with them that Excel can't do. However, to do that, I need to convert Excels functions to something that Mathematica can use. All of these formulas are very large (on average 200+ pages), so manually converting isn't an option. That brings me to my current problem. Some of the formulas I need to convert contain parentheses within the Excel functions, resulting in situations like this:

IF(IF((A2+B2)*(C1+2^D1)<>7,"A","B")="A",C8,D8)*2/IF(G17*G2=G1,IF(EXP(K9)=K10,K11,K12))*EXP(IF(H22+H23=213,A1,B1))

Here I would need to convert all the () from the IF and EXP functions to [] without changing parts like (A2+B2). The goal is to convert it to:

IF[IF[(A2+B2)*(C1+2^D1)<>7,"A","B"]="A",C8,D8]*2/IF[G17*G2=G1,IF[EXP[K9]=K10,K11,K12]]*EXP[IF[(H22+H23)=213,A1,B1]]

Is there a regex that can match these situations?

Soviras
  • 53
  • 5
  • *All of these formulas are very large (on average 200+ pages)* Seriously !! a formula more than 200 pages long. Even at a modest 1000 characters per page I'm amazed that Excel can even store, let alone use, such a formula. – High Performance Mark Sep 03 '15 at 12:11
  • 1
    One of Roman Mader's books, http://www.amazon.co.uk/Mathematica-Programmer-Stephen-Wolfram/dp/0124649904/ref=sr_1_5?s=books&ie=UTF8&qid=1441282579&sr=1-5&keywords=roman+maeder, explains how to interpret LISP expressions in Mathematica. And there are lots of examples on the 'net of implementing a DSL in Mathematica. It might be fun to write some Mathematica to interpret Excel expressions. Failing that, Mathematica's own string-handling facilities (regexes and more) are very good and surely useful in pursuit of your objective. What have you tried so far ? – High Performance Mark Sep 03 '15 at 12:19
  • 1
    relevant : http://stackoverflow.com/q/7898310/1004168 – agentp Sep 03 '15 at 18:50
  • The parenthesis are not balanced in your example ( 9 lefts and 8 rights ) – agentp Sep 03 '15 at 19:53
  • @High Performance Mark The reason excel can fit in such a large formula is because it's split over a large amount of cells. If anything changes and excel recalculates it, the program freezes for a long time, which is one of the reasons I want import it to Mathematica. I've tried to use the following regex, and variations of it: (.*?\\(.*?\\)(.?)*?)*? This resulted in some issues, where it didn't match some parts it should or matched things it shouldn't. – Soviras Sep 04 '15 at 07:31
  • @agentp: This looks useful, I'll be reading that now. Fixed the example. – Soviras Sep 04 '15 at 07:31
  • just for curiosity sake, individual excel formulas evidently are limited to about 8000 characters. – agentp Sep 04 '15 at 12:26

1 Answers1

5
string = "IF(IF((A2+B2)*(C1+2^D1)<>7,\"A\",\"B\")=\"A\",C8,D8)*2/IF(\
G17*G2=G1,IF(EXP(K9)=K10,K11,K12))*EXP(IF(H22+H23=213,A1,B1))"

What about this:

ClearAll@rectBrackets;

rectBrackets := StringReplace[ #, 
    (f_?UpperCaseQ ~~ fr : LetterCharacter ..) ~~ 
    "(" ~~ Shortest[content__] ~~ ")" /; (
        StringCount[content, ")"] == StringCount[content, "("]
    ) :> (
        f <> ToLowerCase[fr] <> "[" <> rectBrackets @ content <> "]"
    )
] &;

operators = StringReplace[#, {"=" -> "==", "<>" -> "!="}] &;

And now let's use them:

rectBrackets @ operators @ string
"If[If[(A2+B2)*(C1+2^D1)!=7,\"A\",\"B\"]==\"A\",C8,D8]*2/If[G17*\
G2==G1,If[Exp[K9]==K10,K11,K12]]*Exp[If[H22+H23==213,A1,B1]]"
 ToExpression @ %

enter image description here

Kuba
  • 761
  • 7
  • 14
  • @Soviras Great, but keep in mind that this is very simple example. It may fail when you have "(" or ")" in strings inside checks so that the balance check will fail. More bulletproof patter is needed in general. But this is nice enough to share :) – Kuba Sep 04 '15 at 08:25