3

I need to declare a string for use as a Regular Expression pattern.

The string is: (?<="[a-zA-Z0-9.-]*\d{8}.xml(?=")

Usually to declare a string in VBA for use in Reg Exp you enclose in double quotes so it would look like this: "(?<="[a-zA-Z0-9.-]*\d{8}.xml(?=")" but that results in a VBA Compile Error: Expected: end of statement with the [a-zA-Z0-9.-] highlighted.

This: "(?<="""[a-zA-Z0-9.-]*\d{8}.xml(?=""")" results in the same error.

This "(?<=""""[a-zA-Z0-9.-]*\d{8}.xml(?="""")"

works but when I use Msgbox to view the pattern it appears like this:

(?<=""[a-zA-Z0-9.-]*\d{8}.xml(?="")

and therefore won't work correctly in RegEx.

Arghhhh!

Here's the code I'm using for testing:

    Sub tester()
        Dim PATH_TO_FILINGS As String
        'PATH_TO_FILINGS = "www.sec.gov/Archives/edgar/data/1084869/000110465913082760"
        PATH_TO_FILINGS = "www.sec.gov/Archives/edgar/data/1446896/000144689612000023"
        MsgBox GetInstanceDocumentPath(PATH_TO_FILINGS)
    End Sub

    Function GetInstanceDocumentPath(PATH_TO_FILINGS As String)

        'this part launches IE and goes to the correct directory
        If IEbrowser Is Nothing Then
            Set IEbrowser = CreateObject("InternetExplorer.application")
            IEbrowser.Visible = False
        End If

        IEbrowser.Navigate URL:=PATH_TO_FILINGS

        While IEbrowser.Busy Or IEbrowser.readyState <> 4: DoEvents: Wend

       'this part starts the regular expression engine and searches for the reg exp pattern (i.e. the file name)
        Dim RE As Object
        Set RE = CreateObject("vbscript.regexp")

        RE.Pattern = "(?<="[a-zA-Z0-9.-]*\d{8}.xml(?=")"   '"\w+(?=-)(-)\d{8}(.xml)"
        MsgBox RE.Pattern
        RE.IgnoreCase = True

        Dim INSTANCEDOCUMENT As Object

        Set INSTANCEDOCUMENT = RE.Execute(IEbrowser.Document.body.innerhtml)

        If INSTANCEDOCUMENT.Count = 1 Then

            GetInstanceDocumentPath = PATH_TO_FILINGS & "/" & INSTANCEDOCUMENT.Item(0)

        End If

    End Function

Any thoughts on how to approach this are appreciated.

Community
  • 1
  • 1
mchac
  • 317
  • 5
  • 20

1 Answers1

4

Try doing it like this:

Sub Test()
RealQ = Chr(34)
Pattern = "(?<=" & RealQ & ")[a-zA-Z0-9.-]*\d{8}.xml(?=" & RealQ & ")"
MsgBox Pattern
End Sub

Result:

enter image description here

Also, VBA doesn't support lookbehind but it does support lookahead. A better reference can be found here.

Community
  • 1
  • 1
NullDev
  • 13,073
  • 4
  • 44
  • 50
  • Yup. That passes the correct pattern. Thanks very much. Note that I had a typo in the code I posted. I had deleted a ) in all the adding and deleting " trying to get this to work. Should have read: (?<=")[a-zA-Z0-9.-]*\d{8}.xml(?="). A secondary problem I have is this string works in Rubular (www.rubular.com) but not in VBA. I get run-time error 5017. However if I use my previous pattern (which missed identifying a few strings I wanted) which is \w+(?=-)(-)\d{8}(.xml). It works. Thoughts? Does Excel VBA exclude lookforward and lookback commands perhaps? – mchac Dec 17 '13 at 03:15
  • @mchac: Edited the above code to include the missing `)`. Also, be sure to check the link provided regarding lookaheads. `VBScript` does not support lookbehind, and this applies to `VBA` as well. – NullDev Dec 17 '13 at 03:33
  • Ah, sorry BK. I saw your screen shot and was so pumped up I didn't read on. Will do that now. – mchac Dec 17 '13 at 03:39
  • @mchac: Just added that last line as well so no worries. If you found this answer helpful, kindly mark it as an answer. Thanks! – NullDev Dec 17 '13 at 03:41
  • Just did mark as an answer. Thanks again. I have limited experience with RegEx and I'm reading through that reference you provided but if you can suggest implementation in my case i would appreciate it. Happy to post as a new question – mchac Dec 17 '13 at 03:57
  • It appears that I don't need the lookbehind though it did give me comfort that i'm looking in the right spot. Submatches is an interesting concept. Will read on it. Thanks for suggesting that post. – mchac Dec 17 '13 at 04:31
  • @mchac: `Regex` is not my strong point, even if I use it quite often (pretty basic ones, really). Thanks for accepting the answer and you're welcome as well. ;) – NullDev Dec 17 '13 at 05:10