2

I've written a script using vba in combination with IE to parse the contact information from a webpage applying regex on it. I searched a lot but could not find any example that can satiate my requirement. The pattern may not be ideal to find the phone number but the main concern here is how I can use the pattern within vba IE.

Once again: my intention here is to parse the phone number 661-421-5861 from that webpage applying regex within vba IE.

This is what I've tried so far:

Sub FetchItems()
    Const URL$ = "https://www.nafe.com/bakersfield-nafe-network"
    Dim IE As New InternetExplorer, HTML As HTMLDocument
    Dim rxp As New RegExp, email As Object, Row&

    With IE
        .Visible = True
        .navigate URL
        While .Busy = True Or .readyState < 4: DoEvents: Wend
        Set HTML = .document
    End With

    With rxp
        .Pattern = "(?<=Phone:)\s*?.*?([^\s]+)"
        Set email = .Execute(HTML.body.innerText) 'I'm getting here an error
        If email.Count > 0 Then
            Row = Row + 1: Cells(Row, 1) = email.Item(0)
        End If
    End With
    IE.Quit
End Sub

When I execute the above script I encounter an error method "Execute" of object "IRegExp2" failed when it hits the line containing Set email = .Execute(HTML.body.innerText). How can I make it a go successfully?

SIM
  • 20,216
  • 3
  • 27
  • 78
  • Later, you will also get an error related to your regex as VBA regex does not support lookbehinds. Also, that page does not seem to contain `Contact:` string. Maybe you should log in first. – Wiktor Stribiżew Jul 12 '18 at 10:22
  • It is nice to hear from you @Wiktor Stribiżew. This is also a new information to me. The pattern may not be accurate. However, all i wish to know is how to apply it within vba IE. – SIM Jul 12 '18 at 10:24
  • Okay, I've fixed that wrongly used `Contact:` within the pattern. – SIM Jul 12 '18 at 10:26

2 Answers2

3

Note that lookbehinds are not supported by VBA regex. Here, you probably want to capture any digit followed with any amount of digits and hyphens after Phone:.

You need to re-define the pattern as

rxp.Pattern = "Phone:\s*(\d[-\d]+)"

Then, you need to grab the first match and access its .SubMatches(0):

Set email = .Execute(HTML.body.innerText)
If email.Count > 0 Then
    Cells(Row+1, 1) = email.Item(0).SubMatches(0)
 End If

See the regex in action. The green-highlighted part of sting is what .SubMatches(0) holds.

Pattern details

  • Phone: - a literal substring
  • \s* - 0+ whitespaces
  • (\d[-\d]+) - Capturing group 1: a digit, followed with 1+ (due to +, you may replace with * to match zero or more) digits or/and hyphens.
Wiktor Stribiżew
  • 484,719
  • 26
  • 302
  • 397
  • 1
    You made my day @Wiktor Stribiżew. Thankssss a lot. – SIM Jul 12 '18 at 10:52
  • A slight little question on this `[-\d]+` portion @Wiktor Stribiżew. What confuses me is `followed with 1+ digits or/and hyphens`. As I'm a beginner, I knew so far `[-\d]+` means `followed with 1+ digits or hyphens`. Mind the `or` only. So why it would be `and` as well? A oneliner clarification in comment will suffice. Thanks in advance for taking a look into it. – SIM Jul 12 '18 at 13:30
  • 1
    @Topto Elements of a *positive* character class are ORed. [`[\d-]+` matches](https://regex101.com/r/FdDs8q/1) 1 or more chars that are either digits or hyphens, it matches `-------------`, `-2-3-5-6-87-9`, or `1223445678988`. Elements of the *negative* character class are ANDed, [`[^\d-]+`](https://regex101.com/r/FdDs8q/2) matches 1 or more chars that are not digits *AND* not hyphens. – Wiktor Stribiżew Jul 12 '18 at 13:34
  • 1
    Nothing clearer. It's really difficult to sort things out all alone. Much obliged. – SIM Jul 12 '18 at 13:52
2

Here is a quicker way using xmlhttp object

Sub FetchItems()
   Dim URL As String, strBody As String
   Dim intS As Long, intE As Long

    URL = "https://www.nafe.com/bakersfield-nafe-network"

    Dim xml As Object
    Set xml = CreateObject("MSXML2.XMLHTTP")
    xml.Open "GET", URL, False
    xml.send

    Dim html As Object
    Set html = CreateObject("htmlfile")
    html.body.innerHTML = xml.responseText

    strBody = html.body.innerHTML

    intS = InStr(1, strBody, "Phone:", vbTextCompare) + Len("Phone:")
    intE = InStr(intS, strBody, "<", vbTextCompare)

    MsgBox Mid(strBody, intS, intE - intS)
End Sub
Santosh
  • 11,722
  • 3
  • 36
  • 67
  • Thanks @Santosh, for your answer. This is faster but this is not what I sought solution for. Provided plus one. – SIM Jul 12 '18 at 10:54
  • @Topto I know but definitely would be useful for other readers :-) – Santosh Jul 12 '18 at 10:56