In my answer to this SO question I originally tried using regex to grab the JavaScript array assigned to the variable json_items. The match starts after json_items = and ends lazily at the first ";", so it captures everything in the array.
You can see the start of the match here:
The regex is saved here. I can match correctly with json_items = (.*?); in both Python and PCRE (PHP) if I set the re.DOTALL flag or the single-line flag, respectively.
For example, in Python the following works correctly:
import requests, re, json
data = {
    'inputType': '1',
    'stringInput': 'https://www.youtube.com/channel/UC43lrLHl4EhxaKQn2HrcJPQ',
    'limit': '100',
    'keyType': 'default'
}
r = requests.post('https://youtube-playlist-analyzer.appspot.com/submit', data=data)
p = re.compile(r'json_items = (.*?);', re.DOTALL)
results = json.loads(p.findall(r.text)[0])
The array is retrieved. However, I cannot find the right setting, or adjust the regex appropriately, to do the same in VBA. My best attempt is shown below; it simply fails to match. How do I grab the JavaScript array described above using regex in VBA?
Option Explicit
Public Sub RegexMatch()
    Dim s As String, ws As Worksheet, body As String, re As Object
    body = "inputType=1&stringInput=https://www.youtube.com/channel/UC43lrLHl4EhxaKQn2HrcJPQ&limit=100&keyType=default"
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set re = CreateObject("vbscript.regexp")
    With CreateObject("MSXML2.XMLHTTP")
        .Open "POST", "https://youtube-playlist-analyzer.appspot.com/submit", False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send body
        s = .responseText
    End With
    Dim matches As Object
    With re
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .pattern = "json_items = (.*?);"
        If .Test(s) Then
            Set matches = .Execute(s)
            Debug.Print matches(0).SubMatches(0)
        Else
            Debug.Print "Failed"
        End If
    End With
End Sub
To preserve two answers given in comments:
FlorentB:
var json_items\s*=\s*((?:"(?:\\"|[^"])+"|[^;"]+)+)
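As I read it, this pattern avoids the dot altogether: it consumes either whole double-quoted strings or runs of characters that are neither ";" nor a quote, so it crosses newlines and does not stop at a ";" inside a string. A rough check in Python, where the sample text is a made-up stand-in for the real response, might look like this:

```python
import json
import re

# Hypothetical stand-in for the page source; the real response is much longer.
sample_with_semicolon = 'var json_items = [\n  {"title": "a; b"}\n];\nvar other = 1;'

# FlorentB's pattern, copied as-is. No DOTALL flag is needed because the
# negated classes [^"] and [^;"] already match newlines.
quoted_aware_pattern = re.compile(
    r'var json_items\s*=\s*((?:"(?:\\"|[^"])+"|[^;"]+)+)'
)

quoted_match = quoted_aware_pattern.search(sample_with_semicolon)
quoted_items = json.loads(quoted_match.group(1))
print(quoted_items)
```

The lazy json_items = (.*?); form would cut the capture short at the ";" inside "a; b", which is the case this pattern appears designed to handle.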
Wiktor Stribiżew:
There is no option, use [^] or [\s\S] / [\d\D] / [\w\W] instead of a dot.
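To illustrate the second comment, here is a minimal Python sketch that leaves re.DOTALL off, so that the dot stops at newlines much as it does in VBScript.RegExp, and then swaps the dot for [\s\S]. The sample string is a hypothetical stand-in for the response text. Note that [^] is an ECMAScript-style construct and is not valid in Python's re, so only [\s\S] is shown.

```python
import json
import re

# Hypothetical stand-in for the response body: the array spans several lines.
sample = 'var json_items = [\n  {"title": "a"},\n  {"title": "b"}\n];\nvar other = 1;'

# Without re.DOTALL the dot cannot cross a newline, so this fails to match.
dot_pattern = re.compile(r'json_items = (.*?);')

# [\s\S] matches any character, newlines included, with no flag required.
any_char_pattern = re.compile(r'json_items = ([\s\S]*?);')

dot_match = dot_pattern.search(sample)
any_char_match = any_char_pattern.search(sample)

items = json.loads(any_char_match.group(1))
print(dot_match, items)
```

Following the same idea, the VBA attempt would presumably only need its pattern changed to "json_items = ([\s\S]*?);".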