I am trying to parse an HTML page that contains these links:
<a href="somesite.html?id=123">...</a>
<a href="somesite.html?id=456">...</a>
<a href="somesite.html?id=789">...</a>
<a href="anothersite.html">...</a>
How would I parse the HTML string to get back an array that contains only the somesite.html links?
["somesite.html?id=123", "somesite.html?id=456", "somesite.html?id=789"]
Edited
Using Zhiguo Wang's base answer, I can't seem to get only the somesite.html id values... The 3rd item in the array contains excess characters:
let htmlString = "<a href=\"somesite.html?id=123\">...</a>" +
"<a href=\"somesite.html?id=456\">...</a>" +
"<a href=\"somesite.html?id=789\">...</a>" +
"<a href=\"anothersite.html\">...</a>\""
let seperateComponent = "<a href=\"somesite.html?id="
let linkExp = "[\\w\\W]*\">"
Returns this value:
["123", "456", "789\">...</a><a href=\"anothersite.html"]
Expected Value: ["123", "456", "789"]
Hmm. Changing linkExp to the following resolves it. What does \W represent in a regex?
let linkExp = "[\\w]*\">"
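For anyone comparing the two patterns, here is a minimal sketch of what I think is going on. As far as I can tell, \w matches a word character (letter, digit, or underscore) and \W matches anything that is not one, so [\w\W] matches any character at all. Since * is greedy, the first pattern runs to the last `">` in the string. The helper `firstMatch(of:in:)` and the `tail` string are hypothetical names I made up for this test; they are not part of the answer's code.

```swift
import Foundation

// Hypothetical helper (not from the answer): returns the first match of `pattern` in `text`.
func firstMatch(of pattern: String, in text: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else { return nil }
    // NSRegularExpression works in UTF-16 offsets, so build the NSRange from the String.
    let range = NSRange(text.startIndex..<text.endIndex, in: text)
    guard let match = regex.firstMatch(in: text, options: [], range: range),
          let swiftRange = Range(match.range, in: text) else { return nil }
    return String(text[swiftRange])
}

// What is left after splitting on the "<a href=\"somesite.html?id=" prefix.
let tail = "789\">...</a><a href=\"anothersite.html\">...</a>"

// [\w\W] matches any character and * is greedy, so this extends to the LAST `">`.
let greedy = firstMatch(of: "[\\w\\W]*\">", in: tail)

// \w matches only letters, digits, and underscore, so this stops at the FIRST `">`.
let wordOnly = firstMatch(of: "\\w*\">", in: tail)

print(greedy ?? "nil")
print(wordOnly ?? "nil")
```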
The length was wrong. I cast the string to NSString to get the proper length.
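My guess at why the length differed, as a small sketch: a Swift String counts characters (grapheme clusters), while NSString and NSRange count UTF-16 code units, which is what NSRegularExpression expects. The sample string below is made up for illustration and is not from my page.

```swift
import Foundation

// Made-up sample text; the emoji takes two UTF-16 code units but is one Character.
let text = "id=123 👍"

let characterCount = text.count              // Swift Characters (grapheme clusters)
let utf16Count = text.utf16.count            // UTF-16 code units
let nsLength = (text as NSString).length     // NSString also counts UTF-16 code units

// Building the NSRange from String indices avoids computing the length by hand.
let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)

print(characterCount, utf16Count, nsLength, fullRange.length)
```

For plain ASCII HTML the two counts agree, so this only matters once the page contains characters outside the Basic Multilingual Plane or combined characters.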
Edited 2
It looks like if the following tag appears before the somesite links, then "origin" is included in the array:
<meta name=\"referrer\" content=\"origin\">
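I assume this happens because the pattern \w*"> also matches `origin">` inside the meta tag. One possible way around it, sketched below, is to put the literal link prefix into the pattern itself and pull the id out with a capture group, so nothing outside a somesite.html link can match. The function name `someSiteIDs(in:)` is my own and hypothetical; this is a sketch under the assumption that ids are made of word characters only, not a general HTML parser.

```swift
import Foundation

// Hypothetical helper: collects the id values of links that point at somesite.html.
func someSiteIDs(in html: String) -> [String] {
    // The literal prefix anchors the match to the href, so `origin">` no longer qualifies.
    // Capture group 1 holds the id (assumed to be word characters only).
    let pattern = "<a href=\"somesite\\.html\\?id=(\\w+)\""
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else { return [] }
    let range = NSRange(html.startIndex..<html.endIndex, in: html)
    return regex.matches(in: html, options: [], range: range).compactMap { match in
        // Convert the captured NSRange back to a String range before slicing.
        Range(match.range(at: 1), in: html).map { String(html[$0]) }
    }
}

let html = "<meta name=\"referrer\" content=\"origin\">" +
    "<a href=\"somesite.html?id=123\">...</a>" +
    "<a href=\"somesite.html?id=456\">...</a>" +
    "<a href=\"somesite.html?id=789\">...</a>" +
    "<a href=\"anothersite.html\">...</a>"

let ids = someSiteIDs(in: html)
print(ids)
```

To get the full link instead of just the id, the capture group could be widened to cover `somesite\.html\?id=\w+`.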