This is my input string:
<div>http://google.com</div><span data-user-info="{\"name\":\"subash\", \"url\" : \"http://userinfo.com?userid=33\"}"></span><a href="https://contact.me"></a>http://byebye.com is a dummy website.
In this case I need to match only the first and last occurrences of http, because from an HTML point of view those are inner text. Any http inside an attribute value should be ignored. I built the following regex:
(?<!href=\"|src=\"|value=\"|href=\'|src=\'|value=\'|=)(http://|https://|ftp://|sftp://)
It works for the first and last occurrences, but it also matches the second occurrence of http (the URL inside the data-user-info attribute). Links inside attribute values should not be matched.
FYI: I also tried adding a negative lookahead, but that does not seem to help. This is the version with the negative lookahead:
(?<!href=\"|src=\"|value=\"|href=\'|src=\'|value=\'|=)(http://|https://|ftp://|sftp://).*?(?!>)
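For reference, here is a minimal reproduction of both patterns. It is only a sketch and assumes Python's re module, which may not be the engine the regexes were written for. Python accepts only fixed-width lookbehind, so the single alternation is split into one negative lookbehind per prefix, which should be equivalent. The helper name matched_urls is hypothetical, and it extends each match to the end of the URL purely to make the output readable. The backslashes in the input are kept as literal characters.

```python
import re

# Input string from the question, kept verbatim (raw string, so the
# backslashes before the quotes stay literal characters).
html = r'''<div>http://google.com</div><span data-user-info="{\"name\":\"subash\", \"url\" : \"http://userinfo.com?userid=33\"}"></span><a href="https://contact.me"></a>http://byebye.com is a dummy website.'''

# Assumption: the variable-width lookbehind is rewritten as a chain of
# fixed-width negative lookbehinds because Python's re requires that.
LOOKBEHIND = r'''(?<!href=")(?<!src=")(?<!value=")(?<!href=')(?<!src=')(?<!value=')(?<!=)'''
SCHEME = r'(http://|https://|ftp://|sftp://)'

first_pattern = re.compile(LOOKBEHIND + SCHEME)
# Same pattern with the lazy wildcard and negative lookahead appended.
second_pattern = re.compile(LOOKBEHIND + SCHEME + r'.*?(?!>)')


def matched_urls(pattern, text):
    """List every match, extended to the end of its URL for display only."""
    urls = []
    for m in pattern.finditer(text):
        # Read on from the end of the scheme until whitespace, '<', '"' or '\'.
        tail = re.match(r'[^\s<"\\]*', text[m.end(1):]).group(0)
        urls.append(m.group(1) + tail)
    return urls


print(matched_urls(first_pattern, html))
print(matched_urls(second_pattern, html))
# Both calls print:
# ['http://google.com', 'http://userinfo.com?userid=33', 'http://byebye.com']
```

Both patterns skip the href value but still report the URL inside data-user-info, which is the unwanted match described above.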