I am testing some code for web crawling.

import re

def getExternalLinks(bs, excludeUrl):
   externalLinks = []
   # Find all links that start with "http" or "www" and do
   # not contain excludeUrl (the current site's URL)
   for link in bs.find_all('a',
      href=re.compile('^(http|www)((?!'+excludeUrl+').)*$')):
      if link.attrs['href'] is not None:
         # Skip links that were already collected
         if link.attrs['href'] not in externalLinks:
            externalLinks.append(link.attrs['href'])
   return externalLinks

I cannot understand the part ((?!'+excludeUrl+').) of the regular expression in re.compile('^(http|www)((?!'+excludeUrl+').)*$').

1 Answer

Check the docs:

(?!...)
Matches if ... doesn’t match next. This is a negative lookahead assertion. For example, Isaac (?!Asimov) will match 'Isaac ' only if it’s not followed by 'Asimov'.
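
To make the quoted rule concrete, here is a minimal sketch. The first pattern is the example from the docs; the second shows the `((?!X).)*` idiom used in the question, with 'foo' as an arbitrary stand-in for the excluded text (an assumption for illustration only).

```python
import re

# Example from the docs: 'Isaac ' matches only when 'Asimov' does not follow.
isaac = re.compile(r'Isaac (?!Asimov)')
print(bool(isaac.search('Isaac Newton')))   # True
print(bool(isaac.search('Isaac Asimov')))   # False

# ((?!foo).)* consumes one character at a time, and before each
# character it checks that 'foo' does not start at that position.
# Anchored with ^ and $, it only matches strings that never contain 'foo'.
no_foo = re.compile(r'^((?!foo).)*$')
print(bool(no_foo.match('bar baz')))        # True
print(bool(no_foo.match('bar foo baz')))    # False
```

So in the question's pattern, the lookahead is what keeps any URL containing excludeUrl from matching.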

Adam Smith
  • In the regular expression ((?!'+excludeUrl+').), excludeUrl is a variable that comes from the function's argument. I think the expression is meant to exclude that URL. Is the '+variable+' form valid? – CheolYoung Sep 27 '20 at 01:24
  • @CheolYoung yes it is. That's normal string concatenation. `'a string here ' + 'another string' == 'a string here another string'` – Adam Smith Sep 27 '20 at 01:27
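
As a sketch of what that concatenation produces, the snippet below builds the same pattern as in the question and tries it on a few URLs. The helper name `build_external_pattern`, the value 'example.com', and the sample URLs are all assumptions made up for illustration.

```python
import re

def build_external_pattern(exclude_url):
    # Hypothetical helper: the same string concatenation as in the question.
    return re.compile('^(http|www)((?!' + exclude_url + ').)*$')

exclude_url = 'example.com'  # assumed value, for illustration only
pattern = build_external_pattern(exclude_url)

# The variable's text is simply spliced into the pattern string.
print(pattern.pattern)  # ^(http|www)((?!example.com).)*$

print(bool(pattern.match('http://other.org/page')))     # True: external link
print(bool(pattern.match('http://example.com/about')))  # False: contains exclude_url
print(bool(pattern.match('/relative/link')))            # False: no http/www prefix
```

One caveat: because the text is inserted as-is, characters such as `.` in exclude_url keep their regex meaning (any character). If that matters, wrapping the value in `re.escape(exclude_url)` before concatenating would make it match literally.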