EDIT: PLEASE DO NOT DOWNVOTE WITHOUT COMMENTING ON WHY YOU ARE DOWNVOTING. I AM TRYING MY BEST TO WRITE THIS PROPERLY!
I am trying to print the URLs of all the watches on a website. All of them print correctly except one, even though that one should satisfy the same regex as the others. Can someone explain why it isn't printing? Have I made a syntax mistake somewhere? The following code can be pasted into a Python 2 editor (e.g. IDLE) and run.
## Import required modules
from urllib import urlopen
from re import findall
## Provide URL
dennisov_url = 'https://denissov.ru/en/'
## Open and read URL as string named 'dennisov_html'
dennisov_html = urlopen(dennisov_url).read()
## Find the link opened when each watch is clicked: the text 'window.open',
## then any character zero or more times, then the text '/en/'. The pattern is
## meant to exclude matches containing the word "history" and to stop at any
## " symbol in the URL.
watch_link_urls = findall('window.open.*(/en/[^history][^"]*/)', dennisov_html)
## For every URL, prepend the domain to form the full address
for link in watch_link_urls:
    link = 'https://denissov.ru' + link
    ## Print out the full URL on its own line
    print link
## This code should show the link https://denissov.ru/en/speedster/ yet
## it isn't showing. It has exactly the same preceding text as the other links
## that are printing and is in the same div container. If you inspect the
## website and search for 'en/barracuda_mechanical/' and then 'en/speedster/',
## you will see that the speedster link is only a few lines below the barracuda
## mechanical link and that the preceding text of the two is no different,
## so speedster should be printing.
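
The behaviour can be reproduced offline with a minimal sketch. The three sample lines are hypothetical stand-ins for the page's onclick markup (assumed, not copied from the site), and `lookahead_pattern` is only one possible alternative, not necessarily the intended fix. The sketch points at a likely cause: `[^history]` is a character class that matches a single character other than h, i, s, t, o, r or y, so a path segment starting with one of those letters, such as `speedster`, is rejected, whereas a negative lookahead `(?!history)` rejects only the whole word.

```python
import re

# Hypothetical sample markup standing in for the real page (an assumption).
sample_html = '\n'.join([
    '''<div onclick="window.open('/en/barracuda_mechanical/')">''',
    '''<div onclick="window.open('/en/speedster/')">''',
    '''<div onclick="window.open('/en/history/')">''',
])

# Pattern from the question: [^history] excludes ONE character from the
# set {h, i, s, t, o, r, y}, not the word "history".
original_pattern = r'window.open.*(/en/[^history][^"]*/)'

# One possible alternative (an assumption): a negative lookahead that
# rejects only paths beginning with the literal word "history".
lookahead_pattern = r'window\.open.*?(/en/(?!history)[^"]*/)'

original_matches = re.findall(original_pattern, sample_html)
lookahead_matches = re.findall(lookahead_pattern, sample_html)

print(original_matches)
print(lookahead_matches)
```

With these sample lines, the original pattern skips the speedster link because `s` is one of the excluded characters, while the lookahead version keeps it and still drops the history link.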