Code:
    from bs4 import BeautifulSoup
    import urllib.request
    import sys
    import time
    import re

    for num in range(680):
        address = ('http://www.diabetes.org/mfa-recipes/recipes/recipes-archive.html?page=' + str(num))
        html = urllib.request.urlopen(address).read()
        soup = BeautifulSoup(html, "html.parser")
        for link in soup.findAll('a', attrs={'href': re.compile("/recipes/20")}):
            find = re.compile('/recipes/20(.*?)"')
            searchRecipe = re.search(find, str(link))
            recipe = searchRecipe.group(1)
            urllinks = ('http://www.diabetes.org/mfa-recipes/recipes/20' + str(recipe))
            urllinks = urllinks.replace(" ", "")
            outfile = open('C:/recipes/recipe.txt', 'a')
            outfile.write(str(urllinks) + '\n')

    f = open('recipe.txt', 'r')
    for line in f.readlines():
        id = line.strip('\n')
        url = "urllinks".format(id)
        html_two = urllib.request.urlopen(url).read()
        soup_two = BeautifulSoup(html_two, "html.parser")
        for div in soup.find_all('div', class_='ingredients'):
            print(div.text)
        for div in soup.find_all('div', class_='nutritional_info'):
            print(div.text)
        for div in soup.find_all('div', class_='instructions'):
            print(div.text)
The first section (which ends with the outfile write) definitely works: when I run the program it stores all the links, but nothing happens after that. In the second part I'm trying to open the file "recipe.txt", visit each link in it, and scrape certain data (the ingredients, nutritional_info, and instructions divs).
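For what it's worth, two bugs would explain the silence: `url = "urllinks".format(id)` sets `url` to the literal string "urllinks" (there is no `{}` placeholder for `format` to fill), so `urlopen` fails, and the three inner loops search `soup` (the last archive page) rather than `soup_two` (the recipe page just fetched). A minimal sketch of a corrected second part follows; the helper name `extract_sections` is mine for illustration, and it assumes each line of recipe.txt already holds a complete URL:

```python
from bs4 import BeautifulSoup
import urllib.request

def extract_sections(html, classes=('ingredients', 'nutritional_info', 'instructions')):
    """Parse one recipe page and return the text of each requested div class."""
    soup = BeautifulSoup(html, "html.parser")
    return {name: [div.get_text(strip=True) for div in soup.find_all('div', class_=name)]
            for name in classes}

def scrape_all(path='recipe.txt'):
    with open(path) as f:
        for line in f:
            url = line.strip()  # the line IS the URL; no .format() needed
            html_two = urllib.request.urlopen(url).read()
            # parse the page we just fetched, not the leftover archive `soup`
            for name, texts in extract_sections(html_two).items():
                for text in texts:
                    print(name, text)
```

Opening the output file once with a `with` block (instead of re-opening it inside the scraping loop) would also ensure everything is flushed to disk before the second part reads it back.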