I'm experimenting with web scraping using requests and BeautifulSoup, and I'm getting odd results when I loop through multiple pages of message board data by incrementing the page number by 1 on each iteration.
The code below is an example where I loop through page 1 of the message board and then page 2. As a sanity check, I print the URL I'm requesting and then the first post found on that page. The URLs look correct, but the first post is the same for both. Yet if I copy and paste those two URLs into a browser, I definitely see different content on each page.
Can anyone tell me whether this is a problem with my code, or whether it has something to do with how that forum serves its pages? Thanks in advance!
from bs4 import BeautifulSoup
import requests

n_pages = 2
base_link = 'http://tigerboard.com/boards/list.php?board=4&page='

for i in range(1, n_pages + 1):
    # Build the URL for page i and fetch it
    link = base_link + str(i)
    html_doc = requests.get(link)
    soup = BeautifulSoup(html_doc.text, "lxml")

    # Collect the text of every post on the page
    bs_tags = soup.find_all("div", {"class": "msgline"})
    posts = []
    for post in bs_tags:
        posts.append(post.text)

    # Print the URL that was requested and the first post found
    print(link)
    print(posts[0])
> http://tigerboard.com/boards/list.php?board=4&page=1
> 52% of all websites are in English, but - catbirdseat MU - 3/23/17 14:41:06
> http://tigerboard.com/boards/list.php?board=4&page=2
> 52% of all websites are in English, but - catbirdseat MU - 3/23/17 14:41:06
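
One way to narrow this down, as a rough sketch rather than a fix: compare the raw response bodies for each page before any parsing, and look at the status code and final URL that requests reports. The helper names below (`fingerprint`, `group_identical_pages`, `fetch_bodies`) are hypothetical and not part of the script above, and the sketch assumes Python 3.

```python
import hashlib
from collections import defaultdict


def fingerprint(body):
    """Return a short SHA-256 hex digest of a response body (bytes)."""
    return hashlib.sha256(body).hexdigest()[:12]


def group_identical_pages(bodies):
    """Group page numbers whose bodies are byte-for-byte identical.

    `bodies` maps page number -> raw response bytes.
    Returns a list of page-number lists, one list per distinct body.
    """
    groups = defaultdict(list)
    for page, body in sorted(bodies.items()):
        groups[fingerprint(body)].append(page)
    return sorted(groups.values())


def fetch_bodies(base_link, n_pages):
    """Fetch pages 1..n_pages and return {page: bytes}. Needs network access."""
    import requests  # imported here so the helpers above work without it

    bodies = {}
    for page in range(1, n_pages + 1):
        response = requests.get(base_link + str(page))
        # response.url is the final URL after any redirects
        print(page, response.status_code, response.url, len(response.content))
        bodies[page] = response.content
    return bodies
```

Calling `group_identical_pages(fetch_bodies(base_link, n_pages))` with the values from the script would show which pages came back with identical bytes. If pages 1 and 2 land in the same group, the server is probably returning the same document to the script regardless of the `page` parameter. If they land in different groups, that alone does not prove the posts differ, since timestamps or other dynamic markup can change between requests, but it would suggest looking at the parsing step next.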