BeautifulSoup - Can't get the content of the page

Question

I've been using BeautifulSoup for a while and haven't had many problems. But now I'm trying to scrape a site that is giving me trouble. My code is this:

    import requests
    from bs4 import BeautifulSoup

    preSoup = requests.get('https://www.betbrain.com/football/world/')
    print(preSoup.url)  # the URL that was actually fetched
    soup = BeautifulSoup(preSoup.content, "lxml")
    print(soup)

The content I get seems to be some sort of script and/or API they're connected to, not the real content of the webpage I see in the browser. I can't reach the games, for example. Does anyone know a way around it? Thank you.

Possible duplicate of [scrape html generated by javascript with python](https://stackoverflow.com/questions/2148493/scrape-html-generated-by-javascript-with-python) — bobrobbob, Jun 27 '18 at 11:06

ThunderHorn · Accepted Answer · 2018-06-27T11:49:28.890

1

requests only fetches the HTML and doesn't run the JavaScript, so you have to use a webdriver for that. You can use Chrome, Firefox, etc. I use PhantomJS because it is a "headless" browser that runs in the background. Below is some example code that will help you understand how to use it:

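    import time

    from bs4 import BeautifulSoup
    from selenium import webdriver

    # PhantomJS needs Selenium 3 or earlier; Selenium 4 removed support for it.
    driver = webdriver.PhantomJS()
    driver.get("https://www.betbrain.com/football/world/")
    time.sleep(5)  # give the page some time to run its JavaScript
    html = driver.page_source
    driver.quit()  # shut down the background browser process
    soup = BeautifulSoup(html, "lxml")
    # Print the text of every <span class="Participant1"> element.
    for i in soup.find_all("span", {"class": "Participant1"}):
        print(i.text)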
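
Before reaching for a browser driver, it can help to confirm that the data really is missing from the raw response. The following is a minimal sketch using only the standard library; the `RawHtmlProbe` class, the `probe_raw_html` helper, and the two sample HTML strings are hypothetical names and data made up for illustration, not part of the site or of any library.

```python
from html.parser import HTMLParser


class RawHtmlProbe(HTMLParser):
    """Counts <script> tags and elements carrying a given CSS class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.script_tags = 0
        self.target_elements = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        # The class attribute may hold several space-separated names,
        # and its value is None when the attribute is written without one.
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.target_elements += 1


def probe_raw_html(html, target_class):
    """Return (script_tags, target_elements) for an HTML string."""
    probe = RawHtmlProbe(target_class)
    probe.feed(html)
    probe.close()
    return probe.script_tags, probe.target_elements


# Hypothetical stand-ins for a server response and a browser-rendered page.
raw_shell = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
rendered = '<html><body><span class="Participant1 bold">Team A</span></body></html>'

print(probe_raw_html(raw_shell, "Participant1"))  # (1, 0)
print(probe_raw_html(rendered, "Participant1"))   # (0, 1)
```

To apply this to the question's code, the text of the `requests` response could be passed in place of the sample strings. If the second number is 0 while the browser clearly shows those elements, the page is most likely built by JavaScript after loading, which is the case a webdriver handles.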

edited Jun 27 '18 at 11:49

answered Jun 27 '18 at 11:05

ThunderHorn


That's awesome. I had a feeling it was something to do with the webdriver, but I couldn't figure it out. Why do you print html1==html? I guess it's to check whether the page loaded? – Amit Nelinger Jun 27 '18 at 11:35
And thanks a lot! – Amit Nelinger Jun 27 '18 at 11:35
Yeah, sorry, it was a debug :) I will remove it :) – ThunderHorn Jun 27 '18 at 11:49
