I am working on a web scraping project for a site that requires a login. I managed to log in successfully. The site contains a dynamic table. When I run my code, it scrapes the page but not the dynamic content. I tried to use Selenium, but it always asks me to log in again in Chrome instead of taking me to the page.
The following is my code for logging in to the page:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
import time
server = requests.Session()
login_page_url = 'https://connect.data.com/login'
loginProcess_url = 'https://connect.data.com/loginProcess'
# Fetch the login page and read the CSRF token from the form
html = server.get(login_page_url).content
soup = BeautifulSoup(html, 'html.parser')
csrf = soup.find(id="CSRF_TOKEN")['value']
login_detail = {
    'j_username': '******',
    'j_password': '******',
    'CSRF_TOKEN': csrf,
}
server.post(loginProcess_url, data=login_detail)
r = server.get('https://connect.data.com/search#p=searchresult;;t=companies;;ss=advancedsearch;;q=H4sIAAAAAAAAAE2PzQ6CQAyE36VnDgsKGq48gMar4UCWqptAa_YHYwjv7nYJ6mUyO52v2c5wM4NH66CeQXMgbw3GxxWOSiloMzDUB_dNc1UoGWSQF7vNqWpzufpOk5UFOD4HfuPKlzLci-xECpGjyEGkSoDFCSms_V8rQeV_NZGxzy-KBzzMc_2hANAuGXTaGyZ3ooaHMFI6UbIJGyYfXUocWw819Og0LJHSwVokf-7uCHVeZuDZd8MFNds-7lrzUi0flYbWRDoBAAA')
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.find('table', {'class': 'result'}))
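A note on why the table may be missing from this response: everything after the `#` in the URL is a fragment, which HTTP clients such as `requests` do not send to the server. The short sketch below illustrates this with the standard library; `search_url` is a shortened stand-in for the real URL, and the idea that the page's JavaScript reads the fragment to build the table is an assumption, not something confirmed for this site.

```python
from urllib.parse import urlsplit

# Shortened stand-in for the search URL above (the real q= value is much longer).
search_url = 'https://connect.data.com/search#p=searchresult;;t=companies;;ss=advancedsearch'

parts = urlsplit(search_url)
requested_path = parts.path   # the only part of this URL's tail the server is asked for
client_only = parts.fragment  # never sent over HTTP; only code running in a browser can see it

print(requested_path)
print(client_only)
```

If that assumption holds, the HTML returned to `requests` would be the bare search page, and the table would only appear after a browser runs the page's scripts.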
The following is the code I added to scrape the dynamic content:
path_to_driver = '/Users/Moment/Desktop/phantomjs'
url = 'https://connect.data.com/search#p=searchresult;;t=companies;;ss=advancedsearch;;q=H4sIAAAAAAAAAE2PzQ6CQAyE36VnDgsKGq48gMar4UCWqptAa_YHYwjv7nYJ6mUyO52v2c5wM4NH66CeQXMgbw3GxxWOSiloMzDUB_dNc1UoGWSQF7vNqWpzufpOk5UFOD4HfuPKlzLci-xECpGjyEGkSoDFCSms_V8rQeV_NZGxzy-KBzzMc_2hANAuGXTaGyZ3ooaHMFI6UbIJGyYfXUocWw819Og0LJHSwVokf-7uCHVeZuDZd8MFNds-7lrzUi0flYbWRDoBAAA'
browser = webdriver.PhantomJS(executable_path=path_to_driver)
browser.get(url)
html = browser.page_source
soup = BeautifulSoup(html, "lxml")
print(soup.prettify())
The first section of code logs me in, but whenever I add the second section I am no longer logged in and get the login page instead.
I have tried both ChromeDriver and PhantomJS.
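One possible explanation is that the Selenium browser starts with an empty cookie store, so the session established through `requests` is not visible to it. Below is a minimal sketch of copying the cookies across. `cookies_for_selenium` is a hypothetical helper, not part of either library, and it assumes the site tracks the login purely through cookies.

```python
from http.cookiejar import CookieJar


def cookies_for_selenium(jar: CookieJar) -> list:
    """Convert cookies from a CookieJar (requests' session.cookies is one)
    into the dicts accepted by Selenium's driver.add_cookie()."""
    converted = []
    for c in jar:
        cookie = {'name': c.name, 'value': c.value, 'path': c.path or '/'}
        if c.domain:
            cookie['domain'] = c.domain
        if c.secure:
            cookie['secure'] = True
        if c.expires is not None:
            cookie['expiry'] = int(c.expires)
        converted.append(cookie)
    return converted


# Hypothetical usage with `server`, `browser`, and `url` from the code above:
#   browser.get('https://connect.data.com/login')  # open the domain before adding cookies
#   for cookie in cookies_for_selenium(server.cookies):
#       browser.add_cookie(cookie)
#   browser.get(url)
```

Selenium only accepts cookies for the domain of the page that is currently loaded, which is why the sketch opens the login page first. Some drivers are stricter about the `domain` value than others, so that key may need to be adjusted or dropped.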