
I am working on a web scraping project for a site that requires a login. I managed to log in successfully with requests. The site contains a dynamic table: when I run my code, it scrapes the page but not the dynamic content. I then tried to use Selenium, but it always asks me to log in again in Chrome instead of taking me to the page.

The following is my code for logging in to the page:

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
import time

# A requests session keeps the login cookies across calls
server = requests.Session()

login_page_url = 'https://connect.data.com/login'
loginProcess_url = 'https://connect.data.com/loginProcess'

# Fetch the login page and pull the CSRF token out of the form
html = server.get(login_page_url).content
soup = BeautifulSoup(html, 'html.parser')
csrf = soup.find(id="CSRF_TOKEN")['value']

login_detail = {
    'j_username':'******',
    'j_password':'******',
    'CSRF_TOKEN': csrf,
}

# Post the credentials together with the CSRF token
server.post(loginProcess_url, data=login_detail)

# Request the search results page within the same (logged-in) session
r = server.get('https://connect.data.com/search#p=searchresult;;t=companies;;ss=advancedsearch;;q=H4sIAAAAAAAAAE2PzQ6CQAyE36VnDgsKGq48gMar4UCWqptAa_YHYwjv7nYJ6mUyO52v2c5wM4NH66CeQXMgbw3GxxWOSiloMzDUB_dNc1UoGWSQF7vNqWpzufpOk5UFOD4HfuPKlzLci-xECpGjyEGkSoDFCSms_V8rQeV_NZGxzy-KBzzMc_2hANAuGXTaGyZ3ooaHMFI6UbIJGyYfXUocWw819Og0LJHSwVokf-7uCHVeZuDZd8MFNds-7lrzUi0flYbWRDoBAAA')
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.find('table', {"class": "result"}))

The following is the code I added to scrape the dynamic content:

path_to_driver = '/Users/Moment/Desktop/phantomjs'

url = 'https://connect.data.com/search#p=searchresult;;t=companies;;ss=advancedsearch;;q=H4sIAAAAAAAAAE2PzQ6CQAyE36VnDgsKGq48gMar4UCWqptAa_YHYwjv7nYJ6mUyO52v2c5wM4NH66CeQXMgbw3GxxWOSiloMzDUB_dNc1UoGWSQF7vNqWpzufpOk5UFOD4HfuPKlzLci-xECpGjyEGkSoDFCSms_V8rQeV_NZGxzy-KBzzMc_2hANAuGXTaGyZ3ooaHMFI6UbIJGyYfXUocWw819Og0LJHSwVokf-7uCHVeZuDZd8MFNds-7lrzUi0flYbWRDoBAAA'


# Start PhantomJS (a fresh browser session) and load the search URL
browser = webdriver.PhantomJS(executable_path=path_to_driver)
browser.get(url)

# Grab whatever PhantomJS rendered and parse it
html = browser.page_source
soup = BeautifulSoup(html, "lxml")
print(soup.prettify())

The first section of code logs me in, but whenever I add the second section I am no longer logged in; instead I get the login page.

I have tried both ChromeDriver and PhantomJS.

  • On the cookie setting, what will be the name, value, path and expiry? ('name', 'value', 'path', 'expiry') – Segun Oyebode Jul 26 '17 at 18:01
  • You need to "import" the cookie from requests to selenium. This post shows how to import cookies from selenium to requests and the other way as well: https://stackoverflow.com/a/42114843/8240959 – jlaur Jul 26 '17 at 22:47
  • Don't you think the session object (retrieved by calling requests.Session()) keeps track of session information such as cookies and headers? – Segun Oyebode Jul 26 '17 at 23:32
  • Sure. Both selenium and requests handle cookies - but separately. They are two different packages and do not share cookies unless you tell them to. That causes the problem you have: you log in with requests, but when you try to continue with selenium you suddenly aren't logged in. You actually are, but only with requests, not selenium. Either import the cookie into selenium or use selenium all the way (do the login with selenium too). – jlaur Jul 27 '17 at 07:07
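Based on jlaur's comments, a minimal sketch of copying the cookies from the requests session into Selenium might look like the code below. This is only a sketch: it assumes server (the logged-in requests.Session) and browser (the webdriver instance) are the objects defined in the code above, and cookie fields such as domain or expiry may need adjusting for the specific site.

# Sketch: copy the login cookies from the requests session into the Selenium browser.
# Assumes `server` is the logged-in requests.Session and `browser` is the webdriver
# created above. Selenium only accepts cookies for the domain it is currently on,
# so load a page on connect.data.com first.
browser.get('https://connect.data.com/login')

for c in server.cookies:
    cookie = {'name': c.name, 'value': c.value, 'path': c.path}
    if c.expires:
        cookie['expiry'] = c.expires   # these keys answer the name/value/path/expiry question above
    browser.add_cookie(cookie)

# Now the search URL should load inside the logged-in session
browser.get(url)
html = browser.page_source

Alternatively, the login itself can be done in Selenium (fill in the username, password and CSRF fields and submit the form), so that only one cookie jar is involved.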
