I am trying to scrape currency rates for a personal project. I used a CSS selector to get the elements where the values are. JavaScript on the website provides those values, and I am not too conversant with the developer console: I checked it and could not see anything running in real time in the Network tab. This is the code I have written so far; it prints a long list of dashes. Surprisingly, the dashes match the page source for the parts where the rates are supposed to show.

from bs4 import BeautifulSoup
import requests
r = requests.get("https://www.ig.com/en/forex/markets-forex")
soup = BeautifulSoup(r.content, "html.parser")
results = soup.findAll("span",attrs={"data-field": "CPT"})
for span in results:
    print(span.text)
blockhead

1 Answer

The span elements are filled in by JavaScript, so their values are dynamic. When the page first loads, each span contains only '-'. You need a browser driver that runs the JavaScript, waits for the elements to be filled, and then reads the values from the spans.
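To illustrate why requests only ever sees dashes, here is a minimal sketch that parses a static HTML snippet with the standard library. The snippet is a hypothetical stand-in for what the server returns before any JavaScript runs, not the real markup of the page, and `FieldCollector` is a helper name made up for this example.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the HTML the server sends before JavaScript runs
STATIC_HTML = """
<table>
  <tr><td>EUR/USD</td><td><span data-field="CPT">-</span></td></tr>
  <tr><td>GBP/USD</td><td><span data-field="CPT">-</span></td></tr>
</table>
"""


class FieldCollector(HTMLParser):
    """Collect the text of every <span data-field="CPT">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        # Start a new value when a matching span opens
        if tag == "span" and dict(attrs).get("data-field") == "CPT":
            self._inside = True
            self.values.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self._inside = False

    def handle_data(self, data):
        # Accumulate text only while inside a matching span
        if self._inside:
            self.values[-1] += data


collector = FieldCollector()
collector.feed(STATIC_HTML)
placeholders = collector.values
print(placeholders)  # ['-', '-']
```

No HTML parser can recover the rates from this markup, because the numbers are not in it yet; something has to execute the page's JavaScript first.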

With selenium:

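from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('./chromedriver'))
driver.get('https://www.ig.com/en/forex/markets-forex')

# Print each matching element together with its current text
for elm in driver.find_elements(By.CSS_SELECTOR, "span[data-field=CPT]"):
    print(elm, elm.text)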

Download chromedriver from https://sites.google.com/a/chromium.org/chromedriver/home

Another option is dryscrape + bs4, but dryscrape seems outdated. Example here

Modified:

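import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('./chromedriver'))
driver.get('https://www.ig.com/en/forex/markets-forex')

time.sleep(2)  # Adjust up or down depending on how fast the page loads

for elm in driver.find_elements(By.CSS_SELECTOR, "span[data-field=CPT]"):
    if elm.text:  # Skip spans that are still empty
        print(elm, elm.text)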

or

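import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('./chromedriver'))
driver.get('https://www.ig.com/en/forex/markets-forex')

data = []
# Poll once per second until at least one span holds a real value
while not data:
    for elm in driver.find_elements(By.CSS_SELECTOR, "span[data-field=CPT]"):
        # A stricter check could test whether the text contains a digit
        if elm.text and elm.text != '-':
            data.append(elm.text)
    time.sleep(1)
print(data)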
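The polling loop above never gives up if the values fail to load. As a sketch of an alternative, Selenium's `WebDriverWait` can run the same check with a timeout and raises `TimeoutException` when it expires. The helper name `filled_rates`, the 15-second timeout, and the digit test are assumptions for illustration, and `webdriver.Chrome()` with no arguments assumes a recent Selenium that can locate a Chrome driver by itself.

```python
import re

CSS_SELECTOR = "css selector"  # the string value of selenium's By.CSS_SELECTOR


def filled_rates(driver, selector="span[data-field=CPT]"):
    """Return the span texts that contain a digit, or False if none do yet."""
    elements = driver.find_elements(CSS_SELECTOR, selector)
    rates = [elm.text for elm in elements if re.search(r"\d", elm.text)]
    return rates or False  # a falsy result tells WebDriverWait to keep polling


def main():
    # Imported here so the helper above can be used without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Selenium finds the driver itself
    try:
        driver.get("https://www.ig.com/en/forex/markets-forex")
        # Calls filled_rates(driver) repeatedly for up to 15 seconds
        rates = WebDriverWait(driver, 15).until(filled_rates)
        print(rates)
    finally:
        driver.quit()  # always close the browser, even on timeout
```

Calling `main()` opens the browser and prints whatever rates have loaded by the time the first span contains a digit, so spans still showing '-' at that moment are left out.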
igor87z
  • This doesn't bring out the values. This is what it shows: - – blockhead Jul 06 '20 at 13:57
  • The values need time to load, so add a sleep: ```python import time #... driver.get('https://www.ig.com/en/forex/markets-forex') time.sleep(3) #... ``` or check the values in a loop, for example: ```python data = list() while not data: for elm in driver.find_elements(By.CSS_SELECTOR, "span[data-field=CPT]"): if elm.text != '-': data.append(elm.text) ``` – igor87z Jul 06 '20 at 14:31
  • The second option is better with a sleep: ```python import time #... data = list() while not data: for elm in driver.find_elements(By.CSS_SELECTOR, "span[data-field=CPT]"): if elm.text != '-': data.append(elm.text) time.sleep(1) ``` – igor87z Jul 06 '20 at 14:41
  • You can edit your answer and add this there for proper indentation; comments don't support it. I'll try it. – blockhead Jul 06 '20 at 14:42
  • Done, see the part of the answer starting at "Modified:". – igor87z Jul 06 '20 at 14:47