Web scraping the tsetmc.com webpage using Scrapy and Python

Question

I want to scrape this webpage: http://www.tsetmc.com/loader.aspx?ParTree=151311&i=42354736493447489

Here is the command I run: scrapy shell "http://www.tsetmc.com/loader.aspx?ParTree=151311&i=42354736493447489". I want to grab the price shown in the following figure (the figure shows both the price and the matching element in Chrome's inspector): image #1

Then I ran response.xpath('//*[@id="dbp"]'), but the output is an empty list: []. See image #2.

I am a little confused, because I get the same empty result for every number I try to select from this website.

I will be happy if anyone can help me. :)

score 0 · Accepted Answer · answered Dec 04 '20 at 15:16

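Use Selenium to extract data that JavaScript loads dynamically. Scrapy on its own only downloads the HTML the server returns and does not execute JavaScript, so the values are not in the response it parses.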

import time

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()

driver.get('http://www.tsetmc.com/loader.aspx?ParTree=151311&i=42354736493447489')
time.sleep(5)  # wait 5 seconds so the page's JavaScript can fill in the data
page_source = driver.page_source

soup = BeautifulSoup(page_source, 'html.parser')
# print(soup.prettify())
prices = soup.find('div', {'class': 'box6 h80'}).find('table')

# Print the text of every cell in the second row of the table.
for td in prices.find_all('tr')[1].find_all('td'):
    print(td.get_text())

driver.quit()
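
The fixed time.sleep(5) above either waits longer than needed or not long enough on a slow connection. Selenium ships its own WebDriverWait for this. The idea behind it is plain polling, which can be sketched without a browser as follows. The helper name wait_until and its parameters are made up for illustration and are not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

With the driver from the code above, something like wait_until(lambda: driver.find_elements("css selector", "div.box6.h80 table tr")) could replace the sleep. Whether that selector is the right signal for this particular page is an assumption.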

Samsul Islam

Hi, thanks a lot for your response and for writing the code. How did you recognize that this is JavaScript? You use BeautifulSoup and Selenium; should I learn both of them in order to extract data from the website? – my shark Dec 05 '20 at 16:58
Disable JavaScript in Chrome's developer tools; you can follow https://stackoverflow.com/questions/13405383/how-to-disable-javascript-in-chrome-developer-tools to do that. You will see that the page no longer loads properly. To extract JavaScript-generated content you have to learn Selenium. – Samsul Islam Dec 06 '20 at 01:44
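
The same check can be done without the browser: look at the HTML the server actually returns, which is all Scrapy sees, and test whether the target element has any text. Below is a minimal sketch using only the standard library. The id "price" and both HTML snippets are made up for illustration and are not taken from tsetmc.com.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag and carry no text of their own.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collect the text inside the first element whose id matches."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0       # greater than 0 while inside the target element
        self.found = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.element_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_of_id(html, element_id):
    """Return the stripped text of the element, or None if it is absent."""
    parser = TextById(element_id)
    parser.feed(html)
    parser.close()
    if not parser.found:
        return None
    return "".join(parser.chunks).strip()


# Made-up markup: what a server might send, and what the browser shows later.
raw_html = '<div id="price"></div><script>/* fills #price later */</script>'
rendered_html = '<div id="price"><span>1,234</span></div>'

print(repr(text_of_id(raw_html, "price")))       # element exists but is empty
print(repr(text_of_id(rendered_html, "price")))  # text is present
```

In the Scrapy shell, passing response.text and the real element id to text_of_id gives the same signal: None or an empty string means the value is not in the downloaded HTML and is most likely filled in by JavaScript afterwards.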
