Link to the page I am trying to scrape

I am learning Beautiful Soup in Python, so I decided to try scraping Eventbrite's event data. I am wondering why my scraper does not pick up any of the events listed on the page; the data frame is empty for some reason. Is it because I am targeting the wrong class? I know that the site has an API, but I want to try web scraping first before using the API.

Here is my code so far:

import requests
from bs4 import BeautifulSoup
import pandas as pd
event = []
location = []
price = []
date = []

eventbrite_url = "https://www.eventbrite.com/d/ca--san-diego/art-events/"  
try:
    page = requests.get(eventbrite_url)  # was kpbs_url, which is never defined

    soup = BeautifulSoup(page.text, 'html.parser')

    items = soup.find_all("li", {"class": "item"})
    for item in items:
        event.append(item.find('div', {"class": "eds-is-hidden-accessible"}).text.strip())
        location.append(item.find('div', {"class": "card-text--truncated__one"}).text.strip())
        date.append(item.find('div', {"class":"eds-text-color--primary-brand eds-l-pad-bot-1 eds-text-weight--heavy eds-text-bs"}).text.strip())
        try:
            price.append(item.find('div', {"class": "eds-media-card-content__sub eds-text-bm eds-text-color--grey-600 eds-1-mar-top-1 eds-media-card-content__sub--cropped"}).text.strip())
        except AttributeError:  # find() returned None, so no price element was found
            price.append('Free')

    final_df = pd.DataFrame(
    {'Event': event,
     'Location': location,
     'Price': price,
     'Date':date
    })
except Exception as e:
    print(e)
    print("continuing....")
clumbzy1
  • The page is dynamically populated using JavaScript, so the HTML you're trying to get isn't in the response that `requests` receives. If you want to scrape this page, you will need to use a tool like Selenium. The API would of course be a more lightweight way to go. – dspencer Apr 14 '20 at 10:09
  • Oh, I see. Thank you very much! I'll look into Selenium – clumbzy1 Apr 14 '20 at 10:17
  • Does this answer your question? [Scrape Dynamic contents created by Javascript using Python](https://stackoverflow.com/questions/49939123/scrape-dynamic-contents-created-by-javascript-using-python). Also see [Web-scraping JavaScript page with Python](https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python) – dspencer Apr 14 '20 at 10:24
  • Yes, very much! I'll try to work it out right away. Thank you! – clumbzy1 Apr 14 '20 at 11:03
  • @clumbzy1 You don't need Selenium at all. You can extract the data directly from the [API](https://www.eventbrite.com/api/v3/destination/events/?event_ids=88546794847,94220962435,100250291320,97851820429,96969204501,100256718544,94539814129,96769846215,90666314387,86592994979,101918773796,90664163955,97247741613,91084615537,76464879513,90846298725,99503646084,98859400127,92161101335,89349220925&expand=series,event_sales_status,primary_venue,image,saves,my_collections,ticket_availability&page_size=20) or from the same link in your question. Let me know if you want me to provide an answer. – αԋɱҽԃ αмєяιcαη Apr 14 '20 at 11:05
  • @clumbzy1 here's the [code](https://bpaste.net/P45A) and the [output](http://www.sharecsv.com/s/2994355b027d41559b64e77672431cd8/data.csv) – αԋɱҽԃ αмєяιcαη Apr 14 '20 at 11:14
  • @αԋɱҽԃαмєяιcαη Interesting, thank you. I think I understand most of the code except this part: `type="application/ld+json"` – clumbzy1 Apr 15 '20 at 10:41
  • @clumbzy1 This is an identifier used to locate the tag itself. – αԋɱҽԃ αмєяιcαη Apr 15 '20 at 10:42
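
The `application/ld+json` lookup discussed in the last two comments can be sketched roughly as follows. This is a minimal illustration, not the code from the linked paste. It assumes the listing page embeds its event data as JSON-LD inside `<script>` tags. The sample HTML, the `JsonLdParser` class, the `extract_events` helper, and the field names (`name`, `startDate`, `location`) are all hypothetical and chosen for demonstration. It uses only the standard library so that it runs offline; with Beautiful Soup the equivalent lookup would be `soup.find_all("script", type="application/ld+json")`.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # The type attribute is what identifies the tag we care about.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._chunks))
            self._in_json_ld = False


def extract_events(html):
    """Return a list of dicts built from JSON-LD blocks (hypothetical field names)."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()

    events = []
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict):
                continue
            location = item.get("location")
            events.append({
                "Event": item.get("name"),
                "Location": location.get("name") if isinstance(location, dict) else location,
                "Date": item.get("startDate"),
            })
    return events


# Made-up sample markup standing in for the real page source.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
[{"@type": "Event", "name": "Gallery Night", "startDate": "2020-04-20",
  "location": {"@type": "Place", "name": "Example Hall"}}]
</script>
<script type="text/javascript">var x = 1;</script>
</head><body></body></html>
"""

events = extract_events(SAMPLE_HTML)
print(events)
```

For the real page, the text returned by `requests.get(eventbrite_url).text` could be passed to `extract_events` and the resulting list handed to `pd.DataFrame`. Whether the live page exposes these exact fields is an assumption and has not been checked here.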

0 Answers