
I wrote code to scrape data from a website, but only the `except` branches ever run. I want to scrape the application and the part number, but when I execute this program it only prints the empty placeholder application and part number. Could you please guide me to fix this issue? Thanks!
here is my code:

import requests
from bs4 import BeautifulSoup
import csv

def get_page(url):
    response = requests.get(url)
    if not response.ok:
        print('server responded:', response.status_code)
    else:
        soup = BeautifulSoup(response.text, 'html.parser') # 1. html , 2. parser
    return soup

def get_detail_page(soup):
    try:
        application = soup.find('td', class_="application", id=False).text
    except:
        application = 'Empty Title'
    print(application)
    try:
        part_no = soup.find('td', class_="application", id=False)[0].text
    except:
        part_no = 'Empty Title'
    print(part_no)

def main():
    url = "https://www.automotivebulbfinder.com/philips/"
    #get_page(url)
    get_detail_page(get_page(url))

if __name__ == '__main__':
    main()
Nazim Kerimbekov
M.Akram
    What exactly is the problem? – Tylerr Mar 18 '20 at 09:10
  • 2
    Can you please specify exactly what the error is, with the full traceback error if available. Just "Only part works" helps no one in trying to figure out what's wrong. Please edit the post with this information – KJTHoward Mar 18 '20 at 09:14
  • 2
    For a start, you should [avoid using bare `except`](https://stackoverflow.com/questions/4990718/about-catching-any-exception). If you really want help with this, you need to show the HTML being parsed, explain what you're trying to extract and show that you have considered why it's not working already. Please read [how to ask](https://stackoverflow.com/help/how-to-ask) to learn how to write a good question likely to elicit help. – dspencer Mar 18 '20 at 09:28
  • If you visit the link provided in the code there are applications and part no that I want to scrape. – M.Akram Mar 18 '20 at 09:31
  • 3
    If you print out soup from `soup = BeautifulSoup(response.text, 'html.parser')`, you'll see it doesn't have a td tag of class application. Also evident if you view page source in chrome browser. – DarrylG Mar 18 '20 at 09:32
  • 1
    @M.Akram Please understand that we are helping you voluntarily. Please make it as easy as possible by including all relevant information in your question. – dspencer Mar 18 '20 at 09:33
  • @DarrylG Sorry sir, there are a few more actions needed before the actual results appear. Let me clarify: when you open the URL, three search filters appear ("YEAR, MAKE, MODEL"); once you fill them in, the actual results are shown. – M.Akram Mar 18 '20 at 09:42
  • 1
    Well you don't do that as part of your URL in the above code... – dspencer Mar 18 '20 at 09:44
  • Is there any other way to scrape this information? – M.Akram Mar 18 '20 at 09:46
  • 2
    Looks like the page is dynamically generated, so you'll need to look into another tool, e.g. selenium: https://towardsdatascience.com/web-scraping-using-selenium-and-beautifulsoup-99195cd70a58 – dspencer Mar 18 '20 at 09:50
  • 1
    You can also take a look at [Python Web Scraping - Dynamic Websites](https://www.tutorialspoint.com/python_web_scraping/python_web_scraping_dynamic_websites.htm) which is a tutorial on using Python with Selenium. – DarrylG Mar 18 '20 at 09:58
  • Could you please refer me to some online resources where I can learn web scraping at an advanced level? – M.Akram Mar 18 '20 at 10:02
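To illustrate the bare-`except` point raised in the comments: when a selector matches nothing, `soup.find()` returns `None`, and accessing `.text` on it raises `AttributeError`. A minimal stdlib-only sketch (the `FakeTag` class here is a stand-in for a BeautifulSoup tag, not part of the original code) of catching that specific exception instead of everything:

```python
class FakeTag:
    """Stand-in for a BeautifulSoup tag; real tags expose .text the same way."""
    def __init__(self, text):
        self.text = text

def extract_text(tag, placeholder='Empty Title'):
    # Catch only AttributeError (tag is None because the selector matched
    # nothing); any other bug still surfaces with a full traceback.
    try:
        return tag.text
    except AttributeError:
        return placeholder

print(extract_text(FakeTag('H11')))  # prints H11
print(extract_text(None))            # prints Empty Title
```

With a bare `except`, a typo such as `tag.txt` would silently produce the placeholder too, which is exactly why the real failure is hard to see.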

1 Answer


From the look of it you don't even need scraping (BeautifulSoup), since the page is backed by a JSON API at the endpoint https://www.automotivebulbfinder.com/philips/functions/ajax.php. Depending on the URL parameters you pass, it returns the list of years, makes, models or submodels, selected by the `key` parameter.

For instance, `key=yearSelect` returns the available years, `key=makeSelect` (with a `year`) the makes, `key=modelSelect` the models, and so on.

The code below gets each of these lists and prompts for the corresponding year, make, model and submodel based on user input:

import requests
import json

url = "https://www.automotivebulbfinder.com/philips/functions/ajax.php"

def get_items(itemName, params):
    r = requests.get(url, params=params)
    data = json.loads(r.text)

    # "data" is only present on the final response that carries the results
    if data.get("data"):
        return (None, data["data"])

    # otherwise the response lists the available choices under "items"
    for val in data["items"].keys():
        print(val)

    return (input("Enter a " + itemName + " : "), None)

year, data = get_items("year", { "key": "yearSelect"})
make, data = get_items("make", { "key": "makeSelect", "year": year})
model, data = get_items("model", { "key": "modelSelect", "year": year, "make": make})

if data: 
    print(data)
    exit(0)

qualifier, data = get_items("submodel", { "key": "submodelSelect", "year": year, "make": make, "model": model})

r = requests.get(url, params = {
    "key": "selectVehicle",
    "year": year,
    "make": make,
    "model": model,
    "qualifier": qualifier
})

print(json.loads(r.text)["data"])

Input Example:

[list of year]
Enter a year : 2019
[list of make]
Enter a make : Jeep
Cherokee
Enter a model : Cherokee
(w/halogen capsule headlamps)
(w/HID headlamps)
Enter a submodel : (w/HID headlamps)
{ all your data here}
Bertrand Martel
  • Love you bro! It works. If you don't mind, can I ask one more thing? How can I store this data in a CSV file? Please – M.Akram Mar 19 '20 at 12:02
  • 1
    @M.Akram you can use [this](https://stackoverflow.com/a/3087011/2614364) with a list of dicts; check [this gist](https://gist.github.com/bertrandmartel/d6a46d519363c9e23c88c8e2e0aa648f) – Bertrand Martel Mar 19 '20 at 12:13
  • 1
    also if you want all the data (for all years / all models) you will need to loop over all these parameters – Bertrand Martel Mar 19 '20 at 12:15
  • Bro, I'm new to web scraping and I'm not sure how to set up the loop for all records and store them in a CSV file. Please – M.Akram Mar 19 '20 at 12:19
  • You can begin slowly, hardcoding a year / make and model. Then iterate over the qualifier list you get from the code above and append all the results to the same list. Then build a function doing the latter (iterating over qualifiers and appending results to a list). After that you will be confident doing the same thing for model, make and so on – Bertrand Martel Mar 19 '20 at 12:25
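The workflow suggested in these comments, collecting each result into one list of dicts and then writing a CSV with `csv.DictWriter`, can be sketched like this (the field names and sample rows are hypothetical; use whatever keys the API's `data` payload actually contains):

```python
import csv

def write_rows(rows, path):
    """Write a list of dicts to a CSV file, one column per distinct key."""
    fieldnames = sorted({key for row in rows for key in row})
    with open(path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

# Hypothetical rows mimicking the shape of the "data" payload; in practice
# you would append one dict per qualifier result inside the loop.
rows = [
    {'application': 'Headlamp (low beam)', 'part_no': 'H11'},
    {'application': 'Fog lamp', 'part_no': 'H8'},
]
write_rows(rows, 'bulbs.csv')
```

Appending every per-qualifier result to the same `rows` list before the single `write_rows` call keeps the CSV header consistent across all records.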