Trying to Crawl Yelp Search Results page for Profile URLs

Question

I am trying to scrape the profile URLs from a Yelp search results page using Beautiful Soup. This is the code I currently have:

import requests
from bs4 import BeautifulSoup

url = "https://www.yelp.com/search?find_desc=tree+-+removal+-+&find_loc=Baltimore+MD&start=40"

response = requests.get(url)
data = response.text
soup = BeautifulSoup(data, 'lxml')

# Writes every <a> tag that has an href attribute to the CSV file
for a in soup.find_all('a', href=True):
    with open(r'C:\Users\my.name\Desktop\Yelp-URLs.csv', "a") as f:
        print(a, file=f)

This gives me every link on the page that has an href, not just the profile URLs. Additionally, each line contains the whole tag, including its class string (a class lemon....), when I just need the business profile URLs.

Please help.

Would it be better to specify the results you want to scrape? - Humayun Ahmad Rajib, May 07 '20 at 03:03

score 0 · Accepted Answer · answered May 07 '20 at 03:47


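You can restrict the match to business profile links by using select with a CSS attribute selector. The selector a[href^="/biz/"] matches only anchors whose href starts with /biz/, and reading the href attribute gives you the URL instead of the whole tag.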

# Open the file once, then write only the hrefs that start with "/biz/"
with open(r'/Users/my.name/Desktop/Yelp-URLs.csv', "a") as f:
    for a in soup.select('a[href^="/biz/"]'):
        print(a.attrs['href'], file=f)
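
The hrefs collected this way are relative paths, and a results page may link to the same business more than once. The following is a minimal sketch of one way to handle both, not a definitive implementation: it runs the same selector over a small inline HTML sample instead of a live request, so the markup, the business names, and the helper name extract_profile_urls are all hypothetical. It also assumes that dropping the query string is acceptable for your purposes.

```python
from urllib.parse import urljoin, urlsplit

from bs4 import BeautifulSoup

BASE_URL = "https://www.yelp.com"

# Hypothetical, simplified markup standing in for a search results page.
SAMPLE_HTML = """
<a class="lemon" href="/biz/abc-tree-service-baltimore">ABC Tree Service</a>
<a class="lemon" href="/biz/abc-tree-service-baltimore?osq=tree">ABC (again)</a>
<a class="lemon" href="/search?find_desc=tree">Next page</a>
<a class="lemon" href="/biz/oak-removal-baltimore">Oak Removal</a>
"""


def extract_profile_urls(html, base_url=BASE_URL):
    """Return absolute, de-duplicated URLs for links whose href starts with /biz/."""
    soup = BeautifulSoup(html, "html.parser")
    seen = set()
    urls = []
    for a in soup.select('a[href^="/biz/"]'):
        # Keep only the path so the same profile is not listed twice
        # just because the query string differs.
        path = urlsplit(a["href"]).path
        full_url = urljoin(base_url, path)
        if full_url not in seen:
            seen.add(full_url)
            urls.append(full_url)
    return urls


if __name__ == "__main__":
    for profile_url in extract_profile_urls(SAMPLE_HTML):
        print(profile_url)
```

To use it on the real page, pass response.text in place of SAMPLE_HTML and write the returned list to the CSV file.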


r-beginners

