Find div class by substring then extract entire class name

Question

I'm trying to find all divs whose class contains the substring 'auction-results' and then extract the entire class name. Here's an example:

<div class="auction-results high-bid has-price"></div>

I can find all the divs that contain 'auction-results' like so:

results = soup.select("div[class*=auction-results]")
type(results)

Out: bs4.element.ResultSet

results

Out: [<div class="auction-results high-bid has-price">
     <i class="icon"></i>
     <span class="lot-price">       $700,000</span>
     </div>]

What I want is to store the entire class name 'auction-results high-bid has-price' in a pandas column like so:

class_text = ['auction-results high-bid has-price']
scraped_data = pd.DataFrame({'class_text': class_text})
scraped_data

                            class_text
0   auction-results high-bid has-price

I haven't found a solution yet so I hope someone can help me out, thanks!

You can handle results as a string; substring/slicing should work: https://stackoverflow.com/questions/663171/how-do-i-get-a-substring-of-a-string-in-python — Sureshmani, Apr 10 '20 at 12:28
Please edit your question to show what your dataframe would look like; it's not clear from the question. — Jack Fleeting, Apr 10 '20 at 12:30
@Jack Fleeting I've added an example of what my desired output would look like, hope it helps. — avgjoe13, Apr 10 '20 at 13:00
@Sureshmani thanks for the tip, but what I need is to extract the class name for many different pages whose classes all start with 'auction-results' but have different endings. The example above shows 'high-bid has-price', but other pages may show 'sold has-price'. That's why I need to search by the substring 'auction-results' but extract the entire class name. — avgjoe13, Apr 10 '20 at 13:04
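One detail about the selector in the question: `[class*=auction-results]` is a plain substring test on the whole class attribute string, so it would also match a class such as `old-auction-results-archive`. The sketch below assumes bs4 with the built-in `html.parser`; the markup, including the decoy div, is made up for illustration. It compares the substring selector with the class selector `div.auction-results`, which only matches the exact class token, and joins each element's class list back into one string.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two real matches plus a decoy whose class only
# contains "auction-results" as part of a longer token.
html_doc = """
<div class="auction-results high-bid has-price"></div>
<div class="auction-results sold has-price"></div>
<div class="old-auction-results-archive"></div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# [class*=...] is a substring test on the whole class attribute string,
# so the decoy matches as well.
substring_hits = soup.select("div[class*=auction-results]")

# .auction-results matches only elements carrying that exact class token.
token_hits = soup.select("div.auction-results")

# BeautifulSoup stores "class" as a list of tokens; join it back into one string.
substring_classes = [" ".join(div["class"]) for div in substring_hits]
token_classes = [" ".join(div["class"]) for div in token_hits]

print(substring_classes)
print(token_classes)
```

If pages never use such look-alike class names, the substring selector from the question is sufficient.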

score 1 · Accepted Answer · answered Apr 10 '20 at 14:51


Try it this way:

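import pandas as pd

columns = ['class_text']
rows = []
for result in results:
    # BeautifulSoup returns the class attribute as a list of tokens,
    # so join them back into the full class string
    rows.append(' '.join(result['class']))
# A flat list of strings gives one DataFrame row per matched div
scraped_data = pd.DataFrame(rows, columns=columns)
scraped_data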

Output:

    class_text
0   auction-results high-bid has-price
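A minimal sketch of the same approach with more than one match, assuming pages whose class endings differ as described in the question's comments (the two divs below are hypothetical markup, parsed with the built-in `html.parser`). Because `rows` is a flat list of strings, `pd.DataFrame(rows, columns=...)` produces one row per matched div.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup standing in for two pages with different class endings.
html_doc = """
<div class="auction-results high-bid has-price"></div>
<div class="auction-results sold has-price"></div>
"""
soup = BeautifulSoup(html_doc, "html.parser")
results = soup.select("div[class*=auction-results]")

# One full class string per matched div.
rows = [" ".join(result["class"]) for result in results]

# Each string in the flat list becomes its own row in the single column.
scraped_data = pd.DataFrame(rows, columns=["class_text"])
print(scraped_data)
```

When scraping several pages, the same `rows` list can be extended inside a loop over the pages before building the DataFrame once at the end.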

answered Apr 10 '20 at 14:51

Jack Fleeting


score 1 · Answer 2 · answered Apr 10 '20 at 14:56


See the example below. You can treat the markup as an HTML document and use lxml to extract the full class value.

from lxml import html

results = '<div class="auction-results high-bid has-price"><i class="icon"></i><span class="lot-price">$700,000</span></div>'
# Parse the markup as HTML
tree = html.fromstring(results)
# Select the class attribute of every div whose class contains 'auction-results'
name = tree.xpath("//div[contains(@class,'auction-results')]/@class")

print(name)

It prints the full class name:

['auction-results high-bid has-price']
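Note that XPath `contains()` is also a raw substring test, so it matches any class string that merely contains 'auction-results'. If only divs carrying the exact `auction-results` class should match, a common XPath idiom pads the normalized class string with spaces before testing. A sketch with hypothetical markup, including a decoy class:

```python
from lxml import html

# Hypothetical markup: two real matches plus a decoy class that merely
# contains the substring "auction-results".
doc = """
<html><body>
<div class="auction-results high-bid has-price"></div>
<div class="auction-results sold has-price"></div>
<div class="old-auction-results-archive"></div>
</body></html>
"""
tree = html.fromstring(doc)

# contains(@class, ...) is a substring test, so the decoy matches too.
substring_names = tree.xpath("//div[contains(@class, 'auction-results')]/@class")

# Padding the normalized class string with spaces restricts the match
# to a whole class token.
token_names = tree.xpath(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' auction-results ')]/@class"
)

print(substring_names)
print(token_names)
```

Both queries return the class attribute values as strings, so either list can be passed straight to `pd.DataFrame(..., columns=['class_text'])`.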

answered Apr 10 '20 at 14:56

Sureshmani


Thanks for your answer. I chose the other answer because it works a little better with the rest of my code. – avgjoe13 Apr 11 '20 at 12:42
