493

I'm having trouble parsing HTML elements with a "class" attribute using BeautifulSoup. The code looks like this:

soup = BeautifulSoup(sdata)
mydivs = soup.findAll('div')
for div in mydivs: 
    if (div["class"] == "stylelistrow"):
        print div

I get an error on the same line "after" the script finishes:

File "./beautifulcoding.py", line 130, in getlanguage
  if (div["class"] == "stylelistrow"):
File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 599, in __getitem__
   return self._getAttrMap()[key]
KeyError: 'class'

How do I get rid of this error?

QHarr
Neo

18 Answers

797

You can refine your search to only find those divs with a given class (`find_all` is the BS4 name; in BS3 use `findAll` with the same arguments):

mydivs = soup.find_all("div", {"class": "stylelistrow"})
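A minimal sketch of this approach, assuming BeautifulSoup 4 with the built-in "html.parser" backend and made-up sample markup. In BS4 the attrs-dict form matches any div whose class list contains the given name, and divs with no class attribute are simply skipped, so the KeyError from the question never comes up:

```python
from bs4 import BeautifulSoup

# Made-up sample markup: two divs carry "stylelistrow", one has no class at all.
sdata = """
<div class="stylelistrow">first</div>
<div class="stylelistrow button">second</div>
<div>third</div>
"""

soup = BeautifulSoup(sdata, "html.parser")

# "class" is multi-valued in BS4, so this matches any div whose class
# list contains "stylelistrow"; divs without a class are not matched.
mydivs = soup.find_all("div", {"class": "stylelistrow"})
for div in mydivs:
    print(div.get_text())
```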
Jossef Harush
Klaus Byskov Pedersen
  • @Klaus- what if I want to use findAll instead? –  Apr 21 '11 at 15:26
  • Thanks for this. It is not just for @class but for anything. – prageeth Mar 11 '14 at 18:39
  • This only works for exact matches. `<.. class="stylelistrow">` matches but not `<.. class="stylelistrow button">`. – Wernight Jul 28 '14 at 16:07
  • @Wernight were you able to find a solution? – pyCthon Oct 06 '14 at 00:37
  • @pyCthon See answer for @jmunsch, BS now supports `class_` which works properly. – Wernight Oct 06 '14 at 09:47
  • @pyCthon see my answer below for how to solve multiple class names with BS3 – FlipMcF Dec 16 '14 at 20:58
  • `class_` will only match the exact class string if you have multiple classes. In that case you can use `soup.select("p.stylelistrow.another")`, which would match a `p` element that has both the `stylelistrow` and `another` classes. – smoothware Mar 23 '19 at 13:56
  • @Wernight Currently using BS4 (4.7.1) and `soup.find_all("div", {"class": "stylelistrow"})` works for both exact `<.. class="stylelistrow">` and contains `<.. class="stylelistrow button">` matches. – theGirrafish Jun 25 '19 at 14:00
351

From the documentation:

As of Beautiful Soup 4.1.2, you can search by CSS class using the keyword argument class_:

soup.find_all("a", class_="sister")

Which in this case would be:

soup.find_all("div", class_="stylelistrow")

It also works with a multi-class attribute, matched against the exact string value:

soup.find_all("div", class_="stylelistrowone stylelistrowtwo")
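A small sketch of how `class_` behaves, assuming BeautifulSoup 4 with "html.parser" and hypothetical markup: a single name is matched against each class individually, while a space-separated value is compared with the attribute's whole string:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration.
sdata = """
<div class="stylelistrow">a</div>
<div class="stylelistrowone stylelistrowtwo">b</div>
<div class="other">c</div>
"""
soup = BeautifulSoup(sdata, "html.parser")

# class_ (trailing underscore because "class" is a Python keyword) matches
# elements having "stylelistrow" as one of their class names.
single = soup.find_all("div", class_="stylelistrow")

# A space-separated value is compared against the attribute's exact string.
exact = soup.find_all("div", class_="stylelistrowone stylelistrowtwo")

# A single name also matches inside a multi-class attribute.
partial = soup.find_all("div", class_="stylelistrowtwo")

print([d.get_text() for d in single])   # ['a']
print([d.get_text() for d in exact])    # ['b']
print([d.get_text() for d in partial])  # ['b']
```

Note that "stylelistrowone" is not matched by `class_="stylelistrow"`: names are compared whole, not as substrings.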
jmunsch
  • You can use lists too: `soup.find_all("a", ["stylelistrowone", "stylelistrow"])`. It's safer if you don't have many classes. – Nuno André Jul 07 '15 at 14:06
  • This should be the accepted answer, it's both more correct and concise than the alternatives. – loopbackbee Mar 13 '16 at 00:19
  • Supplement to @NunoAndré's answer for BeautifulSoup 3: `soup.findAll("a", {'class':['stylelistrowone', 'stylelistrow']})`. – Brad Apr 17 '18 at 15:39
  • So in bs4, class is replaced by class_? Why would I need SoupStrainer then if I could refine the search easily? – Timo Jan 31 '21 at 10:31
  • When would I use the 'dict way' and not the `=`, see @Klaus answer? – Timo Jan 31 '21 at 11:11
  • @Timo my guess would be that you can use the dict way when you're searching for an attribute other than class, so maybe something like `{'data-item': ['1']}`. – jmunsch Jan 31 '21 at 16:33
  • To extend your example, it would then be ('a', {'data-item': ['1']}). I will try this with the = (your answer), maybe it will also work. – Timo Feb 01 '21 at 20:16
  • One thing to note: when you give `class_="class_1 class_2"` it matches the exact string, so even `"class_2 class_1"` won't match. To search with multiple classes (all required), you should use selectors: `soup.select('div.class_1.class_2')` matches both `"class_1 class_2"` and `"class_2 class_1"`. – Hrishikesh Mar 06 '21 at 17:46
74

Update (2016): In Beautiful Soup 4, the method `findAll` has been renamed to `find_all` (the old camelCase name is still accepted as an alias). See the list of changed method names in the official documentation.

Hence the answer will be:

soup.find_all("html_element", class_="your_class_name")
Pang
overlord
23

Specific to BeautifulSoup 3:

soup.findAll('div',
             {'class': lambda x: x 
                       and 'stylelistrow' in x.split()
             }
            )

Will find all of these:

<div class="stylelistrow">
<div class="stylelistrow button">
<div class="button stylelistrow">
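For comparison, a minimal sketch of the same token-based test in BeautifulSoup 4, where `tag.get("class")` already returns a list of names, so no `split()` is needed. The sample markup and the function name `has_stylelistrow` are hypothetical. In BS4, plain `class_="stylelistrow"` gives the same result; a function filter is mainly useful when you need extra conditions:

```python
from bs4 import BeautifulSoup

sdata = """
<div class="stylelistrow">1</div>
<div class="stylelistrow button">2</div>
<div class="button stylelistrow">3</div>
<div class="stylelistrowbutton">4</div>
<div>5</div>
"""
soup = BeautifulSoup(sdata, "html.parser")

def has_stylelistrow(tag):
    # tag.get("class") is a list of class names in BS4 (None if absent),
    # so a membership test replaces the BS3-era x.split() trick.
    return tag.name == "div" and "stylelistrow" in (tag.get("class") or [])

matches = soup.find_all(has_stylelistrow)
print([d.get_text() for d in matches])  # ['1', '2', '3']
```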
FlipMcF
21

CSS selectors

single class first match

soup.select_one('.stylelistrow')

list of matches

soup.select('.stylelistrow')

compound class (i.e. AND another class)

soup.select_one('.stylelistrow.otherclassname')
soup.select('.stylelistrow.otherclassname')

Spaces in compound class names, e.g. `class="stylelistrow otherclassname"`, are replaced with ".". You can keep chaining more classes the same way.

list of classes (OR - match whichever present)

soup.select_one('.stylelistrow, .otherclassname')
soup.select('.stylelistrow, .otherclassname')

bs4 4.7.1 +

Specific class whose innerText contains a string

soup.select_one('.stylelistrow:contains("some string")')
soup.select('.stylelistrow:contains("some string")')

N.B.

soupsieve 2.1.0 + Dec'2020 onwards

NEW: In order to avoid conflicts with future CSS specification changes, non-standard pseudo classes will now start with the :-soup- prefix. As a consequence, :contains() will now be known as :-soup-contains(), though for a time the deprecated form of :contains() will still be allowed with a warning that users should migrate over to :-soup-contains().

NEW: Added new non-standard pseudo class :-soup-contains-own() which operates similar to :-soup-contains() except that it only looks at text nodes directly associated with the currently scoped element and not its descendants.

Specific class which has a certain child element e.g. a tag

soup.select_one('.stylelistrow:has(a)')
soup.select('.stylelistrow:has(a)')
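The selectors listed above can be tried together in one place. This is a sketch assuming BeautifulSoup 4.7.1+ with a Soup Sieve release that supports `:-soup-contains` (2.1.0+), the "html.parser" backend, and made-up markup:

```python
from bs4 import BeautifulSoup

sdata = """
<div class="stylelistrow otherclassname"><a href="#">link</a> some string</div>
<div class="stylelistrow">plain row</div>
<div class="otherclassname">other only</div>
"""
soup = BeautifulSoup(sdata, "html.parser")

first = soup.select_one(".stylelistrow")                # first match only
rows = soup.select(".stylelistrow")                     # all matches
both = soup.select(".stylelistrow.otherclassname")      # AND: both classes
either = soup.select(".stylelistrow, .otherclassname")  # OR: either class
with_text = soup.select('.stylelistrow:-soup-contains("some string")')
with_link = soup.select(".stylelistrow:has(a)")         # has an <a> descendant

print(first["class"])
print(len(rows), len(both), len(either), len(with_text), len(with_link))
```

Results come back in document order, and an element matched by both parts of the comma-separated (OR) selector is returned only once.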
QHarr
16

A straightforward way would be:

soup = BeautifulSoup(sdata)
for each_div in soup.findAll('div',{'class':'stylelistrow'}):
    print each_div

Make sure you take care of the casing of findAll; it's not findall.

Konark Modi
  • This only works for exact matches. `<.. class="stylelistrow">` matches but not `<.. class="stylelistrow button">`. – Wernight Jul 28 '14 at 16:07
14

How to find elements by class

I'm having trouble parsing html elements with "class" attribute using Beautifulsoup.

You can easily find by one class, but if you want to find by the intersection of two classes, it's a little more difficult.

From the documentation (emphasis added):

If you want to search for tags that match two or more CSS classes, you should use a CSS selector:

css_soup.select("p.strikeout.body")
# [<p class="body strikeout"></p>]

To be clear, this selects only the p tags that are both strikeout and body class.

To find elements with any of a set of classes (not the intersection, but the union), you can give a list to the class_ keyword argument (as of 4.1.2):

soup = BeautifulSoup(sdata)
class_list = ["stylelistrow"] # can add any other classes to this list.
# will find any divs with any names in class_list:
mydivs = soup.find_all('div', class_=class_list) 

Also note that findAll has been renamed from the camelCase to the more Pythonic find_all.
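A side-by-side sketch of intersection versus union, assuming BeautifulSoup 4 with "html.parser"; the markup reuses the class names from the documentation quote but is otherwise made up:

```python
from bs4 import BeautifulSoup

sdata = """
<p class="body strikeout">both</p>
<p class="strikeout">strikeout only</p>
<p class="body">body only</p>
<p>none</p>
"""
soup = BeautifulSoup(sdata, "html.parser")

# Intersection: the CSS selector requires every listed class, in any order.
both = soup.select("p.strikeout.body")

# Union: a list passed to class_ matches elements with any of the classes.
either = soup.find_all("p", class_=["strikeout", "body"])

print([p.get_text() for p in both])    # ['both']
print([p.get_text() for p in either])  # ['both', 'strikeout only', 'body only']
```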

Community
Aaron Hall
10

As of BeautifulSoup 4:

If you have a single class name, you can just pass the class name as a parameter, like:

mydivs = soup.find_all('div', 'class_name')

Or if you have more than one class name, just pass the list of class names as a parameter (this matches divs that have any of the listed classes), like:

mydivs = soup.find_all('div', ['class1', 'class2'])
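A quick sketch of both shorthand forms, assuming BeautifulSoup 4 with "html.parser" and hypothetical markup. A non-dict value in the second positional (attrs) position is treated as a search on the class attribute:

```python
from bs4 import BeautifulSoup

sdata = '<div class="class1">a</div><div class="class2 x">b</div><div class="class3">c</div>'
soup = BeautifulSoup(sdata, "html.parser")

# A plain string as the second positional argument is treated as a CSS class.
one = soup.find_all("div", "class1")

# A list is treated as "any of these classes".
several = soup.find_all("div", ["class1", "class2"])

print([d.get_text() for d in one])      # ['a']
print([d.get_text() for d in several])  # ['a', 'b']
```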
Shivam Shah
4

Try to check if the div has a class attribute first, like this:

soup = BeautifulSoup(sdata)
mydivs = soup.findAll('div')
for div in mydivs:
    if "class" in div:
        if (div["class"]=="stylelistrow"):
            print div
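In BeautifulSoup 4, `"class" in div` tests the tag's children (its contents), not its attributes, so that check never sees the class attribute. A minimal sketch of attribute checks that do work, assuming BS4 with "html.parser" and made-up markup:

```python
from bs4 import BeautifulSoup

sdata = '<div class="stylelistrow">yes</div><div>no class</div><div class="other">no</div>'
soup = BeautifulSoup(sdata, "html.parser")

picked = []
for div in soup.find_all("div"):
    # has_attr() looks at attributes; div["class"] is a list of names in BS4.
    if div.has_attr("class") and "stylelistrow" in div["class"]:
        picked.append(div.get_text())

# Equivalent, using .get() with a default instead of div["class"],
# which raises KeyError on tags that have no class attribute.
picked_get = [d.get_text() for d in soup.find_all("div")
              if "stylelistrow" in d.get("class", [])]

print(picked)      # ['yes']
print(picked_get)  # ['yes']
```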
Mew
  • That doesn't work. I guess your approach was right, but 4th line doesn't work as intended. – Neo Feb 18 '11 at 12:08
  • Ah I thought div worked like a dictionary, I'm not really familiar with Beautiful Soup so it was just a guess. – Mew Feb 18 '11 at 12:11
3

This works for me to access the class attribute (on BeautifulSoup 4, contrary to what the documentation says). The KeyError comes from a list being returned, not a dictionary.

for hit in soup.findAll(name='span'):
    print hit.contents[1]['class']
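A small sketch of what the class attribute looks like in BeautifulSoup 4, assuming "html.parser" and hypothetical markup: its value is a list of names, and subscripting a tag that has no class attribute raises the same kind of KeyError as in the question's traceback, while `.get()` returns None:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<span class="stylelistrow button">x</span><span>y</span>', "html.parser")
first, second = soup.find_all("span")

# "class" is a multi-valued attribute in BS4, so its value is a list of names.
print(first["class"])       # ['stylelistrow', 'button']
print(second.get("class"))  # None: no class attribute, and no exception

try:
    second["class"]         # subscripting a missing attribute raises KeyError
except KeyError as exc:
    print("KeyError:", exc)
```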
Stgltz
3

The following worked for me:

a_tag = soup.find_all("div",class_='full tabpublist')
ysf
2

Other answers did not work for me.

In other answers the findAll is being used on the soup object itself, but I needed a way to do a find by class name on objects inside a specific element extracted from the object I obtained after doing findAll.

If you are trying to search inside nested HTML elements to get objects by class name, try the following:

from bs4 import BeautifulSoup as soup

# parse html (web_page is assumed to be an already-opened response, e.g. from urllib.request.urlopen(url))
page_soup = soup(web_page.read(), "html.parser")

# filter out items matching class name
all_songs = page_soup.findAll("li", "song_item")

# traverse through all_songs
for song in all_songs:

    # get text out of span element matching class 'song_name'
    # doing a 'find' by class name within a specific song element taken out of 'all_songs' collection
    print(song.find("span", "song_name").text)

Points to note:

  1. I'm not explicitly defining the search to be on the 'class' attribute (findAll("li", {"class": "song_item"})), since it's the only attribute I'm searching on, and a string passed as the second argument is matched against the class attribute by default.

  2. findAll returns a bs4.element.ResultSet, which is a subclass of list whose items are Tag objects; find returns a single Tag (or None). You can call find or findAll on any of those Tag objects to search inside them, through any number of nested elements.

  3. My BS4 version - 4.9.1, Python version - 3.8.1
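A self-contained version of the nested search, assuming BeautifulSoup 4 with "html.parser"; the inline `html_text` is a hypothetical stand-in for the downloaded page, and the None check guards against an item that lacks the inner span:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page.
html_text = """
<ul>
  <li class="song_item"><span class="song_name">First</span></li>
  <li class="song_item"><span class="song_name">Second</span></li>
  <li class="ad"><span class="song_name">Not a song</span></li>
</ul>
"""
page_soup = BeautifulSoup(html_text, "html.parser")

names = []
for song in page_soup.find_all("li", "song_item"):  # ResultSet of Tag objects
    name_tag = song.find("span", "song_name")       # searches only inside this <li>
    if name_tag is not None:                        # find() returns None on no match
        names.append(name_tag.get_text(strip=True))

print(names)  # ['First', 'Second']
```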

ZeroFlex
2

Concerning @Wernight's comment on the top answer about partial matching...

You can partially match:

  • <div class="stylelistrow"> and
  • <div class="stylelistrow button">

with gazpacho:

from gazpacho import Soup

soup = Soup(html)  # html: the page source as a string (hypothetical variable)
my_divs = soup.find("div", {"class": "stylelistrow"}, partial=True)

Both will be captured and returned as a list of Soup objects.

emehex
1

This worked for me:

for div in mydivs:
    try:
        clazz = div["class"]
    except KeyError:
        clazz = ""
    if (clazz == "stylelistrow"):
        print div
1

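This should work:

from bs4 import BeautifulSoup

soup = BeautifulSoup(sdata, "html.parser")
mydivs = soup.find_all('div')
for div in mydivs:
    # div.get("class") is a list of class names in BS4 (None if the div has no class)
    if "stylelistrow" in (div.get("class") or []):
        print(div)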
Blue Sky
1

Alternatively, we can use lxml, which supports XPath and is very fast!

from lxml import html, etree 

attr = html.fromstring(html_text)  # passing the raw HTML
handles = attr.xpath('//div[@class="stylelistrow"]')  # XPath expression to find that specific class

for each in handles:
    print(etree.tostring(each))#printing the html as string
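The `@class="stylelistrow"` test above compares the whole attribute string, so it misses `class="stylelistrow button"`. A sketch of the common token-matching XPath idiom, assuming lxml is installed and using made-up markup:

```python
from lxml import html

html_text = """
<html><body>
<div class="stylelistrow">a</div>
<div class="stylelistrow button">b</div>
<div class="stylelistrowbutton">c</div>
</body></html>
"""
tree = html.fromstring(html_text)

# Pad the normalized class string with spaces so that " stylelistrow "
# only matches a whole class name, not a substring like "stylelistrowbutton".
token_xpath = "//div[contains(concat(' ', normalize-space(@class), ' '), ' stylelistrow ')]"
handles = tree.xpath(token_xpath)
print([el.text for el in handles])  # ['a', 'b']

# The exact-string comparison only finds the first div.
exact = tree.xpath('//div[@class="stylelistrow"]')
print([el.text for el in exact])  # ['a']
```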
Sohan Das
1

Use class_= if you want to find element(s) without specifying the HTML tag.

For single element:

soup.find(class_='my-class-name')

For multiple elements:

soup.find_all(class_='my-class-name')
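A short sketch, assuming BeautifulSoup 4 with "html.parser" and hypothetical markup, showing that leaving out the tag name matches elements of any type:

```python
from bs4 import BeautifulSoup

sdata = """
<div class="my-class-name">div</div>
<span class="my-class-name extra">span</span>
<p class="other">p</p>
"""
soup = BeautifulSoup(sdata, "html.parser")

first = soup.find(class_="my-class-name")      # first element of any tag type
every = soup.find_all(class_="my-class-name")  # all elements of any tag type

print(first.name)                 # div
print([el.name for el in every])  # ['div', 'span']
```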
Jossef Harush
0

The following should work:

soup.find('span', attrs={'class':'totalcount'})

Replace 'totalcount' with your class name and 'span' with the tag you are looking for. Also, if the element has multiple class names separated by spaces, just choose one of them.

P.S. This finds the first element with given criteria. If you want to find all elements then replace 'find' with 'find_all'.
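A sketch contrasting the two calls, assuming BeautifulSoup 4 with "html.parser" and made-up markup; the first span also shows that one name out of several is enough:

```python
from bs4 import BeautifulSoup

sdata = '<span class="totalcount big">42</span><span class="totalcount">7</span>'
soup = BeautifulSoup(sdata, "html.parser")

first = soup.find("span", attrs={"class": "totalcount"})      # first match or None
every = soup.find_all("span", attrs={"class": "totalcount"})  # list of all matches
missing = soup.find("span", attrs={"class": "nope"})          # no match

print(first.get_text())               # 42
print([s.get_text() for s in every])  # ['42', '7']
print(missing)                        # None
```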