
I'm new to Python and BeautifulSoup, so I'm still learning. This is probably quite simple, but I'm struggling to find an answer.

I'm basically trying to scrape '12' from the last line using the `data-offset` attribute. I can navigate to the last line by searching for class="solr-page-selector-page next full", but I don't know how to get to '12' from there.

<a class="solr-page-selector-page" data-offset="12">2</a>
<a class="solr-page-selector-page" data-offset="24">3</a>
<a class="solr-page-selector-page" data-offset="36">4</a>
<a class="solr-page-selector-page" data-offset="48">5</a>
<a class="solr-page-selector-page next full" data-offset="12">Next</a>

Any help would be greatly appreciated.

Thank you

blountdj
  • Possible duplicate of [Handling "class" attribute in Beautifulsoup](http://stackoverflow.com/questions/5041008/handling-class-attribute-in-beautifulsoup) – roeland Jan 14 '16 at 02:38

1 Answer

This will do the trick:

>>> soup.find(class_='solr-page-selector-page next full').get('data-offset')
'12'
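For a self-contained version, here is a minimal sketch that parses the snippet from the question and reads the attribute. It assumes the stdlib `html.parser` backend (so lxml isn't needed); the variable names `html`, `next_link` and `offset` are illustrative only.

```python
from bs4 import BeautifulSoup

# The pagination links from the question, as a plain string.
html = """
<a class="solr-page-selector-page" data-offset="12">2</a>
<a class="solr-page-selector-page" data-offset="24">3</a>
<a class="solr-page-selector-page" data-offset="36">4</a>
<a class="solr-page-selector-page" data-offset="48">5</a>
<a class="solr-page-selector-page next full" data-offset="12">Next</a>
"""

# html.parser ships with Python, so no extra parser package is required.
soup = BeautifulSoup(html, "html.parser")

# Match the exact class string of the "Next" link, then read its attribute.
next_link = soup.find(class_="solr-page-selector-page next full")
offset = next_link.get("data-offset")
print(offset)  # prints 12
```

Because the class string contains spaces, BeautifulSoup compares it against the whole value of the `class` attribute, which is why only the "Next" link matches.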

Calling get() allows you to access attributes of the selected tag. You can also perform dict-style lookups:

>>> soup.find(class_='solr-page-selector-page next full')['data-offset']
'12'

The two methods differ in their behaviour if the attribute does not exist on the tag: get() will return None, whereas [] will raise a KeyError exception.
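A quick sketch of that difference, using a one-tag document and a made-up attribute name (`data-missing` is hypothetical and simply absent from the tag):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<a class="next" data-offset="12">Next</a>', "html.parser")
link = soup.find("a")

# Missing attribute: get() quietly returns None, or a default you supply.
print(link.get("data-missing"))       # None
print(link.get("data-missing", "0"))  # 0

# Missing attribute: [] raises KeyError, so guard it if the tag may lack it.
try:
    link["data-missing"]
except KeyError:
    print("KeyError raised")
```

Use get() when the attribute is optional and [] when its absence should be treated as an error.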

mhawke
  • It's not a must that the variable `class_` has an underscore after it right? That was just done to avoid using the reserved keyword `class`? – Clever Programmer Jan 14 '16 at 08:52
  • Yes, it must be `class_` to avoid clashing with the reserved word `class`. Alternatively you can pass a dict in `attrs`: `soup.find(attrs={'class':'solr-page-selector-page next full'})` – mhawke Jan 14 '16 at 09:14
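A minimal sketch comparing the `class_` keyword with the `attrs` dict mentioned above. The `attrs` form is also how you filter on names that aren't valid Python identifiers, such as `data-offset`. The `int()` conversion at the end is an illustrative extra, since attribute values come back as strings.

```python
from bs4 import BeautifulSoup

html = '<a class="solr-page-selector-page next full" data-offset="12">Next</a>'
soup = BeautifulSoup(html, "html.parser")

# Equivalent searches: the class_ keyword argument versus an attrs dict.
by_keyword = soup.find(class_="solr-page-selector-page next full")
by_attrs = soup.find(attrs={"class": "solr-page-selector-page next full"})

# data-offset can't be written as a keyword argument, so use attrs instead.
by_data = soup.find(attrs={"data-offset": "12"})

# Attribute values are strings; convert if you need to do arithmetic.
offset = int(by_attrs["data-offset"])
print(offset)  # prints 12
```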