You can use an re.compile object with soup.find_all:
import re
from bs4 import BeautifulSoup as soup
html = """
<table>
<tr style='width:40%'>
<td style='align:top'></td>
</tr>
</table>
"""
results = soup(html, 'html.parser').find_all(re.compile('td|tr'), {'style':re.compile('width:40%|align:top')})
Output:
[<tr style="width:40%">
<td style="align:top"></td>
</tr>, <td style="align:top"></td>]
Because re.compile objects are passed for both the tag name and the style value, find_all returns every tr or td tag whose inline style attribute contains either width:40% or align:top.
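One caveat worth keeping in mind: the Beautiful Soup documentation states that a compiled pattern is applied with its search() method, so 'td|tr' matches any tag name that merely contains those letters. The sketch below uses only the re module to show the difference; the anchored pattern and the sample tag names are illustrative assumptions, not part of the original answer.

```python
import re

loose = re.compile('td|tr')          # pattern used in the answer above
strict = re.compile(r'^(?:td|tr)$')  # anchored variant (a suggested tweak)

tag_names = ['td', 'tr', 'strong', 'table']

# search() succeeds on any substring hit, so 'strong' (s-tr-ong) slips through
loose_hits = [name for name in tag_names if loose.search(name)]
strict_hits = [name for name in tag_names if strict.search(name)]

print(loose_hits)   # ['td', 'tr', 'strong']
print(strict_hits)  # ['td', 'tr']
```

If exact tag names are all you need, passing a list such as ['td', 'tr'] to find_all avoids the regex altogether, since list entries are compared as whole names.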
The same approach extends to matching on several attributes at once:
html = """
<table>
<tr style='width:40%'>
<td style='align:top' class='get_this'></td>
<td style='align:top' class='ignore_this'></td>
</tr>
</table>
"""
results = soup(html, 'html.parser').find_all(re.compile('td|tr'), {'style':re.compile('width:40%|align:top'), 'class':'get_this'})
Output:
[<td class="get_this" style="align:top"></td>]
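For comparison, the same filter can probably be written with keyword arguments instead of an attrs dictionary. This is a minimal sketch, assuming a reasonably recent bs4 where find_all accepts a list of tag names and the class_ keyword; it prints tag names and classes rather than the full markup, because attribute order in the output can differ between versions.

```python
import re
from bs4 import BeautifulSoup

html = """
<table>
<tr style='width:40%'>
<td style='align:top' class='get_this'></td>
<td style='align:top' class='ignore_this'></td>
</tr>
</table>
"""

doc = BeautifulSoup(html, 'html.parser')

# A list of names matches each name exactly; class_ avoids the reserved word `class`
results = doc.find_all(
    ['td', 'tr'],
    style=re.compile('width:40%|align:top'),
    class_='get_this',
)

print([(tag.name, tag['class']) for tag in results])  # [('td', ['get_this'])]
```

All filters must hold for the same tag, which is why the tr (matching style, no class) is dropped here.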
Edit 2: Simple recursive solution:
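import bs4
from bs4 import BeautifulSoup as soup

def get_tags(d, params):
    # Yield d if any attribute listed for its tag name matches
    # (class is parsed into a list, so test membership; otherwise compare for equality).
    if any((lambda x:b in x if a == 'class' else b == x)(d.attrs.get(a, [])) for a, b in params.get(d.name, {}).items()):
        yield d
    # Recurse into child tags only, skipping text nodes.
    for i in filter(lambda x:x != '\n' and not isinstance(x, bs4.element.NavigableString) , d.contents):
        yield from get_tags(i, params)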
html = """
<table>
<tr style='align:top'>
<td style='width:40%'></td>
<td style='align:top' class='ignore_this'></td>
</tr>
</table>
"""
print(list(get_tags(soup(html, 'html.parser'), {'td':{'style':'width:40%'}, 'tr':{'style':'align:top'}})))
Output:
[<tr style="align:top">
<td style="width:40%"></td>
<td class="ignore_this" style="align:top"></td>
</tr>, <td style="width:40%"></td>]
The recursive function lets you supply your own dictionary mapping tag names to the attributes you want to target. For each bs4 object passed to it, the function tries to match any of the attributes specified for that tag name, and if one matches, the element is yielded.
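If explicit recursion is not required, roughly the same traversal can be sketched with find_all(True), which visits every tag in document order. The helper name find_by_params is hypothetical, and the list check for multi-valued attributes is an assumption that generalizes the class special case above.

```python
from bs4 import BeautifulSoup

def find_by_params(root, params):
    """Return tags that match any attribute listed for their tag name."""
    matches = []
    for tag in root.find_all(True):  # True matches every tag, in document order
        for attr, wanted in params.get(tag.name, {}).items():
            value = tag.attrs.get(attr)
            if value is None:
                continue
            # Multi-valued attributes such as class are parsed into lists
            hit = wanted in value if isinstance(value, list) else wanted == value
            if hit:
                matches.append(tag)
                break  # one matching attribute is enough
    return matches

html = """
<table>
<tr style='align:top'>
<td style='width:40%'></td>
<td style='align:top' class='ignore_this'></td>
</tr>
</table>
"""

found = find_by_params(
    BeautifulSoup(html, 'html.parser'),
    {'td': {'style': 'width:40%'}, 'tr': {'style': 'align:top'}},
)
print([tag.name for tag in found])  # ['tr', 'td']
```

The trade-off is that this version builds a list instead of yielding lazily, which may matter on very large documents.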