0

Sorry, I have asked a similar question before, but I still have a problem extracting data that is not inside its own tag. This case is slightly different from my earlier question ("How can i crawl web data that not in tags").

<div class="bbs" id="main-content">
    <div class="metaline">
        <span class="article-meta-tag">
             author
        </span>
        <span class="article-meta-value">
             Jorden 
        </span>
    </div>
    <div class="metaline">
        <span class="article-meta-tag">
            board
        </span>
        <span class="article-meta-value">
            NBA
        </span>
    </div>

I am here

</div>

I only need this text:

I am here

kovac
`I am here` is still in a `div` tag (main-content), it's just not in CERTAIN div tags (class=metaline). Knowing that, this question might help you: https://stackoverflow.com/questions/5041008/how-to-find-elements-by-class?rq=1 – Bing Jun 04 '17 at 22:06

2 Answers

1

The string is a child of the main div of type NavigableString, so you can loop through div.children and filter based on the type of the node:

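from bs4 import BeautifulSoup, NavigableString

# `soup` is the parsed document shown under "Data:" below.
main = soup.find("div", {"id": "main-content"})
[x.strip() for x in main.children if isinstance(x, NavigableString) and x.strip()]
# ['I am here']  (Python 2 prints [u'I am here'])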

Data:

soup = BeautifulSoup("""<div class="bbs" id="main-content">
    <div class="metaline">
        <span class="article-meta-tag">
             author
        </span>
        <span class="article-meta-value">
             Jorden 
        </span>
    </div>
    <div class="metaline">
        <span class="article-meta-tag">
            board
        </span>
        <span class="article-meta-value">
            NBA
        </span>
    </div>
I am here
</div>""", "html.parser")
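For convenience, here is the same idea as a self-contained sketch that can be run as-is. The helper name `direct_text` and the `HTML` constant are only illustrative (they are not part of Beautiful Soup), and the sample markup is a condensed copy of the HTML from the question.

```python
from bs4 import BeautifulSoup, NavigableString

# Condensed copy of the markup from the question (assumed sample input).
HTML = """<div class="bbs" id="main-content">
    <div class="metaline">
        <span class="article-meta-tag">author</span>
        <span class="article-meta-value">Jorden</span>
    </div>
    <div class="metaline">
        <span class="article-meta-tag">board</span>
        <span class="article-meta-value">NBA</span>
    </div>
I am here
</div>"""


def direct_text(html, container_id):
    """Return the non-empty strings that are direct children of the
    element with the given id (hypothetical helper for illustration)."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id=container_id)
    if container is None:
        # No element with that id: nothing to extract.
        return []
    return [
        child.strip()
        for child in container.children
        # Keep only text nodes; nested tags such as div.metaline are skipped.
        if isinstance(child, NavigableString) and child.strip()
    ]


print(direct_text(HTML, "main-content"))
# ['I am here']
```

Because only direct children are inspected, the text inside the nested `metaline` divs is ignored, and whitespace-only strings between tags are dropped by the `child.strip()` check.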
Psidom
0
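from bs4 import BeautifulSoup

soup = BeautifulSoup(that_html, "html.parser")
div_tag = soup.find("div", id="main-content")
# div_tag.string is None here because the div has several children, so join its direct text nodes.
required_string = "".join(div_tag.find_all(string=True, recursive=False)).strip()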

Go through this documentation.

Rajesh