BeautifulSoup get_text separate values with "<br>"

Question

HTML:

"<td class='tdtl'><a class='col' href='detail.php?id=1' target='_blank'>List 1< br>detail 1</a></td>"
"<td class='tdtl'><a class='col' href='detail.php?id=2' target='_blank'>List 2< br>detail 2</a></td>"
"<td class='tdtl'><a class='col' href='detail.php?id=3' target='_blank'>List 3< br>detail 3</a></td>"
"<td class='tdtl'><a class='col' href='detail.php?id=4' target='_blank'>List 4< br>detail 4</a></td>"
"<td class='tdtl'><a class='col' href='detail.php?id=5' target='_blank'>List 5< br>detail 5</a></td>"

Python code:

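from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")  # html: the markup above as one string
for index in soup.select("a.col"):  # "col" alone would match <col> tags, not class="col"
    print(index.get_text())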

RESULT:

List 1detail 1
List 2detail 2
List 3detail 3
List 4detail 4
List 5detail 5

How can I get "List 1" and "detail 1" into separate variables?

When I post the HTML code, the "<br>" tag does not show up in my question, so the space in "< br>" was added only for display. – TEQ, Feb 12 '21 at 10:14
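The glued RESULT above comes from get_text() joining all text nodes under the link with an empty string by default; the <br> tag itself contributes no text. A minimal demonstration, assuming one row is written with a real <br> tag (per TEQ's comment) and parsed with "html.parser"; the names html, link, glued and separated are only for illustration:

```python
from bs4 import BeautifulSoup

# One row from the question, assuming the markup really contains <br>
# (the space in "< br>" was only added for display).
html = "<td class='tdtl'><a class='col' href='detail.php?id=1' target='_blank'>List 1<br>detail 1</a></td>"
soup = BeautifulSoup(html, "html.parser")
link = soup.select_one("a.col")

# Default separator is "", so "List 1" and "detail 1" end up touching.
glued = link.get_text()
# A separator string is placed between the text nodes instead.
separated = link.get_text("|")

print(glued)      # List 1detail 1
print(separated)  # List 1|detail 1
```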

Pythmalion · Accepted Answer · 2021-02-12T12:33:23.913


If the tag is always written as "< br>", you can use a simple split:

# soup is assumed to be BeautifulSoup(html, "html.parser") built from the markup above
the_lists = []
the_details = []

for index in soup.select("a.col"):
    # split the link text at the literal "< br>" marker
    my_text = index.get_text().split('< br>')
    the_lists.append(my_text[0])
    the_details.append(my_text[1])

print(the_lists)    # ['List 1', 'List 2', 'List 3', 'List 4', 'List 5']
print(the_details)  # ['detail 1', 'detail 2', 'detail 3', 'detail 4', 'detail 5']
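Splitting only works when "< br>" is part of the text. If the markup contains a real <br> tag, the parser consumes it and get_text() returns "List 1detail 1" with nothing left to split on. A minimal sketch of a different technique, BeautifulSoup's stripped_strings, which yields each text node separately; it assumes two of the question's rows with real <br> tags and exactly two text pieces per link (the tuple unpacking raises ValueError otherwise):

```python
from bs4 import BeautifulSoup

# Two of the question's rows, assuming real <br> tags in the markup.
html = """
<td class='tdtl'><a class='col' href='detail.php?id=1' target='_blank'>List 1<br>detail 1</a></td>
<td class='tdtl'><a class='col' href='detail.php?id=2' target='_blank'>List 2<br>detail 2</a></td>
"""
soup = BeautifulSoup(html, "html.parser")

the_lists = []
the_details = []

for link in soup.select("a.col"):
    # stripped_strings yields each text node with surrounding whitespace
    # removed; <br> adds no text, so each link gives two strings here.
    list_text, detail_text = link.stripped_strings
    the_lists.append(list_text)
    the_details.append(detail_text)

print(the_lists)    # ['List 1', 'List 2']
print(the_details)  # ['detail 1', 'detail 2']
```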

EDIT

To handle other forms of the tag, such as "<br>" or "<br >", you can use a regex:

import re

the_lists = []
the_details = []

for index in soup.select("a.col"):
    # normalise "<br>", "< br>", "<br >", ... to "<br>" before splitting
    text = re.sub(r"<\s*br\s*>", "<br>", index.get_text())
    my_text = text.split('<br>')
    the_lists.append(my_text[0])
    the_details.append(my_text[1])

print(the_lists)
print(the_details)
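The normalise-then-split steps can also be done in one call with re.split. A minimal sketch, assuming the br marker is literal text in the string (a real <br> tag never reaches get_text() output); BR_PATTERN and split_on_br are hypothetical names, and the pattern also accepts "<br/>", "<br />" and upper case:

```python
import re

# Matches "<br>", "< br>", "<br >", "<br/>", "<br />" in any letter case.
BR_PATTERN = re.compile(r"<\s*br\s*/?>", re.IGNORECASE)

def split_on_br(text):
    """Split text at the first literal br marker into [before, after]."""
    return BR_PATTERN.split(text, maxsplit=1)

samples = ["List 1< br>detail 1", "List 2<br>detail 2", "List 3<br />detail 3"]
pairs = [split_on_br(s) for s in samples]
print(pairs)  # [['List 1', 'detail 1'], ['List 2', 'detail 2'], ['List 3', 'detail 3']]
```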

edited Feb 12 '21 at 12:33

answered Feb 12 '21 at 09:42

Pythmalion


"< br>" with a space works, but "<br>" without a space cannot be split. – TEQ Feb 12 '21 at 10:39
In your example you did not have "<br>", only "< br>". However, you can handle it with a regex; see the update. – Pythmalion Feb 12 '21 at 12:31
