
HTML:

"<td class='tdtl'><a class='col' href='detail.php?id=1' target='_blank'>List 1< br>detail 1</a></td>"
"<td class='tdtl'><a class='col' href='detail.php?id=2' target='_blank'>List 2< br>detail 2</a></td>"
"<td class='tdtl'><a class='col' href='detail.php?id=3' target='_blank'>List 3< br>detail 3</a></td>"
"<td class='tdtl'><a class='col' href='detail.php?id=4' target='_blank'>List 4< br>detail 4</a></td>"
"<td class='tdtl'><a class='col' href='detail.php?id=5' target='_blank'>List 5< br>detail 5</a></td>"

Python code:

# "a.col" selects <a> elements with class "col"; a bare "col" would select <col> elements
for index in soup.select("a.col"):
    print(index.get_text())

RESULT:

List 1detail 1

List 2detail 2

List 3detail 3

List 4detail 4

List 5detail 5

How can I get "List 1" and "detail 1" into separate variables?
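When a link contains a real <br> tag, get_text() drops the tag and joins the text on either side, which is why the two parts run together in the RESULT above. In BeautifulSoup, index.get_text("|", strip=True) or list(index.stripped_strings) keeps the pieces apart. The sketch below shows the same idea using only the standard library's html.parser (a swapped-in parser, not BeautifulSoup), so it runs without third-party packages; the class name ColLinkParser is hypothetical, and the sample markup assumes a well-formed <br>.

```python
from html.parser import HTMLParser


class ColLinkParser(HTMLParser):
    """Collect the text chunks of every <a class="col"> link.

    A real <br> tag ends one text node and starts the next, so each
    link yields a list such as ["List 1", "detail 1"].
    """

    def __init__(self):
        super().__init__()
        self.links = []       # one list of text chunks per matching link
        self._in_col = False  # True while inside <a class="col">

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "col" in classes:
            self._in_col = True
            self.links.append([])

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_col = False

    def handle_data(self, data):
        # Keep only non-blank text found inside a matching link
        if self._in_col and data.strip():
            self.links[-1].append(data.strip())


# Hypothetical sample: the question's rows, but with a well-formed <br>
html = "".join(
    f"<td class='tdtl'><a class='col' href='detail.php?id={i}' "
    f"target='_blank'>List {i}<br>detail {i}</a></td>"
    for i in range(1, 6)
)

parser = ColLinkParser()
parser.feed(html)
parser.close()

the_lists = [chunks[0] for chunks in parser.links]
the_details = [chunks[1] for chunks in parser.links]
print(the_lists)    # ['List 1', 'List 2', 'List 3', 'List 4', 'List 5']
print(the_details)  # ['detail 1', 'detail 2', 'detail 3', 'detail 4', 'detail 5']
```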

J. M. Arnold
TEQ

1 Answer


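If the tag is always written as < br> (with the space), the parser does not treat it as a tag, so it stays in the text returned by get_text() and you can use a simple split: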

# soup is the BeautifulSoup object built from the page, as in the question
the_lists = []
the_details = []

for index in soup.select("a.col"):
    # "< br>" is plain text here, so get_text() returns e.g. "List 1< br>detail 1"
    my_text = index.get_text().split('< br>')
    the_lists.append(my_text[0])
    the_details.append(my_text[1])

print(the_lists) # ['List 1', 'List 2', 'List 3', 'List 4', 'List 5']
print(the_details) # ['detail 1', 'detail 2', 'detail 3', 'detail 4', 'detail 5']
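If a link has no separator, my_text[1] raises IndexError. A minimal sketch, assuming each link's text has already been extracted with get_text(): the hypothetical helper split_entry uses str.partition, which never raises and returns an empty detail when the separator is missing. The sample strings are made-up stand-ins for get_text() output.

```python
def split_entry(text, sep="< br>"):
    """Split one link's text into (list_part, detail_part).

    str.partition splits at the first occurrence of sep and never
    raises: if sep is missing, the detail part is an empty string.
    """
    list_part, _, detail_part = text.partition(sep)
    return list_part.strip(), detail_part.strip()


# Hypothetical get_text() results; the last one has no separator
texts = ["List 1< br>detail 1", "List 2< br>detail 2", "List 3"]
pairs = [split_entry(t) for t in texts]

the_lists = [p[0] for p in pairs]
the_details = [p[1] for p in pairs]
print(the_lists)    # ['List 1', 'List 2', 'List 3']
print(the_details)  # ['detail 1', 'detail 2', '']
```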

EDIT

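To handle other spellings of the tag, such as < br>, <br> or <br >, you can normalize them with a regular expression before splitting. (A well-formed <br> or <br > is parsed as a real tag and never appears in get_text() output, so the regex only matters when the tag arrives as literal text.)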

import re

the_lists = []
the_details = []

for index in soup.select("a.col"):
    # Normalize "< br>", "<br >", "< br >", ... to "<br>"; raw string avoids invalid-escape warnings
    text = re.sub(r"<\s*br\s*>", "<br>", index.get_text())
    my_text = text.split('<br>')
    the_lists.append(my_text[0])
    the_details.append(my_text[1])

print(the_lists)
print(the_details)
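The normalize-then-split steps can also be done in one call by splitting directly on the pattern. A minimal sketch: the helper name split_on_br and the sample strings are hypothetical stand-ins for get_text() output, and the pattern additionally accepts uppercase BR and a self-closing slash, which the original regex does not.

```python
import re

# Matches "<br>", "< br>", "<br >", "< br >" and "<br/>": optional whitespace
# inside the angle brackets, optional trailing slash, any letter case.
BR_PATTERN = re.compile(r"<\s*br\s*/?\s*>", re.IGNORECASE)


def split_on_br(text):
    """Split text at the first br-like marker; detail is "" if none is found."""
    parts = BR_PATTERN.split(text, maxsplit=1)
    list_part = parts[0].strip()
    detail_part = parts[1].strip() if len(parts) > 1 else ""
    return list_part, detail_part


samples = ["List 1< br>detail 1", "List 2<br >detail 2", "List 3< BR />detail 3"]
for s in samples:
    print(split_on_br(s))
# ('List 1', 'detail 1')
# ('List 2', 'detail 2')
# ('List 3', 'detail 3')
```

Because the pattern has no capturing groups, re.split returns only the text around the marker, and maxsplit=1 keeps any later markers inside the detail part.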
Pythmalion