Below is the code I use to find the text between the <title> and </title> tags:

import re
import requests

url = "https://www.ilsole24ore.com/rss/italia--attualita.xml"
r = requests.get(url)
testo = r.text  # body of the RSS feed as a string
pattern = "<title>(.*?)</title>"
result = re.findall(pattern, testo)
for i in result:
    print(i)

So far, everything works.

Now I want to find all the text (including nested tags) between the outer tags <item> and </item>, so I changed the search pattern to:

pattern = "<item>(.*?)</item>"

But it doesn't find any match.

Where is my mistake?

Tomerikoo
Nepura

1 Answer

Your problem is that <item> elements usually span multiple lines. By default, . does not match a newline character, so your pattern never reaches the closing </item>.
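
To see the effect in isolation, here is a minimal sketch. The `sample` string is a made-up stand-in for one feed entry, not taken from the real feed, and `no_flag` is just an illustrative name:

```python
import re

# Hypothetical sample: a single <item> spread over several lines
sample = "<item>\n<title>Example</title>\n</item>"

# Without DOTALL, "." cannot cross a newline, so the lazy group
# never gets from <item> to </item> and nothing matches.
no_flag = re.findall(r"<item>(.*?)</item>", sample)
print(no_flag)

# <title> and </title> sit on the same line, so this still works.
titles = re.findall(r"<title>(.*?)</title>", sample)
print(titles)
```

This is also why the <title> search worked: each title opens and closes on one line.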

Add the DOTALL flag (or (?s) at the start of your pattern) so that . also matches newlines. You can choose between:

  1. pattern = "(?s)<item>(.*?)</item>"

  2. pattern = "<item>(.*?)</item>"
    result = re.findall(pattern, testo, re.DOTALL)
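
Both options can be checked side by side on a small string. This is only a sketch: `sample` is invented test data standing in for `testo`, and the names `inline`, `flagged` and `titles` are illustrative.

```python
import re

# Hypothetical stand-in for the downloaded feed text (testo)
sample = (
    "<rss>\n"
    "<item>\n<title>First</title>\n</item>\n"
    "<item>\n<title>Second</title>\n</item>\n"
    "</rss>"
)

# Option 1: inline (?s) flag at the start of the pattern
inline = re.findall(r"(?s)<item>(.*?)</item>", sample)

# Option 2: re.DOTALL passed as the flags argument
flagged = re.findall(r"<item>(.*?)</item>", sample, re.DOTALL)

# Both give the same list, one string per <item>, newlines included
print(inline == flagged)

# Each captured item can then be searched again, e.g. for its title
titles = [re.search(r"<title>(.*?)</title>", item).group(1) for item in inline]
print(titles)
```

The lazy quantifier `.*?` matters here: with DOTALL and a greedy `.*`, a single match would run from the first <item> to the last </item>.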
    
Tomerikoo