Regular expression to check if HTML document contains at least one of some specific tags

Question

I want to use a regex to check whether an HTML document contains at least one of the tag patterns below:

<b> [content1] </b>

and

<i> [content2] </i>

and

<b> [content3] <i> [content4] </i> [content5] </b>

or

<i> [content6] <b> [content7] </b> [content8] </i>

'content' can be anything. Is there a way to achieve this? Thank you.

You [should not use regex to parse xml or html](https://stackoverflow.com/a/1732454/10009545). Please use an XML library like `lxml` instead. – konstantin, Nov 15 '19 at 09:02
If you just want to check if the document contains either a `<b>` or `<i>` element (with appropriate closing tags) I believe this should work: `var pattern = /<b>(.+)<\/b>|<i>(.+)<\/i>/` That said, bewa̙᷀re the᷇ p͊o̙ͣ́n̛̫͔͠y͇̞̜ – Kei, Nov 15 '19 at 09:05
It looks like you want to create a regex but do not know where to start. Please check the [Reference - What does this regex mean](https://stackoverflow.com/questions/22937618) resource, which has plenty of hints. Also, refer to the [Learning Regular Expressions](https://stackoverflow.com/a/2759417/3832970) post for some basic regex info. Once you have an expression ready and still have issues with the solution, please edit the question with the latest details and we'll be glad to help you fix the problem. – Wiktor Stribiżew, Nov 15 '19 at 10:59

score 2 · Answer 1 · answered Nov 15 '19 at 09:40

Answer:

<([bi])>.*<\/\1>

Test link:

https://regex101.com/r/sRNkNE/1

Explanation:

<([bi])> will match <b> or <i> and capture the letter b or i

.* is the content of the tag. It might contain other <b>...</b> or <i>...</i> tags; we don't really care.

<\/\1> will match the closing tag of the previously captured letter (b or i)

If you don't want to match when the content is empty, you can replace .* with .*\S.*
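
As a rough illustration, here is how the pattern could be used in JavaScript (the flavor that appears in the comments on the question). The helper name `containsTag` and the sample strings are made up for this sketch and are not part of the original answer.

```javascript
// Pattern from the answer: an opening <b> or <i>, any content on the
// same line, then the closing tag for the captured letter.
const tagPattern = /<([bi])>.*<\/\1>/;

// Variant that requires at least one non-whitespace character in the content.
const nonEmptyTagPattern = /<([bi])>.*\S.*<\/\1>/;

// Hypothetical helper: true if the text contains a <b>...</b> or <i>...</i> pair.
function containsTag(html, pattern = tagPattern) {
  return pattern.test(html);
}

console.log(containsTag("<p><b>bold</b></p>"));            // true
console.log(containsTag("<i>a <b>b</b> c</i>"));           // true
console.log(containsTag("<b>unclosed <i>text"));           // false
console.log(containsTag("<b> </b>"));                      // true
console.log(containsTag("<b> </b>", nonEmptyTagPattern));  // false
```

Note that this only checks that some opening tag is followed later by a matching closing tag; it does not verify that the tags are properly nested.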

Vincent

The tags can span several lines, use [\s\S]* instead of .* – Poul Bak Nov 15 '19 at 09:51
@PoulBak you're right – Vincent Nov 15 '19 at 10:00
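
The difference pointed out in the comment above can be sketched in JavaScript as follows. The sample string is invented for illustration; the `s` (dotAll) flag shown at the end is a standard alternative in ES2018 and later, not something the answer itself proposes.

```javascript
// Original pattern: "." does not match line breaks in JavaScript by default.
const singleLinePattern = /<([bi])>.*<\/\1>/;

// Variant from the comment: [\s\S] matches any character, including line breaks.
const multilinePattern = /<([bi])>[\s\S]*<\/\1>/;

// Alternative (ES2018+): the "s" flag makes "." match line breaks as well.
const dotAllPattern = /<([bi])>.*<\/\1>/s;

// Made-up sample where the tag content spans several lines.
const html = "<b>\n  first line\n  second line\n</b>";

console.log(singleLinePattern.test(html)); // false
console.log(multilinePattern.test(html));  // true
console.log(dotAllPattern.test(html));     // true
```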
