I want to use a regex to check whether an HTML document contains at least one occurrence of each of the tag patterns below:

<b> [content1] </b>

and

<i> [content2] </i>

and

<b> [content3] <i> [content4] </i> [content5] </b>

or

<i> [content6] <b> [content7] </b> [content8] </i>

Each [content] placeholder can be anything. Is there a way to achieve this? Thank you.

Louis Tran
  • You [should not use regex to parse XML or HTML](https://stackoverflow.com/a/1732454/10009545). Please use an XML library like `lxml` instead. – konstantin Nov 15 '19 at 09:02
  • If you just want to check if the document contains either a `<b>` or `<i>` element (with appropriate closing tags) I believe this should work: `var pattern = /<b>(.+)<\/b>|<i>(.+)<\/i>/` That said, bewa̙᷀re the᷇ p͊o̙ͣ́n̛̫͔͠y͇̞̜ – Kei Nov 15 '19 at 09:05
  • Looks like you are looking to create a regex, but do not know where to get started. Please check [Reference - What does this regex mean](https://stackoverflow.com/questions/22937618) resource, it has plenty of hints. Also, refer to [Learning Regular Expressions](https://stackoverflow.com/a/2759417/3832970) post for some basic regex info. Once you get some expression ready and still have issues with the solution, please edit the question with the latest details and we'll be glad to help you fix the problem. – Wiktor Stribiżew Nov 15 '19 at 10:59

1 Answer

Answer:

<([bi])>.*<\/\1>

Test link:

https://regex101.com/r/sRNkNE/1

Explanation:

<([bi])> will match <b> or <i> and capture the letter b or i

.* is the content of the tag. It might contain other <b>...</b> or <i>...</i> tags; we don't really care.

<\/\1> will match the closing tag of the previously captured letter (b or i)
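
The three parts above can be tried out with a minimal JavaScript sketch. The helper name `containsBoldOrItalic` and the sample strings are made up for illustration, and the `s` flag is an addition not present in the answer's pattern (it lets `.` cross line breaks, which HTML usually contains). Note that this only checks for a `<b>` or `<i>` pair somewhere in the text; it does not verify that both kinds, or the nested forms from the question, are present.

```javascript
// Pattern from the answer. The `s` (dotAll) flag is an assumption added here
// so that `.` also matches newlines inside multi-line HTML.
const tagPattern = /<([bi])>.*<\/\1>/s;

// Hypothetical helper: true if the text has a <b>...</b> or <i>...</i> pair.
function containsBoldOrItalic(html) {
  return tagPattern.test(html);
}

console.log(containsBoldOrItalic("<p><b>bold</b></p>"));             // true
console.log(containsBoldOrItalic("<i>outer <b>inner</b> text</i>")); // true
console.log(containsBoldOrItalic("<b>line one\nline two</b>"));      // true (needs the s flag)
console.log(containsBoldOrItalic("<b>unclosed"));                    // false
console.log(containsBoldOrItalic("<b>mismatched</i>"));              // false: \1 requires </b>
```

The last case shows what the backreference buys: the closing tag must use the same letter that the opening tag captured.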

If you don't want to match when the content is empty, you can replace .* with .*\S.*
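
A small JavaScript sketch comparing the two variants, assuming single-line input; the constant names `anyContent` and `nonBlankContent` are invented for this example.

```javascript
// Original pattern: accepts empty or whitespace-only content.
const anyContent = /<([bi])>.*<\/\1>/;
// Variant from the answer: requires a non-whitespace character before the closing tag.
const nonBlankContent = /<([bi])>.*\S.*<\/\1>/;

console.log(anyContent.test("<b></b>"));         // true
console.log(nonBlankContent.test("<b></b>"));    // false
console.log(nonBlankContent.test("<b>   </b>")); // false
console.log(nonBlankContent.test("<b> x </b>")); // true
// Caveat: `.*` is greedy and unaware of tag boundaries, so two empty
// elements in a row still match, because the text between the first <b>
// and the last </b> contains non-whitespace characters (the tags themselves).
console.log(nonBlankContent.test("<b></b><b></b>")); // true
```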

Vincent