My regex experience is limited, and I've been tinkering with a problem that I haven't yet managed to solve. I suspect it'll be relatively easy for someone with more regex experience, so any pointers would be appreciated.
Context: I need to validate a sentence, which can consist of a-z (both cases), 0-9, spaces, standard punctuation, and the tags <br /> and <p></p>.
I wrote some tests in C# as follows.
[TestCase("123345acbcbbc ab")]
[TestCase("123 abc")]
[TestCase("aBcC 123 123! abc; 'k21HdD_-{};:")]
[TestCase("123!")]
[TestCase("aBcC<br />123 123!<br />abc; 'k21HdD_-{};:")]
public void WhenValidatingASentence_ThenStandardPunctuation_IsSupported(string sut)
{
    Assert.That(Regex.IsMatch(sut, @"^[a-zA-Z0-9]+[\sa-zA-Z0-9\p{P}]+?(<br\s/>)+?$"), Is.True);
}
The first four test cases pass, but introducing the break into both the pattern and the input causes the fifth case to fail.
Clearly I've misunderstood the use of a capture group or have specified it badly. Any guidance would be appreciated.
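One thing I can check in isolation is which characters \p{P} actually covers. The sketch below is only a diagnostic, and the class and method names (PunctuationCheck, IsPunctuation) are made up for illustration. As far as I understand Unicode categories, < and > are classed as math symbols rather than punctuation, so the character class in my pattern would never consume them, and my (<br\s/>) group is only allowed immediately before the end of the string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PunctuationCheck
{
    // Hypothetical helper: true when the single character c
    // is matched by the \p{P} (Unicode punctuation) class.
    public static bool IsPunctuation(char c) =>
        Regex.IsMatch(c.ToString(), @"^\p{P}$");

    public static void Main()
    {
        // '!', ';' and '/' are punctuation; '<' and '>' are not.
        foreach (char c in "!;/<>")
        {
            Console.WriteLine($"'{c}' -> {IsPunctuation(c)}");
        }
    }
}
```

If that holds, the fifth case would fail at the first < in the middle of the input, because nothing in the pattern can match it there.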
Needless to say, all parts of the string can repeat: paragraphs and breaks, as well as letters, numbers and punctuation, can appear many times throughout the sentence, although I expect the string has to start with a-z or a digit.
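To make that requirement concrete, here is a minimal sketch of the shape I think the pattern needs: one leading letter or digit, followed by a repeated group whose alternatives are either a single allowed character or one of the tags. It rests on a few assumptions of mine: the tags are written exactly as <br />, <br/>, <p> and </p>, nobody needs the <p> tags to be balanced, and a single-character input is acceptable. The names SentenceValidator and IsValid are hypothetical and not part of my test project.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SentenceValidator
{
    // Assumed pattern (a sketch, not a vetted solution):
    //   ^[a-zA-Z0-9]            first character must be a letter or digit
    //   (?: ... )*              then any number of the following, in any order:
    //     [a-zA-Z0-9\s\p{P}]      a letter, digit, whitespace or punctuation character
    //     <br\s?/>                a break tag, with or without the space
    //     </?p>                   an opening or closing paragraph tag (not checked for balance)
    //   $                       end of input
    private static readonly Regex Sentence = new Regex(
        @"^[a-zA-Z0-9](?:[a-zA-Z0-9\s\p{P}]|<br\s?/>|</?p>)*$");

    public static bool IsValid(string input) => Sentence.IsMatch(input);

    public static void Main()
    {
        Console.WriteLine(IsValid("aBcC<br />123 123!<br />abc; 'k21HdD_-{};:")); // True
        Console.WriteLine(IsValid("abc<p>def</p>"));                              // True
        Console.WriteLine(IsValid("<br />abc"));                                  // False: starts with a tag
        Console.WriteLine(IsValid("abc<div>"));                                   // False: unsupported tag
    }
}
```

The non-capturing group (?: ... ) is used because nothing needs to be captured here; the tags are only alternatives inside the repetition rather than a suffix that must come last.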
Thanks, Butters