In Excel, I want to isolate Cucumber Scenarios from a (Java-written) Feature File. I'm looking for an array, with each element being one scenario.

I came up with more than a few regular expressions that do what I want within Rubular and Regex101 with the (apparently) exact same text, but when I do a ".Execute" from my Excel macro, the entire file is returned.

This is a sample of the text:

@optionalFeatureLevelTag
Feature: some feature

Scenario: scenario1
Given something
When something
Then something

@optionalTag
Scenario: scenario2
Given another something
When another something
Then another something

Scenario Outline: scenario3
Given yet another something
When yet another something
Then yet another something

This is the function I wrote. A "Set arrayOfResults = returnAllStringsMatchingRegEx" receives its result on the other end:

Function returnAllStringsMatchingRegEx(sourceString As String, pattern As String) As Variant

    Dim regEx As New RegExp

    With regEx
        .Global = True
        .MultiLine = True
        .ignoreCase = True
        .pattern = pattern
    End With

    If regEx.Test(sourceString) Then
        Set returnAllStringsMatchingRegEx = regEx.Execute(sourceString)
    Else
        Set returnAllStringsMatchingRegEx = Nothing
    End If

End Function
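
Since the goal is an array with one scenario per element, the MatchCollection returned by the function above could be copied into a plain array. This is only a minimal sketch: matchesToArray is a hypothetical helper name, not part of the original macro, and it assumes the argument is either Nothing or the object returned by RegExp.Execute.

```vba
' Hypothetical helper (not in the original macro): copies each Match.Value
' into a zero-based String array and returns it as a Variant.
Function matchesToArray(ByVal matches As Object) As Variant
    Dim result() As String
    Dim i As Long

    ' The function above returns Nothing when Test fails
    If matches Is Nothing Then
        matchesToArray = Array()
        Exit Function
    End If

    ' Guard against an empty collection before sizing the array
    If matches.Count = 0 Then
        matchesToArray = Array()
        Exit Function
    End If

    ReDim result(0 To matches.Count - 1)
    For i = 0 To matches.Count - 1
        ' MatchCollection items are indexed from zero
        result(i) = matches.Item(i).Value
    Next i

    matchesToArray = result
End Function
```

Because the helper returns a plain array rather than an object, the caller would assign it without Set.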

This is how I call it:

    Set arrayOfScenarios = returnAllStringsMatchingRegEx(fetchFileContent(oFile.Path), _ 
        "Scenario( Outline)?:((.+)(\n|\r|\r\n|$))+")

For the input above, I would expect "arrayOfResults" to have 3 elements:

First element:

Scenario: scenario1
Given something
When something
Then something

Second element:

Scenario: scenario2
Given another something
When another something
Then another something

Third element:

Scenario Outline: scenario3
Given yet another something
When yet another something
Then yet another something

The actual result is a single element containing:

Scenario: scenario1
Given something
When something
Then something

@optionalTag
Scenario: scenario2
Given another something
When another something
Then another something

Scenario Outline: scenario3
Given yet another something
When yet another something
Then yet another something

Pᴇʜ
  • I think your issue is "With regEx .Global = True". This tells regex to use greedy matching rather than lazy. That means the 2nd and 3rd Scenario are accepted as part of the match. I can't quite work it out, but I'd add something that stops the match if $Scenario: is seen again. I'd recommend reading https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean/22944075#22944075 – Martin Oct 14 '19 at 13:46
  • This function works for other uses. I'll experiment with .Global to see what happens, and if it works, I'll make it an optional parameter in that Function's signature (defaulting to true so that it doesn't break other existing uses). Thanks for the quick reply. – Newton Olivieri Oct 14 '19 at 15:06
  • A simple switch from ".Global" True to False did not fix it. My suspicion is around the line breaks. – Newton Olivieri Oct 14 '19 at 15:29
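
Whether line endings are involved can be checked directly. For context, in VBScript's RegExp the Global property only controls whether Execute returns every match or just the first one; it does not change greediness. The sketch below rests on two assumptions: that the feature file uses Windows (CRLF) line endings, and that "." in this engine excludes only "\n" and therefore matches "\r". If both hold, a blank line still offers one character ("\r") for ".+" to consume, so the repeated group can run straight across blank lines. demoLineEndings is a hypothetical name, and the sample strings are shortened versions of the text in the question.

```vba
' Hypothetical demo: run the question's pattern against the same text
' with CRLF and with LF line endings and compare the match counts.
' Late binding is used so no library reference has to be added.
Sub demoLineEndings()
    Dim regEx As Object
    Set regEx = CreateObject("VBScript.RegExp")

    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = True
        .Pattern = "Scenario( Outline)?:((.+)(\n|\r|\r\n|$))+"
    End With

    ' Two scenarios separated by a blank line, CRLF line endings (assumed file format)
    Dim crlfText As String
    crlfText = "Scenario: scenario1" & vbCrLf & _
               "Given something" & vbCrLf & _
               vbCrLf & _
               "Scenario: scenario2" & vbCrLf & _
               "Given another something" & vbCrLf

    ' Same text with every CRLF collapsed to a single LF
    Dim lfText As String
    lfText = Replace(crlfText, vbCrLf, vbLf)

    Debug.Print "CRLF matches: " & regEx.Execute(crlfText).Count
    Debug.Print "LF matches:   " & regEx.Execute(lfText).Count
End Sub
```

If the CRLF count comes out lower than the LF count, normalising the line endings before matching, or replacing ".+" with "[^\r\n]+", would be worth trying.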

1 Answer

You just need to stop your regex when it sees the next "Scenario" line.

^Scenario( Outline)?: ((.+)(\n|\r|\r\n|$))+(?!^Scenario)

I pasted your text and regex into https://regex101.com/, and it showed your regex matching the entire text. After reminding myself how negative lookahead works, I used this approach: match everything up to, but not including, the line that marks the next entry.

Martin
  • I'm wondering if I'm missing something. With the sample code, and the regex I posted, on Regex101, it captures the scenarios only instead of the whole thing (please note it doesn't capture the first 3 lines, nor the line with "@optionalTag"). If we add text after the 3rd scenario (not starting with "Scenario:"), it doesn't get captured either. – Newton Olivieri Oct 14 '19 at 15:48
  • This works like the one I posted, which is: it seems to do the job as expected in Rubular/Regex101, but in Excel, it behaves differently (returns the entire thing starting with that first "Scenario:"). – Newton Olivieri Oct 14 '19 at 15:57
  • Check if Excel can handle negative lookahead. If not, then this is far harder. That regex works on regex101. – Martin Oct 14 '19 at 16:00
  • Apparently Excel does support lookahead. Yes, that regex does work on Regex101 (as also did the one I posted), but something about Excel is different. I suspect it is around the (new line|carriage return|line feed|etc.). If/when I want something that does not include multiple lines, things go well. I'll share the eventual answer once I find it. I appreciate your looking into it. Thank you. – Newton Olivieri Oct 14 '19 at 16:49
  • I found "http://regexstorm.net/tester", which appears to behave closer to how Excel does, and lets us share a permalink (but it's too long to paste here directly). The regex I started with, and the one you shared, behave the same way there as they do in Excel. I'm planning a different strategy: breaking everything down line by line. It's the hard way for sure, but I've been stuck on this going on 3 days now. Once again, thanks for looking into it. – Newton Olivieri Oct 14 '19 at 20:46
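
The line-by-line strategy mentioned in the last comment could look roughly like the sketch below. It is not the asker's eventual solution. splitScenarios is a hypothetical name, and the rules it applies are assumptions taken from the sample text: a scenario starts at a line beginning with "Scenario:" or "Scenario Outline:" and ends at the next blank line, tag line, or scenario header.

```vba
' Hypothetical helper: walks the feature text line by line and returns
' a Collection with one string per scenario.
Function splitScenarios(sourceString As String) As Collection
    Dim result As New Collection
    Dim lines() As String
    Dim current As String
    Dim trimmed As String
    Dim inScenario As Boolean
    Dim i As Long

    ' Normalise CRLF and lone CR to LF so that Split sees a single separator
    lines = Split(Replace(Replace(sourceString, vbCrLf, vbLf), vbCr, vbLf), vbLf)

    For i = LBound(lines) To UBound(lines)
        trimmed = Trim$(lines(i))

        If trimmed Like "Scenario:*" Or trimmed Like "Scenario Outline:*" Then
            ' A new scenario starts: store the previous one, if any
            If inScenario Then result.Add current
            current = trimmed
            inScenario = True
        ElseIf inScenario Then
            If Len(trimmed) = 0 Or Left$(trimmed, 1) = "@" Then
                ' A blank line or a tag line ends the current scenario
                result.Add current
                inScenario = False
            Else
                ' Any other line is a step of the current scenario
                current = current & vbLf & trimmed
            End If
        End If
    Next i

    ' Store the last scenario if the text ended without a blank line
    If inScenario Then result.Add current

    Set splitScenarios = result
End Function
```

A Collection is used here instead of an array to avoid repeated ReDim Preserve calls. Note that under these rules a blank line inside a scenario (for example before an Examples table) would cut the scenario short, so the end-of-scenario test would need adjusting for real feature files.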