
I love regular expressions. However, I've just discovered that I can't use the s flag when running a JavaScript RegExp in the browser. Why isn't this flag included? It would be really helpful.

I've seen there's an external library XRegExp which enables this s flag (and a few others), but I'm also curious as to why those extra (and helpful) flags don't exist in standard JavaScript too. I'm also loath to include yet another external library...

Here's an example where I'm trying to solve an issue with detecting open/closing tags for WordPress shortcodes which may have newlines within (or I have to insert newlines between to improve the detection).

//
// Let's take some input text, e.g. WordPress shortcodes
//
var exampleText = '[buttongroup class="vertical"][button content="Example 1" class="btn-default"][/button][button class="btn-primary"]Example 2[/button][/buttongroup]'

//
// Now let's say I want to extract the shortcodes and their attributes,
// keeping in mind that shortcodes may or may not have closing tags
//
// Shortcodes which have content between the open/closing tags can contain
// newlines. One of the issues with the flags is that I can't use `s` to make
// the dot character match newlines.
//
// When I run this on regex101.com they support the `s` flag (probably with the
// XRegExp library) and everything seems to work well. However when running this
// in the browser I get the "Uncaught SyntaxError: Invalid regular expression
// flags" error.
//
var reGetButtons = /\[button(?:\s+([^\]]+))?\](?:(.*)\[\/button\])?/gims
var reGetButtonGroups = /\[buttongroup(?:\s+([^\]]+))?\](?:(.*)\[\/buttongroup\])?/gims

//
// Some utility methods to extract attributes:
//

// Get an attribute's value
//
// @param string input
// @param string attrName
// @returns string
function getAttrValue (input, attrName) {
  var attrValue = new RegExp(attrName + '=\"([^\"]+)\"', 'g').exec(input)
  return (attrValue ?  window.decodeURIComponent(attrValue[1]) : '')
}

// Get all named shortcode attribute values as an object
//
// @param string input
// @param array shortcodeAttrs
// @returns object
function getAttrsFromString (input, shortcodeAttrs) {
  var output = {}
  for (var index = 0; index < shortcodeAttrs.length; index++) {
    output[shortcodeAttrs[index]] = getAttrValue(input, shortcodeAttrs[index])
  }
  return output
}

//
// Extract all the buttons and get all their attributes and values
//
function replaceButtonShortcodes (input) {
  return input
    //
    // Need this to avoid some tomfoolery.
    // By splitting into newlines I can better detect between open/closing tags,
    // however it goes out the window when newlines are within the
    // open/closing tags.
    //
    // It's possible my RegExps above need some adjustments, but I'm unsure how,
    // or maybe I just need to replace newlines with a special character that I
    // can then swap back with newlines...
    //
    .replace(/\]\[/g, ']\n[')
    // Find and replace the [button] shortcodes
    .replace(reGetButtons, function (all, attr, content) {
      console.log('Detected [button] shortcode!')
      console.log('-- Extracted shortcode components', { all: all, attr: attr, content: content })

      // Build the output button's HTML attributes
      var attrs = getAttrsFromString(attr, ['class','content'])
      console.log('-- Extracted attributes', { attrs: attrs })
      
      // Return the button's HTML
      return '<button class="btn ' + (typeof attrs.class !== 'undefined' ? attrs.class : '') + '">' + (content ? content : attrs.content) + '</button>'
    })
}

//
// Extract all the button groups like above
//
function replaceButtonGroupShortcodes (input) {
  return input
    // Same as above...
    .replace(/\]\[/g, ']\n[')
    // Find and replace the [buttongroup] shortcodes
    .replace(reGetButtonGroups, function (all, attr, content) {
      console.log('Detected [buttongroup] shortcode!')
      console.log('-- Extracted shortcode components', { all: all, attr: attr, content: content })
      
      // Build the output button group's HTML attributes
      var attrs = getAttrsFromString(attr, ['class'])
      console.log('-- Extracted attributes', { attrs: attrs })
      
      // Return the button group's HTML
      return '<div class="btn-group ' + (typeof attrs.class !== 'undefined' ? attrs.class : '' ) + '">' + (typeof content !== 'undefined' ? content : '') + '</div>'
    })
}

//
// Do all the extraction on our example text and set within the document's HTML
//
var outputText = replaceButtonShortcodes(exampleText)
outputText = replaceButtonGroupShortcodes(outputText)
document.write(outputText)

Using the s flag would allow me to do it easily, however since it's unsupported, I can't utilise the benefits of the flag.

Matt Scheurich
  • A warning to anyone curious to look at `regexp101.com`: __Don't__. It looks like a malware site. – Andy Sep 25 '17 at 15:41
  • https://meta.stackoverflow.com/a/293819/47589 –  Sep 25 '17 at 15:42
  • @Andy: Whereas http://regex101.com (without the p) isn't (as far as I know) and myself and others have used it for years. Matt, is that just a typo in your code comment in the question? – T.J. Crowder Sep 25 '17 at 15:44
  • Whoops, my bad -- confusion with all the RegExp flying about. I fixed the link – Matt Scheurich Sep 25 '17 at 16:04
  • I've updated my answer to say this, but you said *"When I run this on regex101.com they support the s flag..."* in the question. You just had it set to the wrong "flavor" of regex. – T.J. Crowder Sep 25 '17 at 16:13

1 Answer


There was no big logic to it, it just wasn't included, like lots of other regex features other environments have that JavaScript doesn't (so far).

It's in the process of being added now. The proposal reached Stage 4 as of Dec 2017, so it will be in ES2018, and you'll see support being added to cutting-edge browsers ASAP.
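
Until the flag is available everywhere you need it, one way to cope is sketched below. This is a minimal illustration rather than anything from the question's code: `supportsDotAll` is a hypothetical helper name, and the pattern swaps the question's `.*` for a lazy `[\s\S]*?`, which matches any character including newlines without needing `s`. The lazy quantifier is my assumption about the desired behaviour (stop at the first closing tag); a greedy version would run on to the last `[/button]` in the input.

```javascript
// Hypothetical helper: reports whether this engine accepts the `s` (dotAll)
// flag. An unsupported flag makes the RegExp constructor throw a SyntaxError,
// which is the "Invalid regular expression flags" error from the question.
function supportsDotAll () {
  try {
    new RegExp('.', 's')
    return true
  } catch (e) {
    return false
  }
}

// Portable alternative: [\s\S] matches any character, newlines included,
// so no `s` flag is required. The lazy *? stops at the first closing tag.
var reButton = /\[button(?:\s+([^\]]+))?\](?:([\s\S]*?)\[\/button\])?/gi

var sample = '[button class="btn-primary"]Line 1\nLine 2[/button]'
var match = reButton.exec(sample)

console.log('s flag supported:', supportsDotAll())
console.log('attributes:', match[1])
console.log('content:', JSON.stringify(match[2]))
```

Because the character class does the work, the same pattern behaves identically whether or not the engine knows about `s`, so the feature check is only needed if you want to prefer the native flag where it exists.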

(Look-behind and Unicode property escapes are also on the cards...)


Side note:

"When I run this on regex101.com they support the s flag..."

Not if you set the regex type to JavaScript via the menu. Click the menu button in the top left and change the "flavor" to JavaScript.

You probably left it on its default, which is PCRE, which does indeed support the s flag.

They used to make this more obvious. Since they hid it away on a menu, you're not remotely the first person I've seen not have it set right...

T.J. Crowder