
I'm trying to debug a regular expression in Ruby. When I print a regular expression, the output shows ?-mix at the beginning, even though those characters were not part of the expression. The following IRB output illustrates this:

irb(main):028:0* EXPR = /^a$/
=> /^a$/
irb(main):029:0> EXPR
=> /^a$/
irb(main):030:0> puts EXPR
(?-mix:^a$)
=> nil

As you can see, when you use puts to print out the contents of a regular expression, there is ?-mix at the beginning. Should I be concerned by this? Where is it coming from?

Ngoral
Plastikfan

2 Answers


mix is not the English word "mix"; the letters m, i, and x are Regexp options.

See Regexp#to_s:

Returns a string containing the regular expression and its options (using the (?opts:source) notation.

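In your example, m is multiline mode (in Ruby, this makes . match a newline), i is case-insensitive matching, and x is extended mode (whitespace and comments in the pattern are ignored). Options before the dash are on; those after it are off, which is the default. The question's example, ?-mix, has all three options off.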

You can turn them on like this:

puts(/^a$/mix)
# prints: (?mix:^a$)
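As a further sketch (the variable names are only illustrative), turning on just some of the options moves only those letters in front of the dash. The expected strings in the comments follow the (?opts:source) notation described above:

```ruby
# Same pattern throughout; only the trailing options differ.
none  = /^a$/
multi = /^a$/m
ci_x  = /^a$/ix

puts none.to_s   # (?-mix:^a$)
puts multi.to_s  # (?m-ix:^a$)
puts ci_x.to_s   # (?ix-m:^a$)

# Regexp#options exposes the same flags as a bitmask.
multi_on = (multi.options & Regexp::MULTILINE) != 0
none_on  = (none.options & Regexp::MULTILINE) != 0
puts multi_on    # true
puts none_on     # false
```

So (?m-ix:...) does mean that m is on while i and x are off.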
Aaron
Yu Hao
  • Great, thanks for that. That m option might be what's screwing up my regex. I just need to find out how to set it to single line instead – Plastikfan Feb 20 '15 at 14:17
  • It might be worth noting that the `-` turns those options *off* (which they are by default, but you can switch them on and off for different sections of a regex, if your regex flavor supports that). – Tim Pietzcker Feb 20 '15 at 14:22
  • @Shantaram: Are you aware that Ruby's `(?m)` is the same as every other regex flavor's `(?s)` option? What are you expecting `^` and `$` to match? – Tim Pietzcker Feb 20 '15 at 14:24
  • So if those options are off by default which results in ?-mix, does that mean ?m-ix has multiline turned on? (I moved the m before the -) – Plastikfan Feb 20 '15 at 14:44
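To make the point from the comments concrete, here is a minimal sketch (the sample string and variable names are made up for illustration) of what m changes in Ruby and what it does not. It assumes Ruby 2.4 or later for Regexp#match?:

```ruby
text = "a\nb"   # illustrative two-line string

# Without m, "." does not match the newline between a and b.
dot_plain = /a.b/.match?(text)
# With m, "." matches the newline as well.
dot_multi = /a.b/m.match?(text)

# ^ and $ anchor at line boundaries in Ruby whether or not m is set.
line_anchor = /^b$/.match?(text)
# \A and \z anchor at the start and end of the whole string.
string_anchor = /\Ab\z/.match?(text)

p dot_plain      # false
p dot_multi      # true
p line_anchor    # true
p string_anchor  # false
```

In other words, m being off does not stop ^ and $ from matching at each line; \A and \z are the anchors for the whole string.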

Regarding the -: it is part of the syntax for flags. Those before the dash are on, and those after it are off.

As explained in the Regexp docs, this is an inline modifier, using the (?on-off) syntax:

The end delimiter for a regexp can be followed by one or more single-letter options which control how the pattern can match.

  • /pat/i - Ignore case
  • /pat/m - Treat a newline as a character matched by .
  • /pat/x - Ignore whitespace and comments in the pattern
  • /pat/o - Perform #{} interpolation only once

i, m, and x can also be applied on the subexpression level with the (?on-off) construct, which enables options on, and disables options off for the expression enclosed by the parentheses.

Hence, in the question's example, (?-mix:^a$) means that the options m, i, and x are all off and none are on.
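A small sketch of the subexpression-level form quoted above (the patterns and variable names are illustrative, and Regexp#match? assumes Ruby 2.4 or later):

```ruby
# Case-insensitivity is switched on only inside the group (?i:b).
group_on = /a(?i:b)c/
on_lower = group_on.match?("abc")
on_mixed = group_on.match?("aBc")
on_upper = group_on.match?("ABC")

# An option set for the whole regexp can be switched off inside a group.
group_off = /a(?-i:b)c/i
off_mixed = group_off.match?("AbC")
off_upper = group_off.match?("ABC")

p on_lower   # true
p on_mixed   # true
p on_upper   # false
p off_mixed  # true
p off_upper  # false
```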

Wiktor Stribiżew
dimid