I have a bit of code for escaping double quotes in a string which may already include pre-escaped quotes, e.g.:

This is a \"string"

I'm using the following code with Ruby 1.8.7p374:

string.gsub!(/([^\\])"/, '\1\"')

However, I hit a funny edge case when trying it on the following string: `ab""c` => `ab\""c`. I would expect it to have escaped both quotes.

It's definitely not a big issue, but it got me curious.
Is this a mistake with my expression? A gsub bug/feature?

(In newer Ruby versions, this could probably be solved easily by using a negative lookbehind, but lookbehinds don't seem to be supported in this version.)
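For reference, the lookbehind variant mentioned above might look like the following on Ruby 1.9 or later. This is only a sketch: the method name `escape_with_lookbehind` is made up for illustration, and it will not run on 1.8.7.

```ruby
# Hypothetical helper; requires Ruby 1.9+ for the (?<!...) lookbehind.
def escape_with_lookbehind(text)
  # Match any quote that is not directly preceded by a backslash.
  # The lookbehind consumes nothing, so adjacent quotes are all matched.
  text.gsub(/(?<!\\)"/, '\"')
end

escape_with_lookbehind('ab""c')                # => ab\"\"c
escape_with_lookbehind('This is a \"string"')  # => This is a \"string\"
```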

hjpotter92
GeReV
  • The problem is overlapping matches: the `([^\\])` is not going to match the `"` you have just replaced. – Neil Slater Jun 22 '14 at 16:49
  • It is not clear what you have. If the `"` at the end is a double quote character, then that would be described escaped as `\"`. But then, the `\"` before that would mean you have a backslash character followed by a double quote character, which would be described as `\\\"` when escaped. Is this really what you have? – sawa Jun 22 '14 at 17:15

2 Answers


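Requiring a match on a non-`\` character means the regex has to consume that character as well as the quote, and gsub matches cannot overlap. In `ab""c`, the first match consumes `b"`; the second quote could only match together with the character before it, which is the first quote and has already been consumed.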
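To make this visible, `scan` can be used to list the non-overlapping matches that `gsub` would act on. A small sketch (the variable names are only for illustration):

```ruby
input = 'ab""c'

# scan returns every non-overlapping match of the pattern.
matches = input.scan(/[^\\]"/)            # => ['b"']

# Only one match exists, so only the first quote gets escaped.
result = input.gsub(/([^\\])"/, '\1\"')   # => ab\""c
```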

You are right that a look-behind assertion would fix this. But without that available, you have a couple of choices in Ruby 1.8.7.

  1. Repeat until there are no substitutions made (gsub! returns nil if there were no matches):

    loop { break unless string.gsub!(/([^\\])"/, '\1\"') }

  2. Reverse the string, use a look-ahead assertion (which 1.8.7 does support) to make your changes, then reverse it back:

    string = string.reverse.gsub(/"(?!\\)/, '"\\').reverse
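Both options can be wrapped in small helpers to compare them side by side. This is a sketch rather than a drop-in solution: the method names are hypothetical, and the second helper uses the block form of `gsub` (instead of the replacement string above) so that the replacement is inserted literally.

```ruby
# Hypothetical helper: re-run the substitution until nothing changes.
def escape_by_looping(text)
  result = text.dup
  # gsub! returns nil once no match is left, which ends the loop.
  loop { break unless result.gsub!(/([^\\])"/, '\1\"') }
  result
end

# Hypothetical helper: reverse, escape with a look-ahead, reverse back.
def escape_by_reversing(text)
  # In the reversed string an escaped quote reads as "\ so the
  # look-ahead skips any quote that is followed by a backslash.
  text.reverse.gsub(/"(?!\\)/) { '"\\' }.reverse
end

escape_by_looping('ab""c')                  # => ab\"\"c
escape_by_reversing('ab""c')                # => ab\"\"c
escape_by_reversing('This is a \"string"')  # => This is a \"string\"
```

Note that the looping version still requires a character before each quote, so a quote at the very start of the string is left alone; the reversing version escapes it.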

Neil Slater

Your regex also won’t work if there is a quote at the start of the string, e.g. `"ab""c` will be transformed to `"ab\""c`. The reason is similar to your case with two adjacent quotes: the pattern needs a preceding character to consume, and at the start of the string there is none.

After gsub has matched `b"` and replaced it, it continues from the end of that match, looking at the next `"`, but it doesn’t look back at the previously consumed characters.

A lookbehind would fix the overlap in newer Ruby versions, but a positive one such as `(?<=[^\\])` still won’t fix the beginning-of-string problem, since it requires a preceding character (a negative lookbehind, `(?<!\\)`, would handle both cases). In Ruby 1.8.7, the way to fix it is to use the `\G` anchor, which matches where the previous match ended or at the start of the string. So you are looking for a `"` that either comes immediately after a non-backslash character or sits at the point where the current search starts (meaning a `"` has just been matched or this is the start of the string). Something like this:

string.gsub!(/([^\\]|\G)"/, '\1\"')

This will convert the string "ab""c to \"ab\"\"c.
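As a quick check, the `\G` version can be wrapped in a helper and run against the inputs discussed here. A minimal sketch: the method name `escape_quotes` is hypothetical, and it uses the non-destructive `gsub` so it always returns a string (`gsub!` returns nil when nothing matches).

```ruby
# Hypothetical helper around the \G-based pattern.
def escape_quotes(text)
  # A quote is escaped if it follows a non-backslash character, or if
  # it sits exactly where the previous match ended (or at the start).
  text.gsub(/([^\\]|\G)"/, '\1\"')
end

escape_quotes('"ab""c')               # => \"ab\"\"c
escape_quotes('ab""c')                # => ab\"\"c
escape_quotes('This is a \"string"')  # => This is a \"string\"
```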

matt