This is not so much a single regex as a simple parser built from a few regexes.
- Match lazily from the start of the string up to a whitespace character followed by either `and` or `between` followed by another whitespace character, or up to the end of the string if neither occurs. The match is removed from `where_cause` and saved in `statement`.
- If the string now starts with a whitespace character followed by `between` followed by a whitespace character, that `between` is removed from `where_cause` and appended to `statement`, together with everything after it, allowing exactly one `and`. Matching stops when the end of the string is reached or a second `and` is encountered.
- If point 2 didn't match, check whether the string starts with a whitespace character followed by `and` followed by a whitespace character. If so, remove it from `where_cause`.
- Finally, add `statement` to the `statements` array if it isn't an empty string.
All matching is case-insensitive.
where_cause = "created_at BETWEEN '2018-01-01T00:00:00+05:30' AND '2019-01-01T00:00:00+05:30' AND updated_at BETWEEN '2018-05-01T00:00:00+05:30' AND '2019-05-01T00:00:00+05:30' AND user_id = 5 AND status = 'closed'"
statements = []
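until where_cause.empty?
  # Point 1: take everything up to the next " and " / " between " (or the end).
  statement = where_cause.slice!(/\A.*?(?=[\s](and|between)[\s]|\z)/mi)
  if where_cause.match?(/\A[\s]between[\s]/i)
    # Point 2: take "between x and y", stopping before a second " and ".
    between = /\A[\s]between[\s].*?[\s]and[\s].*?(?=[\s]and[\s]|\z)/mi
    statement << where_cause.slice!(between)
  elsif where_cause.match?(/\A[\s]and[\s]/i)
    # Point 3: drop the " and " that separates two statements.
    where_cause.slice!(/\A[\s]and[\s]/i)
  end
  statements << statement unless statement.empty?
end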
pp statements
# ["created_at BETWEEN '2018-01-01T00:00:00+05:30' AND '2019-01-01T00:00:00+05:30'",
# "updated_at BETWEEN '2018-05-01T00:00:00+05:30' AND '2019-05-01T00:00:00+05:30'",
# "user_id = 5",
# "status = 'closed'"]
Note: Ruby uses `\A` to match the start of the string and `\z` to match the end of the string, instead of the usual `^` and `$`, which match the beginning and end of a line respectively. See the regexp anchor documentation.
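To make the anchor difference concrete, here is a minimal sketch. The two-line `text` value is made up for illustration and is not part of the parser above.

```ruby
# A made-up two-line string; only the second line is the condition.
text = "first line\nstatus = 'closed'"

# ^ and $ anchor to line boundaries, so the second line alone can match.
line_match = text.match?(/^status = 'closed'$/)

# \A and \z anchor to the whole string, so the leading line prevents a match.
string_match = text.match?(/\Astatus = 'closed'\z/)

puts line_match   # true
puts string_match # false
```

This is why the parser uses `\A`: each `slice!` has to start at the very beginning of what is left in `where_cause`, not at the beginning of some later line.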
You can replace every `[\s]` with `\s` if you like. I've added them in to make the regex more readable.
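A quick way to convince yourself that the bracketed and plain forms behave the same is to run both over a few inputs. The sample strings below are arbitrary and chosen only for illustration.

```ruby
bracketed = /\A[\s]and[\s]/i
plain     = /\A\sand\s/i

# Arbitrary samples: two that should match, two that should not.
samples = [" AND x", "\tand\ny", "and x", " android"]

samples.each do |sample|
  puts "#{sample.inspect}: #{bracketed.match?(sample)} / #{plain.match?(sample)}"
end

# True when both patterns agree on every sample.
same = samples.all? { |sample| bracketed.match?(sample) == plain.match?(sample) }
puts same
```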
Keep in mind that this solution isn't perfect, but it might give you an idea of how to solve the issue. It doesn't account for the words `and`/`between` appearing inside a column name or a string literal.
The following where clause:
where_cause = "name = 'Tarzan AND Jane'"
Will output:
#=> ["name = 'Tarzan", "Jane'"]
This solution also assumes correctly structured SQL queries. The following queries don't result in what you might think:
where_cause = "created_at = BETWEEN AND"
# TypeError: no implicit conversion of nil into String
# ^ does match /\A[\s]between[\s]/i, but not the #slice! argument
where_cause = "id = BETWEEN 1 AND 2 BETWEEN 1 AND 3"
#=> ["id = BETWEEN 1 AND 2 BETWEEN 1", "3"]
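If you would rather get a descriptive error than a `TypeError` for the first case, one option is to wrap the loop in a method and check the result of `slice!` before appending it. The following is only a sketch: the method name `split_where_clause` and the choice of `ArgumentError` are my own assumptions, and it still does not handle `and`/`between` inside string literals or the second malformed example.

```ruby
# Hypothetical helper name; wraps the same loop as above with a nil check.
def split_where_clause(where_clause)
  remaining = where_clause.dup # work on a copy so the caller's string is untouched
  statements = []

  until remaining.empty?
    statement = remaining.slice!(/\A.*?(?=\s(and|between)\s|\z)/mi)

    if remaining.match?(/\A\sbetween\s/i)
      between = remaining.slice!(/\A\sbetween\s.*?\sand\s.*?(?=\sand\s|\z)/mi)
      # slice! returns nil when the "between x and y" shape is incomplete.
      raise ArgumentError, "malformed BETWEEN near: #{remaining.inspect}" if between.nil?

      statement << between
    elsif remaining.match?(/\A\sand\s/i)
      remaining.slice!(/\A\sand\s/i)
    end

    statements << statement unless statement.empty?
  end

  statements
end

pp split_where_clause("created_at BETWEEN 1 AND 2 AND id = 3")
# ["created_at BETWEEN 1 AND 2", "id = 3"]
```

Raising early keeps the failure close to the malformed input instead of surfacing later as a confusing `nil` conversion error.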