0

I have a string:

"1 chocolate bar at 25"

and I want to split this string into:

 [1, "chocolate bar", 25]

I don't know how to write a regex for this split. I would also like to know whether any other methods can accomplish it.

sawa
Shubh

6 Answers

4

You could use scan with a regex:

"1 chocolate bar at 25".scan(/^(\d+) ([\w ]+) at (\d+)$/).first

The above method doesn't work if item_name has special characters.

If you want a more robust solution, you can use split:

number1, *words, at, number2 = "1 chocolate bar at 25".split
p [number1, words.join(' '), number2]
# ["1", "chocolate bar", "25"]

`number1` is the first word, `number2` is the last one, `at` is the second to last, and `*words` collects everything in between as an array. `number2` is guaranteed to be the last word.

This method has the advantage of working even if there are numbers in the middle, " at " somewhere in the string or if prices are given as floats.
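
A minimal sketch of how the splat-based split could be wrapped in a method that also converts the numbers. The method name `parse_order`, the check on the `at` token, and the choice to return `nil` on malformed input are assumptions for illustration, not part of the answer itself.

```ruby
# Hypothetical helper: splits "<count> <item name> at <price>" on whitespace.
def parse_order(line)
  count, *words, at, price = line.split
  # Reject input that does not have the expected shape.
  return nil unless at == "at" && !words.empty?

  # Integer() and Float() raise ArgumentError on non-numeric text.
  [Integer(count, 10), words.join(" "), Float(price)]
rescue ArgumentError
  nil
end

p parse_order("1 chocolate bar at 25")       # => [1, "chocolate bar", 25.0]
p parse_order("2 look at me books at 9.99")  # => [2, "look at me books", 9.99]
p parse_order("chocolate bar")               # => nil
```

Because `split` without arguments splits on any run of whitespace, repeated spaces inside the item name are collapsed to single spaces when the words are joined again.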

Eric Duminil
  • You thought it would be possible for * at * to be there in `item_name`. Okay, right. You also thought an `item_name` may contain `_` in it. (as per using `\w`) but the question is why didn't you think an apostrophe `'` (: *O'reilly* ), a comma `,` (: *MySQL, PHP book* ) or parentheses `()` (: *Server (Apache)* ) and many more special characters could be a part of a name?! – revo Jul 07 '17 at 13:22
  • I know you have two different imaginations which conflict. I down-voted for the first. That was my constructive criticism. – revo Jul 07 '17 at 13:30
  • Your second method also doesn't have any way to apply the `^` and `$` anchors to the input string, which you were arguing about so insistently under my post. – revo Jul 07 '17 at 13:33
  • *`number2` will always be the last part of the string*... but second method doesn't ensure that `number2` will always contain a price! I remember you said *You need at least `$` or `\z` at the end of your regexp* because you had a reason! So you should apply your reason / idea on your own solutions as well. – revo Jul 07 '17 at 14:02
  • Yes, *guaranteed to be the last word* but not the right chunk. – revo Jul 07 '17 at 14:12
  • For sure, that's what I'm saying all the time, we don't know anything more than that! But I don't think you read my comments at all hence repeating nonsense over and over. You should tell me the reason why you were trying to convince me to append `$` or `\z` to the regex. If you know the reason you should know your second solution doesn't follow it. Simple English, isn't it? If you can't get the point then I'm not the culprit. That's enough for me the readers of these comments, if any actually, will understand what was going on. Period. – revo Jul 07 '17 at 14:23
  • I don't like to continue. The one who began arguing was you, not me. Your first method doesn't accept this `at x at 1.99 at` but the second does. You are suggesting two different problematic ways. You never explained the differences between them and never noted their failing cases and their matching domain. You are only providing a temp solution to the problem. Wasted! – revo Jul 07 '17 at 14:50
  • Don't kid yourself. I'm done at this point. – revo Jul 07 '17 at 15:00
1

It is not necessary to use a regular expression.

str = "1 chocolate bar, 3 donuts and a 7up at 25"

i1 = str.index(' ')
  #=> 1
i2 = str.rindex(' at ')
  #=> 35
[str[0,i1].to_i, str[i1+1..i2-1], str[i2+3..-1].to_i]
  #=> [1, "chocolate bar, 3 donuts and a 7up", 25]
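
The same index-based idea could be sketched as a method that also copes with input lacking a separator. The method name `split_order` and the `nil` return for malformed input are assumptions for illustration.

```ruby
# Hypothetical helper built on String#index and String#rindex.
def split_order(str)
  first_space = str.index(" ")      # end of the leading count
  last_at     = str.rindex(" at ")  # start of the final " at "
  # Bail out if a separator is missing or the item name would be empty.
  return nil if first_space.nil? || last_at.nil? || last_at <= first_space

  count = str[0, first_space]
  name  = str[(first_space + 1)...last_at]
  price = str[(last_at + 4)..-1]    # skip past " at "
  [count.to_i, name, price.to_i]
end

p split_order("1 chocolate bar, 3 donuts and a 7up at 25")
# => [1, "chocolate bar, 3 donuts and a 7up", 25]
p split_order("no separator here")  # => nil
```

Note that `to_i` is lenient: it returns 0 for non-numeric text instead of raising, so this sketch does not validate the count or the price.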
Cary Swoveland
1

I would do:

> s="1 chocolate bar at 25"
> s.scan(/[\d ]+|[[:alpha:] ]+/)
=> ["1 ", "chocolate bar at ", "25"]

Then to get the integers and the stripped string:

> s.scan(/[\d ]+|[[:alpha:] ]+/).map {|s| Integer(s) rescue s.strip}
=> [1, "chocolate bar at", 25]

And to remove the " at":

> s.scan(/[\d ]+|[[:alpha:] ]+/).map {|s| Integer(s) rescue s[/.*(?=\s+at\s*)/]}
=> [1, "chocolate bar", 25]
dawg
0

You could call `captures` on the result of `match` with the regex (\d+) +(\D+) +at +(\d+):

string.match(/(\d+) +(\D+) +at +(\d+)/).captures

Validating input string

If you have not already validated that the input string is in the desired format, it may be better to validate and capture in one step. This version also accepts any character in the item_name field and a decimal price at the end:

string.match(/^(\d+) +(.*) +at +(\d+(?:\.\d+)?)$/).captures
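
A hedged sketch of the validating variant with named groups and numeric conversion. The constant `ORDER_PATTERN`, the method `parse_order_line`, the lazy `.+?` for the name, and the use of `\A`/`\z` are assumptions layered on top of the regex above. In Ruby, `^` and `$` match at line boundaries, so `\A` and `\z` are used here to anchor the whole string.

```ruby
# Hypothetical pattern: count, item name (any characters), "at", price.
ORDER_PATTERN = /\A(?<count>\d+) +(?<name>.+?) +at +(?<price>\d+(?:\.\d+)?)\z/

def parse_order_line(string)
  m = ORDER_PATTERN.match(string)
  return nil unless m  # input is not in the expected format

  [m[:count].to_i, m[:name], m[:price].to_f]
end

p parse_order_line("1 chocolate bar at 25")
# => [1, "chocolate bar", 25.0]
p parse_order_line("3 O'Reilly books (PHP, MySQL) at 19.99")
# => [3, "O'Reilly books (PHP, MySQL)", 19.99]
p parse_order_line("1 chocolate bar at 25\nextra line")
# => nil
```

With `^` and `$` instead of `\A` and `\z`, the last input would match on its first line, which may or may not be what you want.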
revo
  • `\D` is a bit weird matcher. `22 bottles of 7up at 20`? – Aleksei Matiushkin Jul 07 '17 at 08:41
  • I assume *item_name* couldn't have digits inside. Otherwise replacing `\D` with `[\w ]` will be enough. – revo Jul 07 '17 at 08:42
  • `\w` obviously won’t work either. – Aleksei Matiushkin Jul 07 '17 at 08:43
  • 1
    Thanks again for saving me from a future issue :) – Shubh Jul 07 '17 at 08:43
  • 1
    @revo: "What goes around comes around?" No, this isn't a symmetrical situation. I told you what the problem with your answer is. You only complained about another solution, which was actually more robust than yours. You didn't change anything, so I downvoted your answer. If you have constructive criticism, I'd be happy to hear it. But please don't downvote just for revenge. – Eric Duminil Jul 07 '17 at 11:51
  • *If you have constructive criticism...* I'd appreciate it if you had one. @EricDuminil – revo Jul 07 '17 at 12:03
  • I didn't ***only*** complain about the other solution. I absolutely had some valid points. And not only me but two more people thought the same. @EricDuminil – revo Jul 07 '17 at 12:07
  • Read this from OP: *I have a string* and be more precise about it. He didn't say *I've a string that **contains** or **may contain** a string like this...*. So appending `$` to regex is just an option and should be explained if someone did it. @EricDuminil – revo Jul 07 '17 at 12:14
  • Don't remove your comments to not mess up others, please. @EricDuminil – revo Jul 07 '17 at 12:19
  • I don't care about assumptions, I said already. *chocolate bar* is fine as an *item_**name*** but what the hell on earth a name could contain " at ".... I don't know really. You can ping original poster to clarify things. Don't think you are right when you are not. @EricDuminil – revo Jul 07 '17 at 12:27
  • If it is possible then I can think of an invalid input string since ` at \d+` shouldn't be placed in a name field which has to be filtered. And yes, decimal pricing is common I never said it's not. @EricDuminil – revo Jul 07 '17 at 13:05
  • It's up to the original poster. I never noted whether or not it would be a case, and neither did he; that's why I didn't expand my answer to include it. If you like to do more than enough, that's not bad and there is nothing wrong with it. @EricDuminil – revo Jul 07 '17 at 13:11
0

You can also do something like this:

"1 chocolate bar at 25"
  .split
  .reject { |string| string == "at" }
  .map { |string| string.scan(/^\D+$/).empty? ? string.to_i : string }
# => [1, "chocolate", "bar", 25]

Code Example: http://ideone.com/s8OvlC
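
The chain above keeps each word of the item name as a separate element. One possible way to regroup them into a single string is `Enumerable#chunk_while`, sketched below. The `NUMERIC` constant and the grouping rule are assumptions for illustration, and the approach still drops every "at" token, including any inside the item name.

```ruby
NUMERIC = /\A\d+\z/  # hypothetical helper pattern: token made of digits only

tokens = "1 chocolate bar at 25".split.reject { |token| token == "at" }

# Keep consecutive non-numeric tokens in the same group, then join each group.
result = tokens
  .chunk_while { |a, b| !a.match?(NUMERIC) && !b.match?(NUMERIC) }
  .map { |group| group.first.match?(NUMERIC) ? group.first.to_i : group.join(" ") }

p result  # => [1, "chocolate bar", 25]
```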

Greg Burghardt
dhaliman
-1

I live in a country where prices may be floats, hence the more permissive matcher for the price.

"1 chocolate bar at 25".
  match(/\A(\d+)\s+(.*?)\s+at\s+(\d[.\d]*)\z/).
  captures
#⇒ ["1", "chocolate bar", "25"]
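
As a quick illustration of how permissive the price part is, the sketch below compares it with a tighter alternative. The constant names and the stricter pattern are assumptions, not part of the answer above.

```ruby
LOOSE_PRICE  = /\A\d[.\d]*\z/       # price part of the regex above
STRICT_PRICE = /\A\d+(?:\.\d+)?\z/  # assumed alternative: digits, optional fraction

%w[25 19.99 5...... 1.2.3].each do |price|
  puts "#{price}: loose=#{LOOSE_PRICE.match?(price)} strict=#{STRICT_PRICE.match?(price)}"
end
# 25: loose=true strict=true
# 19.99: loose=true strict=true
# 5......: loose=true strict=false
# 1.2.3: loose=true strict=false
```

Which behavior is preferable depends on whether malformed prices should be rejected at parse time or handled later.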
Aleksei Matiushkin
  • Not right on *1 chocolate bar at .* – revo Jul 07 '17 at 08:46
  • @revo indeed, fixed. – Aleksei Matiushkin Jul 07 '17 at 08:47
  • Why should it return third capture group as `5......` in *1 chocolate bar at 5......*? – revo Jul 07 '17 at 08:48
  • @revo because `1 chocolate bar at 5......` is not a valid input as by OP? – Aleksei Matiushkin Jul 07 '17 at 09:19
  • So you mean you hadn't to edit your post after my first comment? – revo Jul 07 '17 at 09:26
  • By considering a floating point number as *item_price* you are defining your own *valid input* not OP's. So do it right or don't at all. – revo Jul 07 '17 at 09:29
  • "Most correct"? What isn't correct about my second method? – Eric Duminil Jul 07 '17 at 09:29
  • @EricDuminil indeed, your latter method is likely even better, I beg your pardon, – Aleksei Matiushkin Jul 07 '17 at 09:34
  • @revo: What if item_price isn't stored as a float? muda's answer works just fine. So what's the problem? For what it's worth, your answer isn't correct because you don't check the pattern matches the whole string. – Eric Duminil Jul 07 '17 at 09:41
  • @revo: And how did you get this information? The only thing that OP wrote is `The format of string will be "no_of_item item_name at price"`. There's no information at all about `item_name`, so you shouldn't assume anything about it, yet you did. That's why you got my downvote. What's *your* reason for downvoting me, except childish retaliation? – Eric Duminil Jul 07 '17 at 10:03
  • @revo: Okay, you don't want to understand, so I'll just try one last time and then give up. "No information about `item_name`" means that it could be any string, with any character. Done. – Eric Duminil Jul 07 '17 at 10:09
  • 1
    @revo Because you lose one point when you downvote. Revenge and lies don't belong here. Please behave like a grown up on SO – Eric Duminil Jul 07 '17 at 10:38