0

After working with a regex supplied by Cary (thank you!), I realized that it drops the last date in the string when I run it in IRB. The code and the output are below. Can anyone tell me why this happens and how to fix it?

require 'rubygems' 
require 'nokogiri' 
require 'open-uri' 

str = "September 19, 20, 25, 26, October 2, 3, 4, 10, November 3, 12, 17" 
dates=str.scan(/\D+(?:\d+,\s+)+/).map { |s| [ s[/[a-z]+/i], s.scan(/\d+/) ] } 

p dates 

The output is the following. As you can see, November only returns 2 dates but there are 3 in the string. It drops November 17.

 [["September", ["19", "20", "25", "26"]], ["October", ["2", "3", "4", "10"]], ["November", ["3", "12"]]] 

jww
Gary7
  • I fixed the error in my [original answer](http://stackoverflow.com/questions/26086602/can-anyone-help-me-dry-this-regex/26087594#comment43610779_26087594) and also changed the approach, I think for the better. It is now `str.scan(/[A-Z][a-z]+|\d+/).each_with_object([]) { |e,b| e[0][/[A-Z]/] ? b << [e,[]] : b.last.last << e }`. – Cary Swoveland Dec 21 '14 at 21:57
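The one-liner in the comment above can be unpacked into a few steps. This is only a sketch of the same idea; the names `tokens`, `token` and `result` are illustrative and do not come from the original answer.

```ruby
str = "September 19, 20, 25, 26, October 2, 3, 4, 10, November 3, 12, 17"

# Split the string into month names (capitalized words) and day numbers.
tokens = str.scan(/[A-Z][a-z]+|\d+/)

dates = tokens.each_with_object([]) do |token, result|
  if token =~ /\A[A-Z]/
    result << [token, []]      # a month name starts a new [month, days] pair
  else
    result.last.last << token  # a number is appended to the most recent month
  end
end

p dates
# => [["September", ["19", "20", "25", "26"]], ["October", ["2", "3", "4", "10"]], ["November", ["3", "12", "17"]]]
```

Because each token is matched on its own, the result does not depend on how the numbers are separated or on whether the last one is followed by a comma. It does assume the string begins with a month name; a leading number would find `result` empty and raise a NoMethodError.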

3 Answers

1

This should probably solve the problem:

dates=str.scan(/\D+(?:\d+(?:,\s+|$))+/).map { |s| [ s[/[a-z]+/i], s.scan(/\d+/) ] }
davidrac
  • Thanks everyone. Actually, Cary Swoveland's answer worked perfectly, but I don't know how to accept a comment. – Gary7 Dec 25 '14 at 13:08
1

The last number in the string, 17, is not followed by `,\s+` (a comma plus whitespace), so `(?:\d+,\s+)+` cannot match it.

You need to add a case for the end of the string:

str = "September 19, 20, 25, 26, October 2, 3, 4, 10, November 3, 12, 17" 
dates=str.scan(/\D+(?:\d+(?:,\s+|$))+/).map { |s| [ s[/[a-z]+/i], s.scan(/\d+/) ] } 

p dates 
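To see the difference directly, it may help to compare the last chunk each pattern returns before the `map` step runs. The variable names `original` and `fixed` below are only for illustration.

```ruby
str = "September 19, 20, 25, 26, October 2, 3, 4, 10, November 3, 12, 17"

original = /\D+(?:\d+,\s+)+/        # every number must be followed by ", "
fixed    = /\D+(?:\d+(?:,\s+|$))+/  # a number may also sit at the end of the line

p str.scan(original).last  # => "November 3, 12, "
p str.scan(fixed).last     # => "November 3, 12, 17"
```

Note that in Ruby `$` matches at the end of any line, not only at the end of the string. For a single-line string like this one the two are equivalent; if the input could contain newlines, `\z` is the stricter end-of-string anchor.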
Uri Agassi
1

Make the `,\s+` portion of the regular expression optional by wrapping it in `(?:,\s+)?`. Put together:

str.scan(/\D+(?:\d+(?:,\s+)?)+/) ...

The group is non-capturing so that it doesn't interfere with scan, which returns the captured groups instead of the whole match when the pattern contains capturing groups.
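A small sketch of why the group type matters to `scan`, using a shortened sample string (the names `sample`, `with_capture` and `without_capture` are illustrative only):

```ruby
sample = "November 3, 12, 17"

# With a capturing group, scan returns one array of captures per match,
# not the matched text itself.
with_capture = sample.scan(/\D+(\d+(?:,\s+)?)+/)

# With only non-capturing groups, scan returns the full matched strings.
without_capture = sample.scan(/\D+(?:\d+(?:,\s+)?)+/)

p with_capture
p without_capture  # => ["November 3, 12, 17"]
```

Only the second form hands the whole "month plus numbers" chunk to the `map` block used in the question.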

Community
August