Let me preface this by saying I am new to Regex and C# so I am still trying to figure it out. I also realize that Regex is a deep subject that takes time to understand. I have done a little research to figure this out but I don't have the time needed to properly study the art of Regex syntax as I need this program finished tomorrow. (no this is not homework, it is for my job)

I am using C# to search through a text file line by line, and I am trying to use a Regex to check whether any line contains a date in the current month in the format MM-DD. The Regex is used within a method that is passed each line of the file.

Here is the method I am currently using:

```csharp
private bool CheckTransactionDates(string line)
{
   // in the actual code this is dynamically set based on other variables
   string month = "12";

   Regex regExPattern = new Regex(@"\s" + month + @"-\d(0[1-9]|[1-2][0-9]|3[0-1])\s");
   Match match = regExPattern.Match(line);

   return match.Success;
}
```

Essentially, I need it to match only when the date is preceded by a space and followed by a space, and only when it consists of the current month (in this case 12), a hyphen, and a valid day of the month (" 12-01 " should match, but " 12-99 " should not). There should always be 2 digits on either side of the hyphen.

This Regex (The only thing I can make match) will work, but also picks up items outside the necessary range:

```csharp
Regex regExPattern = new Regex(@"\s" + month + @"-\d{2}\s");
```

I have also tried this without success:

```csharp
Regex regExPattern = new Regex(@"\s" + month + @"-\d[01-30]{2}\s");
```

Can anyone tell me what I need to change to get the results I need? Thanks in advance.

pscapng
  • You said a pattern like `\s12-\d{2}\s` matches other items you don't want, please provide an example of the text it's matching. Clearly we're flying blind here because we don't know what a sample file might look like where that Regex fails. – Mike Perrenoud Dec 31 '13 at 18:00
  • I don't understand why it's so important to match a valid day range only. If you really fear that there might be invalid values you could validate it in a second step. A valid day depends on the month, so doing this validation just with regex is hard. – Meta-Knight Dec 31 '13 at 18:01
  • Take a look here: [Regular Expression to match valid dates](http://stackoverflow.com/questions/51224/regular-expression-to-match-valid-dates) or [LMGTFY](https://www.google.com/#q=regex+date). Getting a regex to match that specifically could be difficult and ugly, but you can use regex to narrow it down and check the ranges. – Wonko the Sane Dec 31 '13 at 18:02
  • @Michael Perrenoud An example of a line it matches is: " 11-20 2690 E 28.76 12-02 2468 E* 387.85" However it would also match this line (which I do not want it to match): " 11-15 3610 E 29.34 12-87 2534 E* 465.85" – pscapng Dec 31 '13 at 18:03
  • Could you not use `DateTime.TryParseExact`? – Ash Burlaczenko Dec 31 '13 at 18:04

2 Answers

If you just need to find out if the line contains any valid match, something like this will work:

```csharp
private bool CheckTransactionDates(string line)
{
   // in the actual code this is dynamically set based on other variables
   int month = DateTime.Now.Month;
   // use the same month variable so the day count follows the month being searched
   int daysInMonth = DateTime.DaysInMonth(DateTime.Today.Year, month);

   Regex pattern = new Regex(string.Format(@"{0:00}-(?<DAY>[0123][0-9])", month));
   int day = 0;

   foreach (Match match in pattern.Matches(line))
   {
      if (int.TryParse(match.Groups["DAY"].Value, out day))
      {
         // the pattern also lets "00" through, so check the lower bound as well
         if (day >= 1 && day <= daysInMonth)
         {
            return true;
         }
      }
   }

   return false;
}
```

Here's how it works:

You determine the month to search for (here, I use the current month), and the number of days in that month.

Next, the regex pattern is built with `string.Format`, which puts in the left-zero-padded month, followed by a dash, followed by any two-digit number from 00 to 39 (the `[0123]` for the first digit, the `[0-9]` for the second digit). This narrows the regex matches, but not conclusively to a valid date. The `(?<DAY>...)` that surrounds the day creates a named regex group, which makes processing it later easier. Note that I didn't check for whitespace, in case the line begins with a valid date. You could easily add a space to the pattern, or modify the pattern to your specific needs.

Next, we check all possible matches on that line (pattern.Matches) in a loop.

If a match is found, we then try to parse it as an integer (it should always work, based on the pattern we are matching). We use the DAY group of that match that we defined in the pattern.

After parsing that match into an integer day, we check to see if that day is a valid number for the month specified. If it is, we return true from the function, as we found a valid date.

Finally, if we found no matches, or if none of the matches is valid, we return false from the function (only if we hadn't returned true earlier).
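
If you do want the whitespace requirement from the question while still allowing a date at the very start or end of a line, one possible variation is sketched below. This is only a sketch under some assumptions: the class name `TransactionDateSketch`, the method name `ContainsDateInMonth`, and the explicit `year` and `month` parameters are hypothetical and were introduced here so the check does not depend on today's date. It swaps the plain pattern for one with lookarounds and adds a lower bound on the day.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TransactionDateSketch
{
    // Hypothetical helper: year and month are parameters (an assumption)
    // so the result does not depend on the current date.
    public static bool ContainsDateInMonth(string line, int year, int month)
    {
        int daysInMonth = DateTime.DaysInMonth(year, month);

        // (?<!\S) and (?!\S) require whitespace or the start/end of the line
        // on either side of the date, without consuming that whitespace.
        var pattern = new Regex(
            string.Format(@"(?<!\S){0:00}-(?<DAY>[0-9]{{2}})(?!\S)", month));

        foreach (Match match in pattern.Matches(line))
        {
            // The group is always two ASCII digits, so int.Parse cannot fail here.
            int day = int.Parse(match.Groups["DAY"].Value);
            if (day >= 1 && day <= daysInMonth)
            {
                return true;
            }
        }

        return false;
    }
}
```

With the sample lines from the question's comments and month 12, the line containing 12-02 should be accepted and the line containing 12-87 rejected.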

Wonko the Sane
  • this certainly looks like it will work for what I want... however, I am a bit confused. Based on what I see in yours and Michael Perrenoud's posts, why did this not work to get a 2 digit number between 01 and 31: \d(0[1-9]|[1-2][0-9]|3[0-1]) – pscapng Dec 31 '13 at 20:59
  • @pscapng - the \d at the beginning of your case matches a digit, and then you are matching the other conditions. Remove that, and you'll get the "01-09 or 10-29 or 30-31" day case you are attempting. However, this will give you matches on February 31st, etc. The point is, it is very difficult to ensure valid date data using regex exclusively. – Wonko the Sane Jan 02 '14 at 15:09
  • @WonkotheSane Thank you for explaining that... I understand now. Since everyone has said it would need additional coding to further validate the input, I decided the ROI was too low. I ended up just going with @"\s" + month + @"-\d{2}\s" and no further validation. Thanks again everyone for your help – pscapng Jan 02 '14 at 23:56
  • No problem. However, the ROI is to just copy one of the solutions given here - validation for free. – Wonko the Sane Jan 03 '14 at 01:08
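
For reference, the regex-only fix discussed in these comments (dropping the leading `\d` from the question's pattern) might look like the sketch below. The class name `DayRangePatternSketch` and the method name `HasDayInRange` are hypothetical. As noted in the comments, this still accepts days that do not exist in the given month, such as 02-31.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DayRangePatternSketch
{
    // Hypothetical helper; month is the two-digit month string from the question.
    public static bool HasDayInRange(string line, string month)
    {
        // No leading \d here: the alternation alone covers
        // 01-09, 10-29 and 30-31, so exactly two digits follow the hyphen.
        var pattern = new Regex(
            @"\s" + Regex.Escape(month) + @"-(0[1-9]|[12][0-9]|3[01])\s");

        return pattern.IsMatch(line);
    }
}
```

Because the pattern keeps the `\s` on both sides, a date at the very start or end of a line with no surrounding whitespace is not matched.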

One thing to note is that \s matches any white space character, not just a space:

\s match any white space character [\r\n\t\f ]

A Regex that looks for a literal space would not match those other white space characters, for example a pattern with a space on each side of the group, like ` (12-\d{2}) `. That said, I have to agree with the rest of the community on what to do with the matches. You're going to need to go through every match and validate the date with a better approach:

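```csharp
// requires: using System; using System.Collections.Generic;
//           using System.Globalization; using System.Text.RegularExpressions;
var input = string.Format(
    " 11-20 2690 E 28.76 12-02 2468 E* 387.85{0}11-15 3610 E 29.34 12-87 2534 E",
    Environment.NewLine);

// literal spaces around the current month, a hyphen, and any two digits
var pattern = string.Format(@" ({0}-\d{{2}}) ", DateTime.Now.ToString("MM"));
var lines = new List<string>();

foreach (var line in input.Split(new string[] { Environment.NewLine },
    StringSplitOptions.RemoveEmptyEntries))
{
    // only the first match on each line is examined
    var m = Regex.Match(line, pattern);
    if (!m.Success)
    {
        continue;
    }

    // let the framework decide whether the matched text is a real date
    DateTime dt;
    if (!DateTime.TryParseExact(m.Value.Trim(),
        "MM-dd",
        null,
        DateTimeStyles.None,
        out dt))
    {
        continue;
    }
    lines.Add(line);
}
```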

The reason I went through the lines one at a time is because presumably you need to know what line is good and what line is bad. My logic may not exactly match what you need but you can easily modify it.
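
If a line can hold more than one date for the month, one way to adapt this is to loop over all matches instead of only the first. The sketch below assumes that requirement; the class name `AllMatchesSketch`, the method name `LineHasValidDate`, and the `month` parameter are hypothetical. Since the "MM-dd" format carries no year, my understanding is that `TryParseExact` fills in the current year, so a value like 02-29 would only parse in a leap year.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class AllMatchesSketch
{
    // Hypothetical helper: true if any " MM-dd " token for the given month
    // on this line parses as a real date.
    public static bool LineHasValidDate(string line, int month)
    {
        // Same shape as the pattern above: literal spaces around the date.
        var pattern = string.Format(@" ({0:00}-\d{{2}}) ", month);

        foreach (Match m in Regex.Matches(line, pattern))
        {
            DateTime dt;
            // Group 1 holds the date without the surrounding spaces.
            if (DateTime.TryParseExact(m.Groups[1].Value,
                "MM-dd",
                CultureInfo.InvariantCulture,
                DateTimeStyles.None,
                out dt))
            {
                return true;
            }
        }

        return false;
    }
}
```

One caveat with this pattern: each match consumes its surrounding spaces, so two dates separated by a single space would not both be found. The sample lines always have other fields between dates, so that should not matter for them.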

Mike Perrenoud