19

Given a date time format string, is there a standard way to find the first substring of some text that matches that format?

For example, given the format...

d-MMM-yy H:mm:ss

and some text...

"blah 1 2 3 7-Jul-13 6:15:00 4 5 6 blah"

I'd expect it to return

"7-Jul-13 6:15:00"

I can find this string by writing my own parser, but I'm wondering whether there is any library support for doing this.

abatishchev
Keith Nicholas
  • It might not be; however, it does mean that if you want to find strings that match a datetime format, you have to at least partially re-invent the built-in formatting conventions. – Keith Nicholas Jul 08 '13 at 00:26
  • @KeithNicholas I don't think there is a library that would do this but you can surely try regex [**`(\d{,2}-\w{3}-\d{,4}\s\d{,2}:\d{,2}:\d{,2})`**](http://rubular.com/r/eGU6yqZXgB) which I believe would be the way to go here or you could try to split by the terms you have and work with indexes – Prix Jul 08 '13 at 00:45
  • @Prix I only gave one example, but the question does ask about a date time format string, which could be any valid date time format string. – Keith Nicholas Jul 08 '13 at 01:05
  • @KeithNicholas with your example and pattern, 17-Jul for instance would fail, and 2013 would fail too, so if you have a set of given possibilities you can also make a set of regexes to use. – Prix Jul 08 '13 at 01:10
  • @Prix, 17 would pass, as 'd' means 1 or 2 digits; if it were dd it would have to be 07-Jul... Imagine that you have no prior knowledge of which datetime format string you have got. It's something given at runtime, as long as it's a valid format string. – Keith Nicholas Jul 08 '13 at 01:12

6 Answers

8

This may not be the most efficient but it seemed like an interesting question so I thought I'd try this method.

It takes your DateTime format string and builds a Regex pattern from it by replacing each format letter with a wildcard (allowing one or two characters where the specifier can produce either) and whitespace with \s. It then creates a Regex object from that pattern and tries to find the first match in the input sentence.

That match, if it exists, is then passed into a DateTime.TryParseExact call. I'm sure improvements can be made but this might help give a general idea on a technique that doesn't require hardcoding a Regex or the format of the input sentence.

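using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

string inputSentence = "blah 1 2 3 7-Jul-13 6:15:00 4 5 6 blah";

string dtformat = "d-MMM-yy H:mm:ss";

//convert dtformat into regex pattern
StringBuilder sb = new StringBuilder();
for (int i = 0; i < dtformat.Length; i++)
{
    char c = dtformat[i];
    if (Char.IsLetter(c))
    {
       //count the run of identical specifier letters, e.g. "MMM"
       int run = 1;
       while (i + 1 < dtformat.Length && dtformat[i + 1] == c) { run++; i++; }

       //a lone d, H, h, M, m or s can print as one or two digits;
       //longer runs are treated as fixed width (not true for e.g. "MMMM")
       if (run == 1 && "dHhMms".IndexOf(c) >= 0)
          sb.Append(@"\d{1,2}");
       else
          sb.Append(".{" + run + "}");
    }
    else if (Char.IsWhiteSpace(c))
       sb.Append(@"\s");
    else
       sb.Append(Regex.Escape(c.ToString())); //separators such as '.' are regex metacharacters
}

string dtPattern = sb.ToString();

Regex dtrx = new Regex(dtPattern);

//get the match using the regex pattern
var dtMatch = dtrx.Match(inputSentence);

//Match() never returns null, so check Success instead
if (dtMatch.Success)
{
    string firstString = dtMatch.Value.Trim();

    //try and parse the datetime from the string
    DateTime firstMatch;
    if (DateTime.TryParseExact(firstString, dtformat, CultureInfo.InvariantCulture, DateTimeStyles.None, out firstMatch))
    {
       Console.WriteLine("Parsed");
    }
    else
    {
       Console.WriteLine("Could not parse");
    }
}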
keyboardP
  • `H` should match either 1 or 2 digits. Your code builds a regex that only matches single digit hours. – Blorgbeard Jul 08 '13 at 01:03
  • Thank you, I've updated the code. Think that should work with both formats now. – keyboardP Jul 08 '13 at 01:13
  • @keyboardP this is kind of how I went about parsing it myself. You need to think about escaping regex characters as well, as they could be used as separators in datetime formats; you also need to handle datetime format escapes..... – Keith Nicholas Jul 08 '13 at 01:24
  • @KeithNicholas - This was just a quick mock up of an idea. There are few more cases that likely need to be handled but I don't think there's a way to do this that's built into the framework. – keyboardP Jul 08 '13 at 01:28
  • @keyboardP yeah... there are more cases, so you end up with quite a lot of code to translate formats into regex matches. With the potential that you may not handle a particular case. Which is why I was wondering if there was a better way! :-) – Keith Nicholas Jul 08 '13 at 01:30
  • @KeithNicholas - I guess there's no "one size fits all" way but if you know the range of your date formats input you could wrap the above up in a method and use that. There may be a better way but, AFAIK, not built in I'm afraid (of course, would love to be proven wrong on that :D ) – keyboardP Jul 08 '13 at 01:32
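The two concerns raised in the comments above (regex metacharacters used as separators, and escapes inside the format string itself) could be handled roughly as follows. This is only a sketch, not a complete translator: `FormatRegexSketch`, `ToPattern` and `TryFindFirst` are hypothetical names, the invariant culture is an assumption, and specifiers such as `zzz` or `K` are not covered. The regex is deliberately loose; `DateTime.TryParseExact` does the real validation on each candidate.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the framework.
public static class FormatRegexSketch
{
    // Turns a custom DateTime format string into a loose candidate-finding regex.
    public static string ToPattern(string format)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < format.Length; i++)
        {
            char c = format[i];
            if (c == '\\' && i + 1 < format.Length)
            {
                // Format escape: the next character is a literal.
                sb.Append(Regex.Escape(format[++i].ToString()));
            }
            else if (c == '\'' || c == '"')
            {
                // Quoted literal: copy everything up to the closing quote.
                int end = format.IndexOf(c, i + 1);
                if (end < 0) end = format.Length;
                sb.Append(Regex.Escape(format.Substring(i + 1, end - i - 1)));
                i = end;
            }
            else if (char.IsLetter(c))
            {
                // Collapse a run of identical specifier letters, e.g. "MMM".
                int run = 1;
                while (i + 1 < format.Length && format[i + 1] == c) { run++; i++; }
                // Short numeric specifiers: one or two digits. Anything else: word characters.
                bool shortNumeric = run <= 2 && "dHhMmsy".IndexOf(c) >= 0;
                sb.Append(shortNumeric ? @"\d{1,2}" : @"\w+");
            }
            else
            {
                // Separator: escape it in case it is a regex metacharacter such as '.'.
                sb.Append(Regex.Escape(c.ToString()));
            }
        }
        return sb.ToString();
    }

    // Returns the first regex candidate that TryParseExact accepts.
    public static bool TryFindFirst(string text, string format, out DateTime value)
    {
        foreach (Match m in Regex.Matches(text, ToPattern(format)))
        {
            if (DateTime.TryParseExact(m.Value, format, CultureInfo.InvariantCulture,
                                       DateTimeStyles.None, out value))
                return true;
        }
        value = default(DateTime);
        return false;
    }
}
```

Calling `FormatRegexSketch.TryFindFirst(text, "d-MMM-yy H:mm:ss", out dt)` on the sentence from the question should pick out the `7-Jul-13 6:15:00` portion. Because `Regex.Matches` returns non-overlapping matches, a rejected candidate can hide a valid one that overlaps it, which is one more case this sketch does not handle.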
7

Maybe something like this:

Find each part of the format string by parsing each word in the text, then combine the two parts to create the DateTime.

string test = "blah 1 2 3 7-Jul-13 6:15:00  4 5 6 blah";

int formatPart = 0;
bool dateFound = false;
string format = "d-MMM-yy H:mm:ss";
DateTime myDateTime = DateTime.MinValue;
foreach (var item in test.Split(' '))
{
    DateTime dummy;
    if (DateTime.TryParseExact(item, format.Split(' ')[formatPart], null, DateTimeStyles.NoCurrentDateDefault, out dummy))
    {
        if (myDateTime == DateTime.MinValue)
        {
            formatPart++;
            myDateTime = dummy;
            dateFound = myDateTime.Date != DateTime.MinValue.Date;
            continue;
        }

        // If date was found first, add time, else add date
        myDateTime = dateFound
         ? myDateTime.Add(new TimeSpan(dummy.Hour, dummy.Minute, dummy.Second))
         : dummy.Add(new TimeSpan(myDateTime.Hour, myDateTime.Minute, myDateTime.Second));
        break;
    }
}

Tested:

Input: "blah 1 2 3 7-Jul-13 6:15:00  4 5 6 blah"
Format: "d-MMM-yy H:mm:ss"

Input: "blah 1 2 3 6:15:00 7-Jul-13 4 5 6 blah"
Format: "H:mm:ss d-MMM-yy"

Input: "blah 1 2 3 6:15:00 7-7-2013 4 5 6 blah"
Format: "H:mm:ss d-M-yyyy"

Input: "blah 1 2 3 07-07-2013 6:15:00  4 5 6 blah"
Format: "dd-MM-yyyy H:mm:ss" 
sa_ddam213
1

You may try NodaTime:

var input = "blah 1 2 3 7-Jul-13 6:15:00 4 5 6 blah";
var pattern = "d-MMM-yy H:mm:ss";
var nodaPattern = NodaTime.Text
    .LocalDateTimePattern
    .Create(pattern, System.Globalization.CultureInfo.CurrentUICulture);
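// Note: the window is pattern.Length characters wide, which only works when the
// formatted text happens to be as long as the pattern string (true for this example).
for (int i = 0; i <= input.Length - pattern.Length; i++)
{
    var result = nodaPattern.Parse(input.Substring(i, pattern.Length));
    if (result.Success)
    {
        Console.WriteLine(result.Value);
        break;
    }
}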
Alex Filipovici
0

I'm not aware of anything other than DateTime.TryParse (or, alternatively, a Regex) for doing this in .NET.

I would set up a stream tokenizer, passing only candidate token pairs into DateTime.TryParse (based on some combination of minimum string length, and maybe substring checks for a pair of dashes on token 0 and a pair of colons on token 1, etc.). The exact checks would depend on how many date/time formats you're supporting.
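A minimal sketch of that tokenizer idea, under some assumptions: tokens are whitespace-separated, the format string contains no quoted literals with spaces in them, and the invariant culture is used. `TokenWindowSketch` and `FindFirst` are hypothetical names. Instead of hand-written pre-checks, it slides a window of as many tokens as the format has parts and lets `DateTime.TryParseExact` accept or reject each candidate.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of the framework.
public static class TokenWindowSketch
{
    private static readonly char[] Whitespace = { ' ', '\t', '\r', '\n' };

    // Returns the first run of tokens that parses with the given format, or null.
    public static string FindFirst(string text, string format)
    {
        string[] tokens = text.Split(Whitespace, StringSplitOptions.RemoveEmptyEntries);

        // Window width = number of whitespace-separated parts in the format.
        int width = format.Split(Whitespace, StringSplitOptions.RemoveEmptyEntries).Length;

        for (int i = 0; i + width <= tokens.Length; i++)
        {
            // Rebuild a candidate from `width` consecutive tokens.
            string candidate = string.Join(" ", tokens, i, width);

            DateTime ignored;
            if (DateTime.TryParseExact(candidate, format, CultureInfo.InvariantCulture,
                                       DateTimeStyles.None, out ignored))
                return candidate;
        }
        return null;
    }
}
```

For the sentence in the question, `TokenWindowSketch.FindFirst(text, "d-MMM-yy H:mm:ss")` tries "blah 1", "1 2", "2 3" and "3 7-Jul-13" before reaching a pair of tokens that parses. The cheap length or dash/colon pre-checks described above could be added in front of the `TryParseExact` call if parsing every window turns out to be too slow.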

holtavolt
0

You could try a regular expression like:

[0-9]+-[a-zA-Z]+-[0-9]+\s[0-9]+:[0-9]+:[0-9]+
apb
0

You can try this:

string original = "blah 1 2 3 7-jul-13 6:15:00 4 5 6 blah";
Match mc = Regex.Match(original, @"\s*(?<date>\d+[-/][A-Z]{3}[-/]\d+)\s*([01]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]", RegexOptions.IgnoreCase);

if (mc.Success)
{
    // The leading \s* can capture the space before the date, so trim it off.
    string datetime = mc.Value.Trim();
}
terrybozzio