Extracting strings from an array of strings given criteria, i.e. a Class's Properties

Question

Use Case: I import text files and need to read their contents, which consist of 4 lines. From these 4 lines, I parse out strings that are not predefined but dynamic.

Here's an example that I gathered from @zx81:

Input:

on Apr 28, 2014 at 22:00
an Employee John Doe accessed
server - TPCX123
AccessType2 was ReasonType1 - program: Px2x3x, start: No22, 0.0 sec

Given the above 4 lines, which I am thinking of either keeping as 4 separate lines (with their carriage returns) or joining into a single string, I want to extract values and hold them in memory as a class's properties, e.g. ReportDate, ReportTime, EmployeeName, ServerName, AccessType, ReasonType, ProgramId, Start, Length.

Desired Output:

ReportDate = Apr 28, 2014
ReportTime = 22:00
EmployeeName = John Doe
ServerName = TPCX123
AccessType = AccessType2
ReasonType = ReasonType1
ProgramId = Px2x3x
Start = No22
Length = 0.0 sec

This is all I want: the items on the right-hand side of the equals signs, i.e. specific strings assigned to specific properties of an in-memory object, which ultimately correspond to a database table's columns. In the above example, the value for EmployeeName will always be in the same place (between specific strings), so that is where it will be parsed out, e.g. "John Doe". Of course, these values will differ with every file I bring in, hence the dynamic part.

Hope this helps, thanks.

Couldn't you just use string.Split(" ")? This will give you an array with all your words. — Shalin Ved, May 01 '14 at 04:51
Thanks everyone so far that has responded. I have updated my original posting in hopes it answers your questions. Please reference the new section under the "Edit 1" label. Thanks again! — user118190, May 01 '14 at 05:34
@user118190 I completely revised my answer to match the new data you supplied. :) — zx81, May 01 '14 at 21:23

score 2 · Accepted Answer · edited May 23 '17 at 12:05

Given your data, something like this would output what you want:

Output:

ReportDate = Apr 28, 2014
ReportTime = 22:00
EmployeeName = John Doe
ServerName = TPCX123
AccessType = AccessType2
ReasonType = ReasonType1
ProgramId = Px2x3x
Start = No22
Length = 0.0 sec

Code:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // The sample input, kept as one multi-line string
        string s1 = @"on Apr 28, 2014 at 22:00
an Employee John Doe accessed
server - TPCX123
AccessType2 was ReasonType1 - program: Px2x3x, start: No22, 0.0 sec";

        try
        {
            var myRegex = new Regex(@"(?s)^on\s+([\w, ]+?) at (\d{2}:\d{2}).*?Employee ([\w ]+) accessed.*?server - (\w+).*?(\w+) was (\w+) - program: (\w+), start: (\w+), (\d+\.\d+ \w+)");

            // Run the match once and reuse the result for every group
            Match m = myRegex.Match(s1);
            if (m.Success)
            {
                Console.WriteLine("ReportDate = " + m.Groups[1].Value);
                Console.WriteLine("ReportTime = " + m.Groups[2].Value);
                Console.WriteLine("EmployeeName = " + m.Groups[3].Value);
                Console.WriteLine("ServerName = " + m.Groups[4].Value);
                Console.WriteLine("AccessType = " + m.Groups[5].Value);
                Console.WriteLine("ReasonType = " + m.Groups[6].Value);
                Console.WriteLine("ProgramId = " + m.Groups[7].Value);
                Console.WriteLine("Start = " + m.Groups[8].Value);
                Console.WriteLine("Length = " + m.Groups[9].Value);
            }
            else
            {
                Console.WriteLine("The input did not match the expected format.");
            }
        }
        catch (ArgumentException ex)
        {
            // The pattern itself has a syntax error
            Console.WriteLine("Invalid regex: " + ex.Message);
        }

        Console.WriteLine("\nPress Any Key to Exit.");
        Console.ReadKey();
    } // END Main
} // END Program
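
Since the question asks for the values to end up in a class's properties rather than on the console, here is a minimal sketch of that step. It reuses the same pattern but with named groups. The class names AccessReport and AccessReportParser and the group names are assumptions made for illustration (only the property names come from the question), and all values are kept as plain strings.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical container; the property names follow the list in the question.
public class AccessReport
{
    public string ReportDate { get; set; }
    public string ReportTime { get; set; }
    public string EmployeeName { get; set; }
    public string ServerName { get; set; }
    public string AccessType { get; set; }
    public string ReasonType { get; set; }
    public string ProgramId { get; set; }
    public string Start { get; set; }
    public string Length { get; set; }
}

public static class AccessReportParser
{
    // Same pattern as in the answer, with named groups instead of numbered ones.
    private static readonly Regex ReportRegex = new Regex(
        @"(?s)^on\s+(?<date>[\w, ]+?) at (?<time>\d{2}:\d{2}).*?" +
        @"Employee (?<name>[\w ]+) accessed.*?server - (?<server>\w+).*?" +
        @"(?<access>\w+) was (?<reason>\w+) - program: (?<prog>\w+), " +
        @"start: (?<start>\w+), (?<length>\d+\.\d+ \w+)");

    // Returns null when the text does not follow the expected 4-line layout.
    public static AccessReport Parse(string text)
    {
        Match m = ReportRegex.Match(text);
        if (!m.Success)
        {
            return null;
        }

        return new AccessReport
        {
            ReportDate = m.Groups["date"].Value,
            ReportTime = m.Groups["time"].Value,
            EmployeeName = m.Groups["name"].Value,
            ServerName = m.Groups["server"].Value,
            AccessType = m.Groups["access"].Value,
            ReasonType = m.Groups["reason"].Value,
            ProgramId = m.Groups["prog"].Value,
            Start = m.Groups["start"].Value,
            Length = m.Groups["length"].Value
        };
    }
}

public static class Demo
{
    public static void Main()
    {
        string text = "on Apr 28, 2014 at 22:00\n" +
                      "an Employee John Doe accessed\n" +
                      "server - TPCX123\n" +
                      "AccessType2 was ReasonType1 - program: Px2x3x, start: No22, 0.0 sec";

        AccessReport report = AccessReportParser.Parse(text);
        Console.WriteLine(report == null
            ? "no match"
            : report.EmployeeName + " on " + report.ServerName);
    }
}
```

Named groups keep the property assignments readable if the pattern is later reordered or extended, because the code no longer depends on group positions.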

Tweaking it

However, to tweak it you are going to have to brush up on your regex.

To start you off, here is a token-by-token explanation of the regex in the code. Then I recommend you visit the FAQ, RexEgg and other sites mentioned in the FAQ.

@"
(?                 # Use these options for the whole regular expression
   s                  # Dot matches line breaks
)
^                  # Assert position at the beginning of the string
on                 # Match the character string “on” literally (case sensitive)
\s                 # Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line)
   +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(                  # Match the regex below and capture its match into backreference number 1
   [\w,\ ]            # Match a single character present in the list below
                         # A “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
                         # A single character from the list “, ”
      +?                 # Between one and unlimited times, as few times as possible, expanding as needed (lazy)
)
\ at\              # Match the character string “ at ” literally (case sensitive)
(                  # Match the regex below and capture its match into backreference number 2
   \d                 # Match a single character that is a “digit” (0–9 in any Unicode script)
      {2}                # Exactly 2 times
   :                  # Match the character “:” literally
   \d                 # Match a single character that is a “digit” (0–9 in any Unicode script)
      {2}                # Exactly 2 times
)
.                  # Match any single character
   *?                 # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
Employee\          # Match the character string “Employee ” literally (case sensitive)
(                  # Match the regex below and capture its match into backreference number 3
   [\w\ ]             # Match a single character present in the list below
                         # A “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
                         # The literal character “ ”
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\ accessed         # Match the character string “ accessed” literally (case sensitive)
.                  # Match any single character
   *?                 # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
server\ -\         # Match the character string “server - ” literally (case sensitive)
(                  # Match the regex below and capture its match into backreference number 4
   \w                 # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
.                  # Match any single character
   *?                 # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
(                  # Match the regex below and capture its match into backreference number 5
   \w                 # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\ was\             # Match the character string “ was ” literally (case sensitive)
(                  # Match the regex below and capture its match into backreference number 6
   \w                 # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\ -\ program:\     # Match the character string “ - program: ” literally (case sensitive)
(                  # Match the regex below and capture its match into backreference number 7
   \w                 # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
,\ start:\         # Match the character string “, start: ” literally (case sensitive)
(                  # Match the regex below and capture its match into backreference number 8
   \w                 # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
,\                 # Match the character string “, ” literally
(                  # Match the regex below and capture its match into backreference number 9
   \d                 # Match a single character that is a “digit” (0–9 in any Unicode script)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   \.                 # Match the character “.” literally
   \d                 # Match a single character that is a “digit” (0–9 in any Unicode script)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   \                  # Match the character “ ” literally
   \w                 # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
"
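
The commented layout above is only an explanation, but .NET can run a pattern written in a similar free-spacing style. The following is a minimal sketch, assuming RegexOptions.IgnorePatternWhitespace together with RegexOptions.Singleline (which replaces the inline (?s)). The class name FreeSpacingDemo, the Extract helper and the group names are made up for illustration. Literal spaces are escaped as "\ " because unescaped whitespace in the pattern is ignored in this mode.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FreeSpacingDemo
{
    // Singleline lets '.' cross line breaks, like the inline (?s) in the answer.
    // IgnorePatternWhitespace allows the layout and the # comments below.
    private const RegexOptions Options =
        RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace;

    private const string Pattern = @"
        ^on\s+ (?<date>[\w,\ ]+?)                 # report date, lazy
        \ at\ (?<time>\d{2}:\d{2})                # report time, two digits each
        .*? Employee\ (?<name>[\w\ ]+) \ accessed # employee name
        .*? server\ -\ (?<server>\w+)             # server name
        .*? (?<access>\w+) \ was\ (?<reason>\w+)  # access type and reason type
        \ -\ program:\ (?<prog>\w+)               # program id
        ,\ start:\ (?<start>\w+)                  # start
        ,\ (?<length>\d+\.\d+\ \w+)               # length, number plus unit
    ";

    // Hypothetical helper: returns the named group's value, or null if no match.
    public static string Extract(string text, string groupName)
    {
        Match m = Regex.Match(text, Pattern, Options);
        return m.Success ? m.Groups[groupName].Value : null;
    }

    public static void Main()
    {
        string text = "on Apr 28, 2014 at 22:00\n" +
                      "an Employee John Doe accessed\n" +
                      "server - TPCX123\n" +
                      "AccessType2 was ReasonType1 - program: Px2x3x, start: No22, 0.0 sec";

        Console.WriteLine("EmployeeName = " + Extract(text, "name"));
        Console.WriteLine("Length = " + Extract(text, "length"));
    }
}
```

Keeping the pattern in this form makes later tweaks easier to review, since each field sits on its own commented line.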

Amazing! Thanks @zx81 for the VERY helpful explanation in understanding the solution and the concepts behind them! — user118190, May 02 '14 at 05:16
@user118190 Hey you're welcome, it was a pleasure, glad to hear that it works. Thanks for your kind feedback. :) By the way you should know that I did not write the token-by-token explanation by hand: I asked regexbuddy to generate it [(trial here)](http://yu8.us/rbdemo). If you're going to be doing what it seems like you're about to be doing, that could be a worthwhile tool. — zx81, May 02 '14 at 06:06
