
Use Case: I import text files, each of which contains 4 lines. From these 4 lines I need to parse out strings that are not predefined but vary from file to file.

Here's an example that I gathered from @zx81:

Input:

on Apr 28, 2014 at 22:00
an Employee John Doe accessed
server - TPCX123
AccessType2 was ReasonType1 - program: Px2x3x, start: No22, 0.0 sec

Given the above 4 lines, which I could either keep as 4 lines (with their carriage returns) or join into a single string, I want to extract the values and store them in a class's properties, e.g. ReportDate, ReportTime, EmployeeName, ServerName, AccessType, ReasonType, ProgramId, Start, Length.

Desired Output:

ReportDate = Apr 28, 2014
ReportTime = 22:00
EmployeeName = John Doe
ServerName = TPCX123
AccessType = AccessType2
ReasonType = ReasonType1
ProgramId = Px2x3x
Start = No22
Length = 0.0 sec

That is all I want: the items on the right-hand side of each equals sign, i.e. specific strings assigned to the properties of an in-memory object, which ultimately correspond to a database table's columns. In the above example, EmployeeName will always be in the same place (between specific strings), so its value, e.g. "John Doe", can be parsed out from there. Of course, these values will be different in every file I bring in; that is the dynamic part.

Hope this helps, thanks.

user118190

1 Answer


Given your data, something like this would output what you want:

Output:

ReportDate = Apr 28, 2014
ReportTime = 22:00
EmployeeName = John Doe
ServerName = TPCX123
AccessType = AccessType2
ReasonType = ReasonType1
ProgramId = Px2x3x
Start = No22
Length = 0.0 sec

Code:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Sample input: the four lines of one report, kept as a single multi-line string.
        string s1 = @"on Apr 28, 2014 at 22:00
an Employee John Doe accessed
server - TPCX123
AccessType2 was ReasonType1 - program: Px2x3x, start: No22, 0.0 sec";

        try
        {
            var myRegex = new Regex(@"(?s)^on\s+([\w, ]+?) at (\d{2}:\d{2}).*?Employee ([\w ]+) accessed.*?server - (\w+).*?(\w+) was (\w+) - program: (\w+), start: (\w+), (\d+\.\d+ \w+)");

            // Run the match once and reuse the result for every group.
            Match m = myRegex.Match(s1);
            if (m.Success)
            {
                Console.WriteLine("ReportDate = " + m.Groups[1].Value);
                Console.WriteLine("ReportTime = " + m.Groups[2].Value);
                Console.WriteLine("EmployeeName = " + m.Groups[3].Value);
                Console.WriteLine("ServerName = " + m.Groups[4].Value);
                Console.WriteLine("AccessType = " + m.Groups[5].Value);
                Console.WriteLine("ReasonType = " + m.Groups[6].Value);
                Console.WriteLine("ProgramId = " + m.Groups[7].Value);
                Console.WriteLine("Start = " + m.Groups[8].Value);
                Console.WriteLine("Length = " + m.Groups[9].Value);
            }
            else
            {
                Console.WriteLine("The input did not match the expected format.");
            }
        }
        catch (ArgumentException ex)
        {
            // The Regex constructor throws this when the pattern has a syntax error
            Console.WriteLine("Invalid regex: " + ex.Message);
        }

        Console.WriteLine("\nPress Any Key to Exit.");
        Console.ReadKey();
    } // END Main
} // END Program
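
Since the question asks for the values to end up in a class's properties, here is a minimal sketch of one way that could look, using named groups instead of numbered ones. The names AccessReport, AccessReportParser and Demo are hypothetical (they are not from the question); only the property names come from the question's list, and the regex is the same one as above with names added to the groups.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical container class; property names follow the question's list.
public class AccessReport
{
    public string ReportDate { get; set; }
    public string ReportTime { get; set; }
    public string EmployeeName { get; set; }
    public string ServerName { get; set; }
    public string AccessType { get; set; }
    public string ReasonType { get; set; }
    public string ProgramId { get; set; }
    public string Start { get; set; }
    public string Length { get; set; }
}

public static class AccessReportParser
{
    // Same pattern as in the answer, with each capture group given a name.
    private static readonly Regex Pattern = new Regex(
        @"(?s)^on\s+(?<date>[\w, ]+?) at (?<time>\d{2}:\d{2})" +
        @".*?Employee (?<name>[\w ]+) accessed" +
        @".*?server - (?<server>\w+)" +
        @".*?(?<access>\w+) was (?<reason>\w+)" +
        @" - program: (?<prog>\w+), start: (?<start>\w+), (?<length>\d+\.\d+ \w+)");

    // Returns null when the text does not match the expected layout.
    public static AccessReport Parse(string text)
    {
        Match m = Pattern.Match(text);
        if (!m.Success)
        {
            return null;
        }

        return new AccessReport
        {
            ReportDate = m.Groups["date"].Value,
            ReportTime = m.Groups["time"].Value,
            EmployeeName = m.Groups["name"].Value,
            ServerName = m.Groups["server"].Value,
            AccessType = m.Groups["access"].Value,
            ReasonType = m.Groups["reason"].Value,
            ProgramId = m.Groups["prog"].Value,
            Start = m.Groups["start"].Value,
            Length = m.Groups["length"].Value
        };
    }
}

public static class Demo
{
    public static void Main()
    {
        string text = "on Apr 28, 2014 at 22:00\n" +
                      "an Employee John Doe accessed\n" +
                      "server - TPCX123\n" +
                      "AccessType2 was ReasonType1 - program: Px2x3x, start: No22, 0.0 sec";

        AccessReport report = AccessReportParser.Parse(text);
        Console.WriteLine(report.EmployeeName + " / " + report.ServerName);
    }
}
```

Named groups keep the property assignments readable if the pattern later gains or loses a group, because nothing depends on group positions. Returning null on a failed match is just one option; throwing an exception or using a TryParse-style method would work as well.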

Tweaking it

However, to tweak it you are going to have to brush up on your regex.

To start you off, here is a token-by-token explanation of the regex in the code. Then I recommend you visit the FAQ, RexEgg and other sites mentioned in the FAQ.

@"
(?                 # Use these options for the whole regular expression
   s                  # Dot matches line breaks
)
^                  # Assert position at the beginning of the string
on                 # Match the character string “on” literally (case sensitive)
\s                 # Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line)
   +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(                  # Match the regex below and capture its match into backreference number 1
   [\w,\ ]            # Match a single character present in the list below
                         # A “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
                         # A single character from the list “, ”
      +?                 # Between one and unlimited times, as few times as possible, expanding as needed (lazy)
)
\ at\              # Match the character string “ at ” literally (case sensitive)
(                  # Match the regex below and capture its match into backreference number 2
   \d                 # Match a single character that is a “digit” (0–9 in any Unicode script)
      {2}                # Exactly 2 times
   :                  # Match the character “:” literally
   \d                 # Match a single character that is a “digit” (0–9 in any Unicode script)
      {2}                # Exactly 2 times
)
.                  # Match any single character
   *?                 # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
Employee\          # Match the character string “Employee ” literally (case sensitive)
(                  # Match the regex below and capture its match into backreference number 3
   [\w\ ]             # Match a single character present in the list below
                         # A “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
                         # The literal character “ ”
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\ accessed         # Match the character string “ accessed” literally (case sensitive)
.                  # Match any single character
   *?                 # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
server\ -\         # Match the character string “server - ” literally (case sensitive)
(                  # Match the regex below and capture its match into backreference number 4
   \w                 # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
.                  # Match any single character
   *?                 # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
(                  # Match the regex below and capture its match into backreference number 5
   \w                 # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\ was\             # Match the character string “ was ” literally (case sensitive)
(                  # Match the regex below and capture its match into backreference number 6
   \w                 # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\ -\ program:\     # Match the character string “ - program: ” literally (case sensitive)
(                  # Match the regex below and capture its match into backreference number 7
   \w                 # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
,\ start:\         # Match the character string “, start: ” literally (case sensitive)
(                  # Match the regex below and capture its match into backreference number 8
   \w                 # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
,\                 # Match the character string “, ” literally
(                  # Match the regex below and capture its match into backreference number 9
   \d                 # Match a single character that is a “digit” (0–9 in any Unicode script)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   \.                 # Match the character “.” literally
   \d                 # Match a single character that is a “digit” (0–9 in any Unicode script)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   \                  # Match the character “ ” literally
   \w                 # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
"
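
A commented layout like the one above can also be used directly in code. The following is a sketch, assuming you pass RegexOptions.IgnorePatternWhitespace so that unescaped whitespace and # comments in the pattern are ignored, and RegexOptions.Singleline in place of the inline (?s). The class name CommentedRegexDemo and the method name MatchReport are made up for this illustration; the pattern is the same one, condensed to one line per field.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentedRegexDemo
{
    // Literal spaces are escaped as "\ " because unescaped whitespace is ignored in this mode.
    private static readonly Regex Pattern = new Regex(@"
        ^on\s+
        ( [\w,\ ]+? )                    # 1: report date
        \ at\ ( \d{2}:\d{2} )            # 2: report time
        .*? Employee\ ( [\w\ ]+ )\ accessed   # 3: employee name
        .*? server\ -\ ( \w+ )           # 4: server name
        .*? ( \w+ )\ was\ ( \w+ )        # 5: access type, 6: reason type
        \ -\ program:\ ( \w+ )           # 7: program id
        ,\ start:\ ( \w+ )               # 8: start
        ,\ ( \d+\.\d+\ \w+ )             # 9: length
        ",
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

    // Hypothetical helper: returns the Match so the caller can check Success.
    public static Match MatchReport(string text)
    {
        return Pattern.Match(text);
    }

    public static void Main()
    {
        string text = "on Apr 28, 2014 at 22:00\n" +
                      "an Employee John Doe accessed\n" +
                      "server - TPCX123\n" +
                      "AccessType2 was ReasonType1 - program: Px2x3x, start: No22, 0.0 sec";

        Match m = MatchReport(text);
        if (m.Success)
        {
            Console.WriteLine("EmployeeName = " + m.Groups[3].Value);
            Console.WriteLine("Length = " + m.Groups[9].Value);
        }
    }
}
```

Keeping the comments next to the tokens makes later tweaks less error-prone than editing the one-line version, at the cost of having to escape every literal space.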
zx81
  • Amazing! Thanks @zx81 for the VERY helpful explanation in understanding the solution and the concepts behind them! – user118190 May 02 '14 at 05:16
  • @user118190 Hey you're welcome, it was a pleasure, glad to hear that it works. Thanks for your kind feedback. :) By the way you should know that I did not write the token-by-token explanation by hand: I asked RegexBuddy to generate it [(trial here)](http://yu8.us/rbdemo). If you're going to be doing what it seems like you're about to be doing, that could be a worthwhile tool. – zx81 May 02 '14 at 06:06