I have two or more log files that will be merged into a new file.
The log file format looks like this:
Dir1 File1Path1 File1Path2 Timestamp tempfileName
Dir1 File2Path1 File2Path2 Timestamp tempfileName
Dir2 File1Path1 File1Path2 Timestamp tempfileName
Dir3 File1Path1 File1Path2 Timestamp tempfileName
Dir3 File2Path1 File2Path2 Timestamp tempfileName
Dir3 File1Path1 File1Path2 Timestamp tempfileName
Dir4 File1Path1 File1Path2 Timestamp tempfileName
etc.
My requirements are as follows:
- Check that the format of each line in each log file is right, i.e. all values are recorded
- Check that there are no duplicates
- Verify that the files are merged properly, i.e. all log lines from each log file have been merged into the new log file
- Compare the new merged file to a baseline file
I have already written code for the first requirement. I read the file and load the contents into a DataSet, by row/column.
data.Tables[tableName].Columns.Add("Dir");
data.Tables[tableName].Columns.Add("Path1");
data.Tables[tableName].Columns.Add("Path2");
using (StreamReader reader = new StreamReader(log))
{
string line = string.Empty;
while ((line = reader.ReadLine()) != null)
{
data.Tables[tableName].Rows.Add(line.Split(new string[] { "\t" }, data.Tables[tableName].Columns.Count, StringSplitOptions.RemoveEmptyEntries));
}
}
But to accomplish the rest of the tasks, I am not sure that loading the lines into a DataSet is right. What is the fastest and best approach for this? I could loop over each row value and compare it to the rest, but I don't think that will be fast. The log files can be between 20 and 45 MB.
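For the duplicate check and the merge verification, I am considering a set-based pass instead of the nested comparison. Here is a rough sketch, assuming a duplicate means the whole line is identical; the file paths are made up:

using System.Collections.Generic;
using System.IO;

var mergedLines = new HashSet<string>();
var duplicates = new List<string>();

// HashSet.Add returns false when the line is already present,
// which flags duplicates in a single pass over the merged file.
foreach (string line in File.ReadLines(@"C:\logs\merged.log"))
{
    if (!mergedLines.Add(line))
        duplicates.Add(line);
}

// Check that every line from each source log ended up in the merged file.
var missing = new List<string>();
foreach (string sourceLog in new[] { @"C:\logs\log1.log", @"C:\logs\log2.log" })
{
    foreach (string line in File.ReadLines(sourceLog))
    {
        if (!mergedLines.Contains(line))
            missing.Add(line);
    }
}

Since each set lookup is close to O(1), I think this should cope with 20-45 MB files without the quadratic comparison.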
The merged log contents should look like this (the lines can be in any order):
Dir1 File1Path1 File1Path2 Timestamp tempfileName
Dir1 File2Path1 File2Path2 Timestamp tempfileName
Dir2 File1Path1 File1Path2 Timestamp tempfileName
Dir4 File1Path1 File1Path2 Timestamp tempfileName
Dir3 File1Path1 File1Path2 Timestamp tempfileName
Dir3 File2Path1 File2Path2 Timestamp tempfileName
Dir3 File1Path1 File1Path2 Timestamp tempfileName
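Since the lines can be in any order, I assume the comparison against the baseline file has to be order-insensitive as well, something like this sketch (paths again made up):

using System.Collections.Generic;
using System.IO;
using System.Linq;

// Order-insensitive comparison of the merged file against the baseline.
var mergedSet = new HashSet<string>(File.ReadLines(@"C:\logs\merged.log"));
var baselineSet = new HashSet<string>(File.ReadLines(@"C:\logs\baseline.log"));

bool sameContents = mergedSet.SetEquals(baselineSet);

// If they differ, these show which lines exist only on one side.
IEnumerable<string> onlyInMerged = mergedSet.Except(baselineSet);
IEnumerable<string> onlyInBaseline = baselineSet.Except(mergedSet);

One thing I am not sure about: a HashSet collapses repeated lines, so if duplicates are significant (like the two Dir3 File1Path1 lines above), the per-line counts would have to be compared separately.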
Thanks for looking.