
These are my first days with Perl and I'm already stuck :)

Here's the situation: a file is updated in folder A but also exists in folders B, C and D and, to complicate matters, it can differ in each of them, so I can't just do a diff. New lines that are meant to be copied to the other files are identified by a flag, for instance #I, at the end of the line.

Before being updated, the file looks like this:

    First line
    Second line
    Fifth line

After being updated it looks like this:

    First line
    Second line
    Third line #I
    Fourth line #I
    Fifth line
    Sixth line #I

What I need to do is search for "Second line" in the other files, insert the lines tagged with #I after it (in the order they were inserted), then search for "Fifth line" and insert "Sixth line #I" after it.

In this example they are all consecutive, but in the files I need to update there can be several lines between the first update block and the second (and the third, and so on).

The files to be updated can be sh scripts, awk scripts, plain text files, etc., so the script is supposed to be generic. It will take two parameters: the updated file and the file to be updated.

Any hints on how to do this are welcome. I can provide the code I have so far (close, but not working yet) if needed.
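As a rough illustration of what an "update block" means here, the sketch below groups each run of #I lines under the untagged line that precedes it. The helper name `collect_blocks` is made up for this example, and it assumes the lines have already been chomped.

```perl
use strict;
use warnings;

# Hypothetical helper: group lines tagged with #I under the untagged line
# that precedes them. Returns a list of [anchor, [new lines...]] pairs.
# The anchor is undef if the very first line is tagged.
sub collect_blocks {
    my @lines = @_;
    my (@blocks, $anchor, $prev_tagged);
    for my $line (@lines) {
        if ($line =~ /#I\s*$/) {
            # Start a new block when the previous line was not tagged
            push @blocks, [ $anchor, [] ] unless $prev_tagged;
            push @{ $blocks[-1][1] }, $line;
            $prev_tagged = 1;
        }
        else {
            $anchor      = $line;
            $prev_tagged = 0;
        }
    }
    return @blocks;
}

my @updated = (
    'First line', 'Second line', 'Third line #I',
    'Fourth line #I', 'Fifth line', 'Sixth line #I',
);

# Show each block together with the line it has to be inserted after
for my $block (collect_blocks(@updated)) {
    my ($anchor, $new) = @$block;
    print "after '$anchor': ", join(' | ', @$new), "\n";
}
```

For the example file this yields two blocks: one to insert after "Second line" and one after "Fifth line".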

Thanks,

João

PS: Here's what I have so far

# Pass the content of the file $FileUpdate to the updateFile array
@updateFile = <UPD>;

# Pass the content of the file $FileOriginal to the originalFile array
@originalFile = <ORG>;

# Remove empty lines from the array contained on the updated file
@updateFile = grep(/\S/, @updateFile);

# Create an array that will contain the modifications and the line
# prior to the first modification.
@modifications = ();

# Counter initialization
$i = 0;


# Loop the array to find out which lines are flagged as new and
# which lines immediately precede those
foreach $linha (@updateFile) {

    # Remove \n characters
    chomp($linha);

    # Find the new lines flagged with #I
    if ($linha =~ m/#I$/) {

        # Verify that the previous line is not flagged as updated.
        # If it is not, it means that the update starts here.
        unless ($updateFile[$i-1] =~ m/#I$/) {
            print "Line where the update starts $updateFile[$i-1]\n";

            # Add that line to the array modifications
            push(@modifications, $updateFile[$i-1]);

        } # END OF unless

        print "$updateFile[$i]\n";

        # Add the lines tagged for insertion into the array
        push(@modifications, $updateFile[$i]);

    } # END OF if ($linha =~ m/#I$/)

    # Increment the counter
    $i = $i + 1;

} # END OF foreach $linha (@updateFile)


foreach $modif (@modifications) {
    unless ($modif =~ m/#I$/) {
        foreach $original (@originalFile) {
            chomp($original);
            if ($original ne $modif) {
                push (@newOriginal, $originalFile[$n]);
            }
            elsif ($original eq $modif) { #&& $modif[$n+1] =~ m/#I$/) {
                push (@newOriginal, $originalFile[$n]);
                last;
            }
            $n = $n + 1;
        }
    }
    if ($modif =~ m/#I$/) {
        push (@newOriginal, $modifications[$m]);
    }
    $m = $m + 1;
}

The result is close to what I want, but not quite there yet.

  • So you are updating the target `A/file` from the sources `B/file`, `C/file` and `D/file`. New lines in the sources are tagged, and you must insert them into the target after a line that is identical to the line in the source preceding the tagged new line. Is that right? Is it OK that this doesn't cater for lines being deleted? What happens if there are multiple identical lines in the source so that you cannot tell where to insert the new record? – Borodin Mar 23 '12 at 14:27
  • Hi TLP, I have added what I have so far. – Joao Villa-Lobos Mar 23 '12 at 14:31
  • Hi Borodin, the update flow is the reverse. A/file will update B/file, C/file and D/file. In principle there will not be multiple identical lines but I haven't really thought about it. Maybe insert on the first one. – Joao Villa-Lobos Mar 23 '12 at 14:33

2 Answers


I was finally able to come back to this issue and it seems I have solved it. It is probably not the best or "prettiest" solution, but it does what I need :)

# First parameter is the file containing the update
my ($FileUpdate) = $ARGV[0];

# Second parameter is the file to be updated
my ($FileOriginal) = $ARGV[1];

# Open both files and give them handles to be referred to further ahead
open(UPD, '<', $FileUpdate) || die("Could not open file $FileUpdate: $!");
open(ORG, '<', $FileOriginal) || die("Could not open file $FileOriginal: $!");

# ------------------------------------------------ #
# ---------------- ARRAY CREATION ---------------- #
# ------------------------------------------------ #

# Pass the content of the file $FileUpdate to the updateFile array
@updateFile = <UPD>;

# Pass the content of the file $FileOriginal to the originalFile array
@originalFile = <ORG>;

# Remove empty lines from the array contained on the updated file
@updateFile = grep(/\S/, @updateFile);

# Create an array that will contain the modifications and the line
# prior to the first modification.
@modifications = ();

# Counter initialization
$i = 0;


# ------------------------------------------------ #
# ----- LOOP TO IDENTIFY LINES FOR INSERTION ----- #
# ------------------------------------------------ #

# Loop the array to find out which lines are flagged as new and
# which lines immediately precede those
foreach $linha (@updateFile) {

    # Remove \n characters
    chomp($linha);

    # Find the new lines flagged with #I
    if ($linha =~ m/#I$/) {

        # Verify that the previous line is not flagged as updated.
        # If it is not, it means that the update starts here.
        # The very first line has no previous line, so it is skipped.
        unless ($i == 0 || $updateFile[$i-1] =~ m/#I$/) {

            # Add that line to the array modifications
            push(@modifications, $updateFile[$i-1]);

        } # END OF unless

        # Add the lines tagged for insertion into the array
        push(@modifications, $updateFile[$i]);

    } # END OF if ($linha =~ m/#I$/)

    # Increment the counter
    $i = $i + 1;

} # END OF foreach $linha (@updateFile)


# ------------------------------------------------ #
# ------------ PRINT THE MODIFICATIONS ----------- #
# ------------------------------------------------ #
foreach $valor (@modifications) {
    print "$valor\n";
}

# ------------------------------------------------ #
# -------------------- BACKUP -------------------- #
# ------------------------------------------------ #

# Make a backup copy from the original file   
# in case something goes wrong when updating it

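# Obtain the current time as a timestamp such as 20120323-1427
use POSIX qw(strftime);
$tt = strftime("%Y%m%d-%H%M", localtime);

# The list form of system avoids shell problems with unusual file names
system("cp", $FileOriginal, "$FileOriginal.$tt") == 0
    or die("Could not back up $FileOriginal!");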

# ------------------------------------------------ #
# ------------- INSERT THE NEW LINES ------------- #
# ------------------------------------------------ #

# Counter initialization
$m = 0;

# New file array
@newOriginal = ();

# Go through the original file and copy each line to the new array.

foreach $original (@originalFile) {
    # Initialize counter
    $n = 0;

    # Remove the trailing newline
    chomp ($original);

    # Copy the line, unless it is a #I line that is already in the result.
    # Untagged lines are always kept, even when they repeat.
    unless ($original =~ m/#I$/ && grep {$_ eq $original} @newOriginal) {
        push (@newOriginal, $originalFile[$m]);
    }

    # Iterate over the array containing the modifications
    # These new lines shall be added to the final file.
    foreach $modif (@modifications) {
        # Remove the trailing newline
        chomp ($modif);

        #print "Original: $original, Modif: $modif\n";

        # Compare the current value from the original file with
        # the elements that exist on the modifications array.
        # If they are equal push the lines that follow it in order
        # to be added to the results file.
        if ($original eq $modif) {

            # Index of the first line after the common line
            $k = $n+1;

            # Iterate the array with the modifications
            # in order to insert all lines that end with #I
            # immediately after the common line between files.
            foreach my $igual ($k..$#modifications) {

                # If the line ends with #I add it to the final file.
                if ($modifications[$igual] =~ m/#I$/) {
                    push (@newOriginal, $modifications[$igual]);
                }
                else {
                    last;
                }
            }
        }

        # Increment the counter
        $n = $n + 1;
    }
    # Increment the counter
    $m = $m + 1;
}

# ------------------------------------------------ #
# ------------- RESULTS PRESENTATION ------------- #
# ------------------------------------------------ #
$v = 0;
print "--------------------\n";
foreach $vl (@newOriginal) {
    print "newOriginal: $newOriginal[$v]\n";
    $v = $v + 1;
}
print "--------------------\n";

# ------------------------------------------------ #
# ------------- CREATE UPDATED FILE -------------- #
# ------------------------------------------------ #
$v = 0;

# Create the new name for the file - only for testing purposes now, it will be the original name afterwards
$NewFileToWriteTo = $FileOriginal;
# Retrieve the extension of the file to be updated (empty if there is none)
my ($ext) = $FileOriginal =~ /(\.[^.\/]+)$/;
$ext = '' unless defined $ext;
# Remove the extension - just for testing purposes because I want to change the file name now
$NewFileToWriteTo =~ s/\Q$ext\E$//;
# Create the new file name by adding the suffix _tst and the correct extension to it.
$NewFileToWriteTo = $NewFileToWriteTo . '_tst' . $ext;


# Create the new file or die in case it is not possible to open it
open(DAT, '>', $NewFileToWriteTo) or die("Could not open file $NewFileToWriteTo: $!");


# Write to the new file. This will be the UPDATED version of the ORIGINAL file.
foreach $vl (@newOriginal) {
    print DAT "$newOriginal[$v]\n";
    $v = $v + 1;
}

# Close all files
close(DAT);
close(UPD);
close(ORG);
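
For comparison, the same idea can be written much more compactly. This is only a sketch, not a drop-in replacement: `apply_insertions` is a made-up name, it works on arrays of already chomped lines, it inserts each block after the first matching line only, and it silently ignores blocks whose preceding line is missing from the file to be updated.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines of @$original with every #I block
# from @$updated inserted after the first line equal to its preceding line.
sub apply_insertions {
    my ($updated, $original) = @_;

    # Map each untagged line to the tagged lines that follow it.
    # Tagged lines that appear before any untagged line are dropped.
    my (%insert_after, $anchor);
    for my $line (@$updated) {
        next unless $line =~ /\S/;              # skip empty lines
        if ($line =~ /#I\s*$/) {
            push @{ $insert_after{$anchor} }, $line if defined $anchor;
        }
        else {
            $anchor = $line;
        }
    }

    my @result;
    for my $line (@$original) {
        push @result, $line;
        # delete() makes sure each block is inserted only once,
        # after the first matching line
        my $new = delete $insert_after{$line};
        push @result, @$new if $new;
    }
    return @result;
}

my @updated  = ('First line', 'Second line', 'Third line #I',
                'Fourth line #I', 'Fifth line', 'Sixth line #I');
my @original = ('First line', 'Second line', 'Something else', 'Fifth line');

print "$_\n" for apply_insertions(\@updated, \@original);
```

With these sample arrays, the two lines tagged after "Second line" land between "Second line" and "Something else", and "Sixth line #I" lands after "Fifth line".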

OK I think I understand what you need, and the program below implements a solution.

I'm not entirely clear what the source (B, C, D) files look like, but I presume they look like the target (A) file in its "after being updated" state from your question.

Another edge case I came across: what if the first line of the source (B, C, D) files is tagged with a #I? I have assumed that it should be inserted at the beginning of the output.

I have also opted to die if the preceding line in the source file isn't found in the target.

Let us know if this is along the right lines.

use strict;
use warnings;

open my $fa, '<', 'A.txt' or die $!;

open my $fb, '<', 'B.txt' or die $!;

my $keyline;
my $inserting;

while (<$fb>) {

  if (/#I$/) {

    if ($keyline) {             # We have to search for a match

      while () {

        my $source = <$fa>;     # read from the target

        if (defined $source) {  # copy to output. stop reading if key is found
          print $source;
          last if $source eq $keyline;
        }
        else {                  # die if key nowhere in target
          chomp $keyline;
          die qq(Key Line "$keyline" not found);
        }
      }

      undef $keyline;           # don't have to search next time
    }

    print;                      # insert the new line
  }
  else {
    $keyline = $_;              # remember the line to search for
  }
}

print while <$fa>;              # copy the rest of the target to the output
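
The same logic can be wrapped in a subroutine that takes file handles, which makes it easy to try out on in-memory strings. This is a sketch under a few assumptions: `merge_tagged` and its parameter names are made up for illustration, and it copies whatever is left of the target after the last insertion, on the assumption that those lines should be kept.

```perl
use strict;
use warnings;

# Hypothetical helper: $tagged is a handle on the file containing #I lines,
# $target a handle on the file to update, $out the handle to write to.
sub merge_tagged {
    my ($tagged, $target, $out) = @_;
    my $keyline;

    while (my $line = <$tagged>) {
        if ($line =~ /#I$/) {
            if (defined $keyline) {
                # Copy the target up to and including the key line
                my $found;
                while (my $copy = <$target>) {
                    print {$out} $copy;
                    if ($copy eq $keyline) { $found = 1; last }
                }
                unless ($found) {
                    chomp $keyline;
                    die qq(Key line "$keyline" not found\n);
                }
                undef $keyline;     # don't have to search next time
            }
            print {$out} $line;     # insert the new line
        }
        else {
            $keyline = $line;       # remember the line to search for
        }
    }

    # Assumption: lines after the last insertion point should be kept
    while (my $rest = <$target>) {
        print {$out} $rest;
    }
}

# Try it on in-memory "files"
my $tagged_text = "First line\nSecond line\nThird line #I\nFifth line\nSixth line #I\n";
my $target_text = "First line\nSecond line\nFifth line\nLast line\n";
open my $tagged, '<', \$tagged_text or die $!;
open my $target, '<', \$target_text or die $!;
merge_tagged($tagged, $target, \*STDOUT);
```

A line tagged with #I at the very top of the tagged file is written before anything from the target, and a key line that cannot be found in the target still stops the program.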
Borodin
  • Hi Borodin. Thanks for your reply. I have tried this and replaced A.txt with OriginalFile.txt and B.txt with UpdatedFile.txt. When I ran it, it printed the contents of the original file without the new lines inserted in UpdatedFile.txt. UpdatedFile.txt will be the source for all the other files. Concerning the first line issue, from what I saw the first line will not be changed because all files seem to have a header that starts as # -------- #. It might happen, but so far I haven't seen any file where it could. – Joao Villa-Lobos Mar 26 '12 at 09:26
  • @JoaoVilla-Lobos: please clarify which file is which. Of your original folders A, B, C and D, which contains the file that has lines marked with `#I`, and which do you mean by `OriginalFile.txt` and `UpdatedFile.txt`? (My code is expecting to update `A.txt` using the insertions from `B.txt`.) – Borodin Mar 26 '12 at 14:33
  • Sorry for not being clear. Although any of them can contain, at a given time, the file that will be used as the source while the others hold the files that need to be updated, let's say that the file containing the lines that end with #I is located in folder A. This is the file I named UpdatedFile.txt. The file to be updated is the (poorly named) OriginalFile.txt. – Joao Villa-Lobos Mar 27 '12 at 09:23