File comparison

Question

I am a beginner. I am looking for a basic shell script to solve what looks like a simple problem. I have one long file, file A, which looks like the sample below.

I would like to generate a new file (target file C) that is essentially file A, but with an extra field, say "Comment", added to the first line, and with every line whose first field matches any of the items in column 1 of file B marked with, say, "SHARED". Files A and B are CSV files.

I have tried awk and also a basic shell script, which is easier for me to understand, but I could not get either to work. If necessary, I could generate a blank target file whose first line already contains the 3 fields.

File A

"Part Number","Description"
"1468896-1","MCD-MXSER-21-P-X-0209"
"1495581-1","MC-P-15S5127854ST1"
"1497458-3","MC -N1-P-569RT1"

File B

"1466826-1"
"1495582-1"
"1495581-1"

Desired target file C

"Part Number","Description","Comment"
"1468896-1","MCD-MXSER-21-P-X-0209"
"1495581-1","MC-P-15S5127854ST1","SHARED"
"1497458-3","MC -N1-P-569RT1"

score 1 · Answer 1 · answered May 24 '14 at 14:45


This one-liner should do the job:

awk -F, -v c='"Comment"' -v s='"SHARED"' \
   'NR==FNR{a[$1]=1;next}FNR==1{$0=$0 FS c}FNR>1&&a[$1]{$0=$0 FS s}7' fileb filea

answered May 24 '14 at 14:45

Kent
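For readers new to awk, the same one-liner can be spread over several lines with comments. This is only a sketch: it first recreates the sample data from the question as filea and fileb (matching the names in the command above) and writes the result to filec, a name chosen here purely for illustration.

```shell
#!/bin/sh
# Recreate the sample files from the question
printf '%s\n' '"Part Number","Description"' \
  '"1468896-1","MCD-MXSER-21-P-X-0209"' \
  '"1495581-1","MC-P-15S5127854ST1"' \
  '"1497458-3","MC -N1-P-569RT1"' > filea
printf '%s\n' '"1466826-1"' '"1495582-1"' '"1495581-1"' > fileb

awk -F, -v c='"Comment"' -v s='"SHARED"' '
  # First file (fileb): NR == FNR only while reading it; remember each key
  NR == FNR { a[$1] = 1; next }
  # Header line of filea: append the "Comment" column name
  FNR == 1 { $0 = $0 FS c }
  # Data lines of filea whose first field was seen in fileb: append the mark
  FNR > 1 && a[$1] { $0 = $0 FS s }
  # Any non-zero constant is a true pattern; with no action, awk prints $0
  7
' fileb filea > filec
```

One caveat of the NR==FNR trick: it assumes fileb is not empty. If it were, every line of filea would also satisfy NR==FNR and nothing would be printed.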


score 0 · Answer 2 · answered May 24 '14 at 14:31


You can do it like this:

awk -F, 'FNR==NR{a[i++]=$1;next} {extra="";for(t in a)if($1==a[t])extra=",\"SHARED\"";print $0,extra}' fileB fileA

You will see that both fileA and fileB are passed into awk. The processing in {} following FNR==NR only applies to fileB: FNR (the record number within the current file) equals NR (the record number across all input) only while awk is reading the first file. It stores the first field of each line in an array a[] and then skips to the next line.

The processing in the second set of {} only applies to fileA. Basically, it presets a string called extra to nothing. It then tests whether the first field of the current record is in array a[] by looping over every element. If it is, it sets extra to ,"SHARED". It then prints the current record and the string extra, which may or may not be ,"SHARED".

answered May 24 '14 at 14:31

Mark Setchell
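To see the FNR==NR test at work, a small trace can print which rule handles each record. This is a sketch: it recreates the sample files from the question and writes its output to trace.txt, a file name chosen here only for illustration.

```shell
#!/bin/sh
# Recreate the sample files from the question
printf '%s\n' '"1466826-1"' '"1495582-1"' '"1495581-1"' > fileB
printf '%s\n' '"Part Number","Description"' \
  '"1468896-1","MCD-MXSER-21-P-X-0209"' \
  '"1495581-1","MC-P-15S5127854ST1"' \
  '"1497458-3","MC -N1-P-569RT1"' > fileA

# For each record, print which rule ran, the current file name and both counters
awk '
  FNR == NR { print "first rule:  " FILENAME " FNR=" FNR " NR=" NR; next }
            { print "second rule: " FILENAME " FNR=" FNR " NR=" NR }
' fileB fileA > trace.txt
cat trace.txt
```

The three fileB records go through the first rule (FNR and NR both run from 1 to 3). From the first fileA record on, FNR restarts at 1 while NR continues at 4, so only the second rule runs.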



note1: this doesn't generate the `"Comment"` title; note2: you can use `$1` as the array index, which changes the cost from `O(n*m)` to `O(n+m)`; note3: you didn't set `OFS`, which is OK, but `print $0,extra` will add an extra space before the hard-coded comma `,` – Kent May 24 '14 at 14:47
@Kent Thank you for your insights. I had trouble with this - I needed a loop to test if $1 was in a[], although previously I have just written `if($1 in a)`; it just wouldn't work here! No idea why? – Mark Setchell May 24 '14 at 14:57
Instead of `a[i++]=$1`, use `a[$1]`. Also, `i++` will make the first index `0`; that is OK in almost all programming languages, but by awk convention arrays start at `1` – Kent May 24 '14 at 15:04
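Putting Kent's three notes together, a sketch of the answer with the part number used as the array key might look like this. It also explains the `if($1 in a)` puzzle: `in` tests whether a key exists in the array, but `a[i++]=$1` stores the part numbers as values under the keys 0, 1, 2 and so on, so `$1 in a` is never true. The file names follow the question; the sample data is recreated from it.

```shell
#!/bin/sh
# Recreate the sample files from the question
printf '%s\n' '"1466826-1"' '"1495582-1"' '"1495581-1"' > fileB
printf '%s\n' '"Part Number","Description"' \
  '"1468896-1","MCD-MXSER-21-P-X-0209"' \
  '"1495581-1","MC-P-15S5127854ST1"' \
  '"1497458-3","MC -N1-P-569RT1"' > fileA

awk -F, '
  # fileB: store each part number as an array KEY (not as a value),
  # so that membership can be tested with "in" without a loop
  FNR == NR { a[$1] = 1; next }
  # Header line of fileA: append the new column title
  FNR == 1 { print $0 ",\"Comment\""; next }
  # Data lines: concatenate instead of using a comma in print,
  # so no OFS (a space by default) is inserted before the mark
  { print $0 (($1 in a) ? ",\"SHARED\"" : "") }
' fileB fileA > fileC
```

Each fileA line now costs a single hash lookup instead of a loop over all of fileB, which is the `O(n*m)` to `O(n+m)` change mentioned above.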

score 0 · Accepted Answer · answered May 24 '14 at 14:47


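If you want to do it in bash:

#!/bin/bash
# Read fileA line by line, splitting off the first comma-separated field
first=1
while IFS=, read -r f1 line
do
   if [ "$first" -eq 1 ]; then
      # Header line: append the new "Comment" column
      echo "$f1,$line,\"Comment\""
      first=0
   elif grep -qxF -- "$f1" fileB ; then
      # First field appears as a whole line in fileB: mark it
      echo "$f1,$line,\"SHARED\""
   else
      echo "$f1,$line"
   fi
done < fileA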

answered May 24 '14 at 14:47

Mark Setchell


For beginners like me, bash is a bit more user-friendly and therefore looks more flexible. However, awk is awesome when it comes to speed. Not essential here. – Yves May 25 '14 at 14:37
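On the speed point: most of the cost of the bash version is that it starts one grep process per line of fileA. A sketch that reads fileB only once into a bash associative array could look like this. It assumes bash 4 or later (for `declare -A`); the array name shared and the output file fileC are illustrative choices, and the sample data is recreated from the question.

```shell
#!/bin/bash
# Sketch only: needs bash 4 or later for associative arrays (declare -A)

# Recreate the sample files from the question
printf '%s\n' '"1466826-1"' '"1495582-1"' '"1495581-1"' > fileB
printf '%s\n' '"Part Number","Description"' \
  '"1468896-1","MCD-MXSER-21-P-X-0209"' \
  '"1495581-1","MC-P-15S5127854ST1"' \
  '"1497458-3","MC -N1-P-569RT1"' > fileA

# Read fileB once; each part number becomes a key of the "shared" array
declare -A shared
while IFS= read -r key; do
  [[ -n $key ]] && shared["$key"]=1   # skip blank lines (bash rejects empty keys)
done < fileB

{
  # Header line: append the new column title
  IFS= read -r header
  printf '%s,"Comment"\n' "$header"
  # Data lines: split off the first field and look it up in the array
  while IFS=, read -r f1 rest; do
    if [[ -n ${shared["$f1"]+set} ]]; then
      printf '%s,%s,"SHARED"\n' "$f1" "$rest"
    else
      printf '%s,%s\n' "$f1" "$rest"
    fi
  done
} < fileA > fileC
```

This keeps the readable while-loop style of the bash answer while doing the same key lookup as the awk versions.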
