I am working in bash. I am trying to find unique barcodes within strings in a .txt file. Each string can contain 3 barcodes. I want to identify and label each unique configuration that contains my barcodes of interest.
This is my starting reads.txt file, which contains the strings I want to evaluate.
ABCD1
EFGH2
ABGH1
EFCD2
As an example, the barcodes contained in ABCD1 are AB, CD and 1.
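To make the barcode layout concrete, here is a minimal sketch that splits one read into its three barcodes with bash substring expansion. It assumes a fixed layout (two characters, two characters, then the remainder), which holds for the example reads but may not hold for real data. The variable names read_seq, bc1, bc2 and bc3 are only illustrative.

```bash
#!/usr/bin/env bash
# Assumption: barcode 1 = chars 0-1, barcode 2 = chars 2-3, barcode 3 = the rest.
read_seq="ABCD1"          # one example read from reads.txt

bc1="${read_seq:0:2}"     # first two characters
bc2="${read_seq:2:2}"     # next two characters
bc3="${read_seq:4}"       # everything from offset 4 onward

printf '%s %s %s\n' "$bc1" "$bc2" "$bc3"   # prints: AB CD 1
```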
My desired result is to identify only the strings ABCD1 and EFGH2 and to store them as result.1.txt and result.2.txt, respectively.
Below is my attempt.
# Add the barcode sequences to a bash array
declare -a BARCODES1=(AB EF)
declare -a BARCODES2=(CD GH)
declare -a BARCODES3=(1 2)
# Initialize counter
count=1
# Search for the barcode sequences in the reads.txt file
rm ROUND*
rm result*
for barcode in "${BARCODES1[@]}";
do
    grep "$barcode1" reads.txt > ROUND1_MATCHES.txt
    for barcode2 in "${BARCODES2[@]}";
    do
        grep "$barcode2" ROUND1_MATCHES.txt > ROUND2_MATCHES.txt
        for barcode3 in "${BARCODES3[@]}";
        do
            grep "$barcode3" ROUND2_MATCHES.txt > ROUND3_MATCHES.txt
            if [ -s ROUND3_MATCHES.txt ]
            then
                mv ROUND3_MATCHES.txt result.$count.txt
            fi
            count=`expr $count + 1`
        done
    done
done
Strangely, this code outputs too many result files. Running head result* gives me the following.
==> result.1.txt <==
ABCD1
==> result.2.txt <==
EFCD2
==> result.3.txt <==
ABGH1
==> result.4.txt <==
EFGH2
==> result.5.txt <==
ABCD1
==> result.6.txt <==
EFCD2
==> result.7.txt <==
ABGH1
==> result.8.txt <==
EFGH2
The desired result would have been
==> result.1.txt <==
ABCD1
==> result.2.txt <==
EFGH2
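One detail in the attempt above: the outer loop variable is named barcode, but the first grep uses $barcode1, which is never set. An empty pattern matches every line, so ROUND1_MATCHES.txt always holds all four reads, and every combination of the second and third barcodes then yields a result file in each outer iteration.

For comparison, here is a minimal sketch of one possible reading of the goal. It assumes the wanted configurations are the index-aligned triples (AB, CD, 1) and (EF, GH, 2), matching the description that only ABCD1 and EFGH2 should be kept, and that the three barcodes sit back to back in the read. Both are assumptions and may not carry over to real data.

```bash
#!/usr/bin/env bash
# Sketch only. Assumes the i-th entries of the three arrays belong together
# and that the barcodes are adjacent within a read.

# Recreate the example input so the sketch runs on its own
# (skip this line if reads.txt already exists).
printf '%s\n' ABCD1 EFGH2 ABGH1 EFCD2 > reads.txt

BARCODES1=(AB EF)
BARCODES2=(CD GH)
BARCODES3=(1 2)

rm -f result.*.txt   # -f: stay quiet when there is nothing to remove

count=1
for i in "${!BARCODES1[@]}"; do
    # Join the i-th barcode of each round into one literal search string.
    pattern="${BARCODES1[i]}${BARCODES2[i]}${BARCODES3[i]}"
    # grep -F treats the pattern as a fixed string, not a regex.
    if grep -F -- "$pattern" reads.txt > "result.$count.txt"; then
        count=$((count + 1))          # advance only when a file was kept
    else
        rm -f "result.$count.txt"     # no match: drop the empty file
    fi
done
```

Incrementing count only after a successful match keeps the result files numbered consecutively, instead of leaving gaps for combinations that match nothing.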