I don't know how to parallelise Python code that takes each sequence in a FASTA file and computes some statistics on it, such as GC content. Do you have any tips or libraries that would help me reduce the execution time?
I've tried using os.fork(), but it takes longer than the sequential code. This is probably because I don't really know how to give each child process a different sequence.
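A plausible cause of the os.fork() slowdown is overhead. Forking one child per sequence, or having every child redo all of the work, can cost more than it saves. Below is a minimal sketch of giving each child its own share of the sequences. It assumes a POSIX system (os.fork is not available on Windows) and sequences that are already loaded as plain strings. `gc_fraction` and `gc_with_fork` are hypothetical names, and the choice to send results back to the parent as JSON over a pipe is only one option among several.

```python
import json
import os


def gc_fraction(seq):
    """Fraction of G/C bases in one sequence string (case-insensitive)."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def gc_with_fork(sequences, n_children=2):
    """Hypothetical helper: child k handles sequences k, k + n_children, ...
    and sends its results back to the parent through its own pipe.
    POSIX-only, because it relies on os.fork()."""
    children = []
    for k in range(n_children):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:  # child process
            status = 1
            try:
                os.close(read_fd)
                results = {i: gc_fraction(sequences[i])
                           for i in range(k, len(sequences), n_children)}
                with os.fdopen(write_fd, "w") as w:
                    json.dump(results, w)
                status = 0
            finally:
                os._exit(status)  # never fall back into the parent's code
        os.close(write_fd)  # the parent only reads from this pipe
        children.append((pid, read_fd))

    gc = [None] * len(sequences)
    for pid, read_fd in children:
        with os.fdopen(read_fd) as r:
            for i, value in json.load(r).items():
                gc[int(i)] = value  # JSON turns int keys into strings
        os.waitpid(pid, 0)  # reap the child
    return gc
```

For example, `gc_with_fork(["GGCC", "ATAT"])` should return `[1.0, 0.0]`. Each child receives a batch of work rather than a single tiny task, which keeps the fork overhead small compared with the computation.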
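```python
# Computing GC content
from Bio import SeqIO

with open('chr1.fa', 'r') as f:
    records = list(SeqIO.parse(f, 'fasta'))

GC_for_sequence = []
for i in records:
    GC = 0
    for j in i:  # iterating over a SeqRecord yields its bases one by one
        if j in "GCgc":  # also count soft-masked (lowercase) bases
            GC += 1
    GC_for_sequence.append(GC / len(i))
print(GC_for_sequence)
```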
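A cheaper first step, before adding any parallelism, is to speed up the sequential code. The inner loop above checks one character at a time in the Python interpreter, which is slow for a chromosome-sized sequence. `str.count` performs the same scan in C and is usually much faster. The sketch below uses a hypothetical helper, `gc_content_fast`. It upper-cases each sequence first so that soft-masked lowercase bases are counted too.

```python
def gc_content_fast(sequences):
    """Hypothetical helper: GC fraction per sequence, using str.count
    (which scans in C) instead of a per-character Python loop."""
    result = []
    for seq in sequences:
        s = str(seq).upper()  # str() also converts Bio.Seq objects
        result.append((s.count("G") + s.count("C")) / len(s) if s else 0.0)
    return result
```

With Biopython, a call would look something like `gc_content_fast(record.seq for record in SeqIO.parse('chr1.fa', 'fasta'))`.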
What I'd like is for each process to take one sequence, with the processes computing the statistics in parallel.
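The "one sequence per process" model maps naturally onto the standard library's `multiprocessing.Pool`. With a pool, a fixed set of worker processes pulls sequences from a queue, and `Pool.map` returns the results in input order. The sketch below assumes the sequences fit in memory as strings. `gc_parallel` is a hypothetical name, and the default `chunksize=1` hands out exactly one sequence per task.

```python
from multiprocessing import Pool


def gc_fraction(seq):
    """GC fraction of one sequence string (case-insensitive)."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def gc_parallel(sequences, processes=None, chunksize=1):
    """Hypothetical helper: compute GC content with a pool of worker
    processes, one sequence per task. processes=None uses os.cpu_count()."""
    seqs = [str(s) for s in sequences]  # plain strings pickle cheaply
    with Pool(processes=processes) as pool:
        # map() blocks until all results are in and preserves input order
        return pool.map(gc_fraction, seqs, chunksize=chunksize)
```

With Biopython, the call might be `gc_parallel(r.seq for r in SeqIO.parse('chr1.fa', 'fasta'))`. On Windows and macOS, it should be placed under an `if __name__ == "__main__":` guard, because those platforms start workers by re-importing the script. There are two caveats. First, every sequence is pickled and copied to a worker, so very large sequences carry real transfer cost. Second, if `chr1.fa` holds a single chromosome record, there is only one task, and per-sequence parallelism cannot help. In that case, splitting the sequence into chunks would be the natural alternative.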