blosum62 is a dictonary of 276 items.
I prefered to complete with the lacking items, because it represents an iteration of only 276 turns, while the sequences to be analysed are likely to have more than 276 elements. Consequently, if you find the score of each pair with the help of the function score_match() , this function will have to perform the test if pair not in matrix
for each of the elements of the sequences, that is to say certainly far more than 276 times.
Another thing that takes a lot of time: each score += something
creates a new integer and binds the name score to this new object. Each binding takes an amount of time that doesn't exist with a stream of integers by a generator that are immediatly added to the current amount.
from Bio.SubsMat.MatrixInfo import blosum62 as blosum
from itertools import izip
blosum.update(((b,a),val) for (a,b),val in blosum.items())
def score_pairwise(seq1, seq2, matrix, gap_s, gap_e, gap = True):
for A,B in izip(seq1, seq2):
diag = ('-'==A) or ('-'==B)
yield (gap_e if gap else gap_s) if diag else matrix[(A,B)]
gap = diag
seq1 = 'PAVKDLGAEG-ASDKGT--SHVVY----------TI-QLASTFE'
seq2 = 'PAVEDLGATG-ANDKGT--LYNIYARNTEGHPRSTV-QLGSTFE'
print sum(score_pairwise(seq1, seq2, blosum, -5, -1))
This score_pairwise() is a generator function because there is yield instead of return.
Edit:
Updated code for Python 3:
from Bio.SubsMat.MatrixInfo import blosum62 as blosum
blosum.update(((b,a),val) for (a,b),val in list(blosum.items()))
def score_pairwise(seq1, seq2, matrix, gap_s, gap_e, gap = True):
for A,B in zip(seq1, seq2):
diag = ('-'==A) or ('-'==B)
yield (gap_e if gap else gap_s) if diag else matrix[(A,B)]
gap = diag
seq1 = 'PAVKDLGAEG-ASDKGT--SHVVY----------TI-QLASTFE'
seq2 = 'PAVEDLGATG-ANDKGT--LYNIYARNTEGHPRSTV-QLGSTFE'
print(sum(score_pairwise(seq1, seq2, blosum, -5, -1)))