8.2 years ago
ahtmatrix
I have 2 protein sequences I need to compare using NCBI BLAST:
- the protein sequence as listed in a gbk file, identified by the CDS
- the protein sequence that is translated from the nucleotide sequence denoted by the CDS location
    import sys

    from Bio import SeqIO

    # Assumption: the GenBank file path is passed as the first command-line argument.
    fullpath = sys.argv[1]

    for record in SeqIO.parse(fullpath, "genbank"):
        for feature in record.features:
            if feature.type != "CDS":
                continue
            # The /translation qualifier is stored as a one-element list.
            translated_protein = feature.qualifiers.get("translation", ["no_translation"])[0]
            # Translate the nucleotides at the CDS location, stopping at the first stop codon.
            cds_to_protein = str(feature.extract(record).seq.translate(to_stop=True))
            if translated_protein != cds_to_protein:
                # run blast on translated_protein and cds_to_protein
                pass
Is that possible with Biopython?
I'm not sure BLAST is the answer to your problem; that depends on what you want to obtain. But have a look at pairwise alignments in Biopython: http://biopython.org/DIST/docs/api/Bio.pairwise2-module.html
Based on my first impression it looks related to the blast algorithm.
Not an answer, just a comment. Legacy BLAST had a tool, bl2seq, which I still use a lot. It just aligns 2 sequences.
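In BLAST+, the bl2seq use case is covered by passing `-subject` to `blastp` instead of a database. A minimal sketch of calling it from Python for two protein strings follows; it assumes the `blastp` executable from BLAST+ is on the PATH, and the helper names (`write_fasta`, `build_blastp_command`, `blast_two_proteins`) and the FASTA record names are made up for illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def write_fasta(path, name, sequence):
    """Write a single sequence to a FASTA file (hypothetical helper)."""
    Path(path).write_text(f">{name}\n{sequence}\n")


def build_blastp_command(query_fasta, subject_fasta):
    """Build a blastp call that aligns one FASTA file against another.

    -subject takes the place of a database; -outfmt 6 asks for tabular output.
    """
    return [
        "blastp",
        "-query", str(query_fasta),
        "-subject", str(subject_fasta),
        "-outfmt", "6",
    ]


def blast_two_proteins(seq_a, seq_b):
    """Run blastp on two protein strings.

    Returns the tabular output as text, or None if blastp is not installed.
    """
    if shutil.which("blastp") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        query = Path(tmp) / "query.fasta"
        subject = Path(tmp) / "subject.fasta"
        write_fasta(query, "translated_protein", seq_a)
        write_fasta(subject, "cds_to_protein", seq_b)
        result = subprocess.run(
            build_blastp_command(query, subject),
            capture_output=True,
            text=True,
            check=True,  # raise if blastp exits with an error
        )
        return result.stdout
```

Inside the loop from the question, `blast_two_proteins(translated_protein, cds_to_protein)` could be called where the two strings differ. Empty sequences would need to be skipped first, since blastp cannot align them.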
Agree with @WouterDeCoster: a simple pairwise comparison can be easily done with the pairwise2 module of Biopython. The module has several more possibilities to define gap penalties, score matrices etc.
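A minimal sketch of such a pairwise comparison with `pairwise2`, assuming Biopython is installed. The two sequences are made-up placeholders standing in for `translated_protein` and `cds_to_protein`, and `globalxx` (global alignment, 1 point per match, no mismatch or gap penalties) is just one of the available scoring schemes. Newer Biopython releases mark `pairwise2` as deprecated in favour of `Bio.Align.PairwiseAligner`, so a deprecation warning may appear.

```python
from Bio import pairwise2

# Placeholder sequences; in the question these would be
# translated_protein and cds_to_protein.
translated_protein = "MKTAYIAKQR"
cds_to_protein = "MKTAYIKQR"

# Global alignment: identical residues score 1, everything else scores 0.
alignments = pairwise2.align.globalxx(
    translated_protein, cds_to_protein, one_alignment_only=True
)

# Each alignment holds the two gapped sequences, the score, and the begin/end positions.
aligned_a, aligned_b, score, begin, end = alignments[0]

print(pairwise2.format_alignment(aligned_a, aligned_b, score, begin, end))
print("matches:", score)
```

With `globalxx` the score is simply the number of matching residues, so comparing it to the sequence lengths gives a quick idea of how far the two proteins diverge.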