Hi All,
I am trying to run BLAST on protein sequences from two organisms. I downloaded the FASTA files from NCBI. I want to iterate over the sequences in each FASTA file and align every sequence in one file against every sequence in the other. I want to use a local BLAST installation, but I am getting an error. I would greatly appreciate any suggestions.
'''from Bio.Blast.Applications import NcbiblastxCommandline
help(NcbiblastxCommandline)'''
from Bio.Blast.Applications import NcbiblastpCommandline
from StringIO import StringIO
from Bio.Blast import NCBIXML
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO
from Bio.Blast import NCBIWWW
import cStringIO

def BlastSeq():
    SC_Fasta = open("sc.fsa","r")
    HS_Fasta = open("hsap.fsa","r")
    blastp = "C:\\Program Files\\NCBI\\blast-2.2.29+\\bin\\blastp"
    record1 = list(SeqIO.parse(SC_Fasta,"fasta"))
    for r1 in record1:
        r1.id
        r1.seq
    record2 = list(SeqIO.parse(HS_Fasta,"fasta"))
    for r2 in record2:
        r2.id
        r2.seq
    for r1 in record1:
        for r2 in record2:
            output = NcbiblastpCommandline(blastp,query= r1.seq, subject=r2.seq, outfmt=5)()[0]
            blast_result_record = NCBIXML.read(StringIO(output))

def main():
    BlastSeq()

main()

Error: Bio.Application.ApplicationError: Command 'C:\Program Files\NCBI\blast-2.2.29+\bin\blastp -outfmt 5 -query MVKLTSIAAGVAAIAATASATTTLAQSDERVNLVELGVYVSDIRAHLAQYYMFQAAHPTETYPVEVAEAVFNYGDF -subject HGLQELKAELDAAVLKATGRQILTLRVRLAGAQLSWLYKEATVQEVDVIPEDGAADVRVIISNSAYGKFRKLFPG' returned non-zero exit status -1073741515
I understand that each sequence should be passed as an individual FASTA file, but I don't understand how to proceed.
Why split the FASTA files? You could try giving BLAST multi-record FASTA files via -query and -subject, and it will do pairwise searches (note that the e-values will not take the other sequences into account, as they would in a database search). The parsing is a little more complicated, though: you loop over the results instead.
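To illustrate the suggestion above, here is a minimal sketch that passes both multi-record FASTA files straight to blastp and loops over the hits. Note the assumptions: it uses tabular output (-outfmt 6) rather than the XML format from the question, so that it only needs the Python 3 standard library, and the function names (build_blastp_command, parse_tabular, run_pairwise_blastp) are made up for this example. It also assumes blastp is on your PATH; otherwise pass the full path to the executable.

```python
import subprocess

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def build_blastp_command(query_fasta, subject_fasta, blastp="blastp"):
    """Return the argument list for a pairwise blastp run on two FASTA files."""
    return [blastp, "-query", query_fasta, "-subject", subject_fasta,
            "-outfmt", "6"]


def parse_tabular(text):
    """Yield one dict per alignment line of tabular BLAST output."""
    for line in text.splitlines():
        # Skip blank lines and comment lines (the latter appear with -outfmt 7).
        if not line.strip() or line.startswith("#"):
            continue
        yield dict(zip(FIELDS, line.split("\t")))


def run_pairwise_blastp(query_fasta, subject_fasta, blastp="blastp"):
    """Run blastp on two FASTA files and return the parsed hits as a list."""
    cmd = build_blastp_command(query_fasta, subject_fasta, blastp)
    # Passing a list (not a single string) avoids quoting problems with
    # paths that contain spaces, such as "C:\Program Files\...".
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return list(parse_tabular(result.stdout))


if __name__ == "__main__":
    # File names taken from the question.
    for hit in run_pairwise_blastp("sc.fsa", "hsap.fsa"):
        print(hit["qseqid"], hit["sseqid"], hit["evalue"])
```

All values come back as strings, so convert evalue or bitscore with float() before filtering on them. If you prefer to stay with XML (-outfmt 5), the same idea applies, but you would iterate over the records rather than read a single one.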
Hi Peter, thank you, you are right. I had in mind that the code could be adapted to use needle rather than BLAST as the alignment tool. I have modified my answer as you suggested.
Nice. For EMBOSS needle, try needleall if you want to compare many versus many.
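For anyone wanting to try the many-versus-many comparison, here is a rough sketch of calling needleall from Python 3. The helper names (build_needleall_command, run_needleall) and the output file name are made up for this example. The options are the ones I believe needleall accepts (-asequence, -bsequence, -gapopen, -gapextend, -outfile), and the gap penalties shown are meant to match the usual EMBOSS defaults; please check them against needleall -help on your installation.

```python
import subprocess


def build_needleall_command(a_fasta, b_fasta, out_path, needleall="needleall"):
    """Return the argument list for an all-against-all needleall run."""
    return [needleall,
            "-asequence", a_fasta,
            "-bsequence", b_fasta,
            # Giving the gap penalties explicitly avoids interactive prompts.
            "-gapopen", "10.0",
            "-gapextend", "0.5",
            "-outfile", out_path]


def run_needleall(a_fasta, b_fasta, out_path, needleall="needleall"):
    """Run needleall and raise CalledProcessError if it exits non-zero."""
    cmd = build_needleall_command(a_fasta, b_fasta, out_path, needleall)
    subprocess.run(cmd, check=True)
    return out_path


if __name__ == "__main__":
    # Input file names taken from the question; the output name is arbitrary.
    run_needleall("sc.fsa", "hsap.fsa", "sc_vs_hsap.needleall")
```

Keep in mind that global alignment of every sequence against every other grows with the product of the two file sizes, so this can take a long time for whole proteomes.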