Biopython includes the Bio.Blast.Applications module which provides wrappers for the NCBI blast tools. The Biopython Tutorial provides some examples on how to use them, and the BLAST chapter of the Biopython manual has more information. If you don't specify an output file, the alignment output will come back to you as a string in the standard output of the program. You should specify outfmt=5
as one of the arguments so that your command returns XML output, which Biopython can parse (see below) You can use Python's StringIO module to convert the string to a filehandle, which you can parse using Bio.Blast.NCBIXML (alternatively, you can just specify out="blast_results.xml"
in the command to write the output to a file and then use open("blast_results.xml", "r")
to get a filehandle for reading the output). Again, see the BLAST chapter of the Biopython manual for examples of this.
#!/usr/bin/env python
from Bio.Blast.Applications import NcbiblastpCommandline
from StringIO import StringIO
from Bio.Blast import NCBIXML
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO
# Create two sequence files
seq1 = SeqRecord(Seq("FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF"),
id="seq1")
seq2 = SeqRecord(Seq("FQTWEEFSRAEKLYLADPMKVRVVLRYRHVDGNLCIKVTDDLICLVYRTDQAQDVKKIEKF"),
id="seq2")
SeqIO.write(seq1, "seq1.fasta", "fasta")
SeqIO.write(seq2, "seq2.fasta", "fasta")
# Run BLAST and parse the output as XML
output = NcbiblastpCommandline(query="seq1.fasta", subject="seq2.fasta", outfmt=5)()[0]
blast_result_record = NCBIXML.read(StringIO(output))
# Print some information on the result
for alignment in blast_result_record.alignments:
for hsp in alignment.hsps:
print '****Alignment****'
print 'sequence:', alignment.title
print 'length:', alignment.length
print 'e value:', hsp.expect
print hsp.query
print hsp.match
print hsp.sbjct
For doing a full Smith-Waterman alignment, you can use Bio.Emboss.Applications.WaterCommandline to run the alignment, and use Bio.AlignIO to read the result. Or, as mentioned in another answer, you can use the use_sw_tback=True
option for blastp.
EDIT: I forgot to mention that the code here uses the NCBI Blast+ suite, which has separate commands for blastn, blastp, and son on, and not the old all-in-one blastall program.
when I am working on the above code I got this error
Can anyone please help me?