Question

Output FASTA from Biopython blast result

0

Entering edit mode

7.6 years ago

traviata ▴ 20

I'm trying to use the Biopython wrapper for blastp to download matching protein sequences for some sequences that I have stored on my computer. I would like these matching sequences in FASTA format, similar to how on the web server one can select all sequences producing significant alignment and download "FASTA (aligned sequences)". This was my attempt:

from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML

base_dir = "/Users/kjsdhjasv/Documents/cs2017/coevo/"
inputs = ['rnapBeta', 'S3', 'S4', 'S5']
for input in inputs:
    fasta_string = open(base_dir + 'data/e_coli_k12/' + input + '.fa').read()
    out = base_dir + 'results/' + input + '.fa'
    with NCBIWWW.qblast("blastp", "nr", fasta_string) as result_handle:
        with open(out, 'w') as out_file:
            blast_record = NCBIXML.read(result_handle)
            for alignment in blast_record.alignments:
                for hsp in alignment.hsps:
                    out_file.write(alignment.title)
                    out_file.write(hsp.sbjct)

How can I extract a fasta from the blast result?

blastp BioPython fasta Bio.Blast • 4.4k views

ADD COMMENT • link 7.6 years ago by traviata ▴ 20

score 2 · Accepted Answer · 2017-12-02

I believe I found a solution to my own question. The easiest way to do this seems to through SearchIO instead of NCBIXML. Here is the code I ended up using:

from Bio.Blast import NCBIWWW
from Bio import SearchIO
from Bio import SeqIO

base_dir = "/Users/kjsdhjasv/Documents/cs2017/coevo/"
inputs = ['rnapBeta', 'S3', 'S4', 'S5']

for input in inputs:
    blast_xml = base_dir + 'data/blast_xml/' + input + '.xml'
    blast_out = base_dir + 'data/blast_out/' + input + '.fa'

    # run BLAST
    fasta_string = open(base_dir + 'data/e_coli_k12/' + input + '.fa').read()
    with NCBIWWW.qblast("blastp", "nr", fasta_string) as result_handle:
        with open(blast_xml, 'w') as xml_file:
            xml_file.write(result_handle.read())

    # parse xml and write to fasta
    blast_qresult = SearchIO.read(blast_xml, 'blast-xml')
    records = []
    for hit in blast_qresult:
        records.append(hit[0].hit)
    SeqIO.write(records, blast_out, "fasta")