Output FASTA from Biopython blast result
1
0
Entering edit mode
7.0 years ago
traviata ▴ 20

I'm trying to use the Biopython wrapper for blastp to download matching protein sequences for some sequences that I have stored on my computer. I would like these matching sequences in FASTA format, similar to how on the web server one can select all sequences producing significant alignment and download "FASTA (aligned sequences)". This was my attempt:

from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML

base_dir = "/Users/kjsdhjasv/Documents/cs2017/coevo/"
inputs = ['rnapBeta', 'S3', 'S4', 'S5']
for input in inputs:
    fasta_string = open(base_dir + 'data/e_coli_k12/' + input + '.fa').read()
    out = base_dir + 'results/' + input + '.fa'
    with NCBIWWW.qblast("blastp", "nr", fasta_string) as result_handle:
        with open(out, 'w') as out_file:
            blast_record = NCBIXML.read(result_handle)
            for alignment in blast_record.alignments:
                for hsp in alignment.hsps:
                    out_file.write(alignment.title)
                    out_file.write(hsp.sbjct)

How can I extract a fasta from the blast result?

blastp BioPython fasta Bio.Blast • 4.2k views
ADD COMMENT
2
Entering edit mode
7.0 years ago
traviata ▴ 20

I believe I found a solution to my own question. The easiest way to do this seems to through SearchIO instead of NCBIXML. Here is the code I ended up using:

from Bio.Blast import NCBIWWW
from Bio import SearchIO
from Bio import SeqIO

base_dir = "/Users/kjsdhjasv/Documents/cs2017/coevo/"
inputs = ['rnapBeta', 'S3', 'S4', 'S5']

for input in inputs:
    blast_xml = base_dir + 'data/blast_xml/' + input + '.xml'
    blast_out = base_dir + 'data/blast_out/' + input + '.fa'

    # run BLAST
    fasta_string = open(base_dir + 'data/e_coli_k12/' + input + '.fa').read()
    with NCBIWWW.qblast("blastp", "nr", fasta_string) as result_handle:
        with open(blast_xml, 'w') as xml_file:
            xml_file.write(result_handle.read())

    # parse xml and write to fasta
    blast_qresult = SearchIO.read(blast_xml, 'blast-xml')
    records = []
    for hit in blast_qresult:
        records.append(hit[0].hit)
    SeqIO.write(records, blast_out, "fasta")
ADD COMMENT

Login before adding your answer.

Traffic: 1809 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6