Can Bio.Blast.NCBIXML somehow parse the stdout from NcbipsiblastCommandline so that no .xml file is created?
10.3 years ago

Hi, I would like to parse the XML output of a local psiblast run (via the NcbipsiblastCommandline wrapper) by capturing the wrapper's stdout in a Python string variable and then passing that string to NCBIXML.parse. Is there any way to do this without writing a temporary file and without getting this error message:

record = next(psiblast_records)
  File "/usr/local/lib/python2.7/dist-packages/Bio/Blast/NCBIXML.py", line 617, in parse
    text = handle.read(BLOCK)
AttributeError: 'str' object has no attribute 'read'

Code:

#!/usr/bin/env python

#load modules
from sys import argv

from Bio.Blast.Applications import NcbipsiblastCommandline as psiblast

from Bio.Blast import NCBIXML
from Bio import Entrez

from Bio.Phylo.Applications import FastTreeCommandline as fasttree

#read arguments from command line: 1)amino-acid fasta to build psiblast profile, 2)maximum number of threads for each process
ref_fasta = argv[1]
threads = argv[2]

#use three iterations of psiblast to generate sequence diversity
blast = psiblast(query = ref_fasta, db = 'nr', outfmt = 5, num_alignments = 5000, num_threads = threads)

psiblast_out = blast()[0]

#parse the XML output
psiblast_records = NCBIXML.parse(psiblast_out)

record = next(psiblast_records)
NCBIXML blast biopython • 5.8k views
10.3 years ago
lelle ▴ 830

You can use the StringIO module to make an object that behaves like a file handle and can be passed to the parse function.
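
To illustrate why the traceback complains about 'read': the parser calls handle.read(BLOCK), and a plain string has no such method. The snippet below is only a minimal sketch; the XML text is a stand-in for real psiblast output and the BLOCK value is an assumption. On Python 3 the class lives in io (cStringIO exists only on Python 2).

```python
import io

BLOCK = 1024  # chunk size used here for illustration only (assumed value)

# Stand-in for the stdout string returned by the command-line wrapper.
xml_text = '<?xml version="1.0"?>\n<BlastOutput></BlastOutput>\n'

# A plain str has no read() method, which is what triggers the AttributeError.
str_has_read = hasattr(xml_text, "read")

# Wrapping the string gives an in-memory object with the file-handle interface.
handle = io.StringIO(xml_text)
chunk = handle.read(BLOCK)  # works, returns up to BLOCK characters

# With Biopython installed, the same kind of handle could be passed on:
#   from Bio.Blast import NCBIXML
#   records = NCBIXML.parse(io.StringIO(real_psiblast_stdout))
```

The handle is consumed as it is read, so build a fresh StringIO if the same text has to be parsed twice.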


Worked perfectly. After reading your answer and searching for StringIO in the Biopython tutorial, I found many references to the module and how to use it with lots of the wrappers. Code for any other beginners who stumble on this post:

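#cStringIO exists only on Python 2; fall back to io.StringIO on Python 3
try:
    from cStringIO import StringIO
except ImportError:
    from io import StringIO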

...

blast = psiblast(query = ref_fasta, db = 'nr', outfmt = 5, num_alignments = 5000, num_threads = threads)

psiblast_stdout = blast()[0]

#parse the XML output
psiblast_xml = StringIO(psiblast_stdout)

psiblast_records = NCBIXML.parse(psiblast_xml)
8.8 years ago
Peter 6.0k

First, use out="-" (this is the default) when building the BLAST+ command line. Rather than creating a file named "-", BLAST+ will then write its output to stdout. Second, you will need to run the command line string from Biopython using the subprocess module, so that you get a handle to stdout instead of a string. There is a related example using MUSCLE in the Biopython Tutorial - search for "MUSCLE using stdout" in http://biopython.org/DIST/docs/tutorial/Tutorial.html
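
A rough sketch of the subprocess approach, so the output is read from a pipe rather than collected into one big string. The child command here is a stand-in (a Python one-liner that prints a tiny XML document) so the snippet runs without BLAST+ installed; open_stdout_handle is a hypothetical helper name, not part of Biopython. For a real run you would launch the psiblast command line instead and hand the pipe to NCBIXML.parse, as indicated in the comments.

```python
import subprocess
import sys


def open_stdout_handle(cmd):
    """Start cmd and return (process, handle); handle streams the child's stdout as text."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, universal_newlines=True)
    return proc, proc.stdout


# Stand-in for the real psiblast command line (assumption for this sketch).
demo_cmd = [sys.executable, "-c",
            "print('<?xml version=\"1.0\"?><BlastOutput/>')"]

proc, handle = open_stdout_handle(demo_cmd)

# The handle has read(), so a parser can pull the output block by block.
first_block = handle.read(16)
rest = handle.read()
handle.close()
return_code = proc.wait()

# With Biopython and BLAST+ installed, the equivalent would look roughly like:
#   import shlex
#   proc, handle = open_stdout_handle(shlex.split(str(blast)))
#   for record in NCBIXML.parse(handle):
#       ...
```

Because the parser reads from the pipe as it goes, the whole XML document never has to sit in memory as a single string.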

The answer from Lelle using StringIO would also work, and would be quite simple and reliable BUT this will load the entire XML file into memory as a string. That can be a problem for some datasets.
