Hi, I would like to parse the XML output of a local psiblast run (via the NcbipsiblastCommandline wrapper) by capturing the wrapper's stdout in a Python string variable and then passing that string to NCBIXML.parse. Is there any way to do this without writing a temporary file, and without getting this error message:
record = next(psiblast_records)
File "/usr/local/lib/python2.7/dist-packages/Bio/Blast/NCBIXML.py", line 617, in parse
text = handle.read(BLOCK)
AttributeError: 'str' object has no attribute 'read'
Code:
#!/usr/bin/env python
#load modules
from sys import argv
from Bio.Blast.Applications import NcbipsiblastCommandline as psiblast
from Bio.Blast import NCBIXML
from Bio import Entrez
from Bio.Phylo.Applications import FastTreeCommandline as fasttree
#read arguments from command line: 1)amino-acid fasta to build psiblast profile, 2)maximum number of threads for each process
ref_fasta = argv[1]
threads = argv[2]
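#use three iterations of psiblast to generate sequence diversity (num_iterations must be set explicitly, otherwise psiblast runs a single iteration)
blast = psiblast(query = ref_fasta, db = 'nr', outfmt = 5, num_iterations = 3, num_alignments = 5000, num_threads = threads)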
psiblast_out = blast()[0]
#parse the XML output
psiblast_records = NCBIXML.parse(psiblast_out)
record = next(psiblast_records)
Worked perfectly. After reading your answer and searching for SeqIO in the BioPython tutorial, I found many, many references to the module and how to use it with lots of the wrappers. Code for any other beginners who stumble on this post:
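A minimal sketch of one way to avoid the AttributeError, assuming the fix is to wrap the string in an in-memory file-like object: NCBIXML.parse calls handle.read(), which a plain str does not have but a StringIO object does. The helper names string_to_handle and parse_blast_xml_string are made up for illustration, and the sketch assumes a Biopython install that provides Bio.Blast.NCBIXML.

```python
# On Python 2.7 (as in the traceback above) use: from StringIO import StringIO
from io import StringIO


def string_to_handle(xml_text):
    """Wrap a string in an in-memory, file-like object that has .read()."""
    return StringIO(xml_text)


def parse_blast_xml_string(xml_text):
    """Return an iterator of BLAST records parsed from an XML string.

    Hypothetical helper: Biopython is imported here so that
    string_to_handle() can be used without it installed.
    """
    from Bio.Blast import NCBIXML
    return NCBIXML.parse(string_to_handle(xml_text))
```

In the script above this would be called as psiblast_records = parse_blast_xml_string(psiblast_out), followed by record = next(psiblast_records), so no temporary file is needed.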