How To Parse Psiblast Results Using Biopython And Blast-2.2.24+?
1
4
Entering edit mode
14.2 years ago
Niek De Klein ★ 2.6k

I'm trying to run a PSIBlast program which selects certain sequences out at every round before it does the next iteration. For this I need the Round attribute shown in http://biopython.org/DIST/docs/tutorial/Tutorial.html#fig:psiblastrecord.<br< a="">> The biopython tutorial says: "In Biopython, the parsers return Record objects, either Blast or PSIBlast depending on what you are parsing." However, I can only access attributes found in the normal Blast record class: http://biopython.org/DIST/docs/tutorial/Tutorial.html#fig:blastrecord.

from Bio.Blast.Applications import NcbipsiblastCommandline from Bio import SeqIO from Bio.Blast import NCBIXML

File = "KNATM"

def psiBlast(File):
    fastaFile = open("BLAST-"+File+".txt","r")

    my_blast_db = r"C:\Niek\Test2.2.17\TAIR9_pep_20090619.fasta"
    my_blast_file = '"C:\\Niek\\Evolution MiP\\BLAST-'+File+'.txt"'
    my_blast_exe = r"C:\Niek\blast-2.2.24+\bin\psiblast.exe"    
    E_VALUE_TRESH = "10"

    for seq_record in SeqIO.parse("BLAST-"+File+".txt", "fasta"):
        global cline
        tempFile = open("tempFile.fasta","w")
        tempFile.write(">"+strseq_record.id)+"\n"+str(seq_record.seq)+"\n")
        tempFile.close()
        cline = NcbipsiblastCommandline(cmd = my_blast_exe, db = my_blast_db, \
                query = "tempFile.fasta", evalue = E_VALUE_TRESH, outfmt = 5, \
                out = "1stIte"+File+".xml", out_pssm = "pssm"+File)
        cline()
        result_handle = open("1stIte"+File+".xml","r")

        blast_record = NCBIXML.read(result_handle)
        for alignment in blast_record.alignments:
            for alignment in blast_record.alignments:
                print alignment.title
                for hsp in alignment.hsps:
                    print hsp.expect

psiBlast(File)

So this works, but if I change

blast_record = NCBIXML.read(result_handle)
for Round in blast_record.rounds:
    etc...

I get

Traceback (most recent call last):
  File "C:\Niek\Evolution MiP\psiBLAST1.1.py", line 45, in <module>
    psiBlast(File)
  File "C:\Niek\Evolution MiP\psiBLAST1.1.py", line 36, in psiBlast
    for Round in blast_record.rounds:
AttributeError: Blast instance has no attribute 'rounds

So how do you have to blast/parse to get the PSIBLAST record?

Thanks, Niek

python biopython blast • 6.8k views
ADD COMMENT
0
Entering edit mode

What version of BioPython are you using? I think the psiblast_record object is newer than NcbipsiblastCommandline itself.

ADD REPLY
3
Entering edit mode
14.2 years ago

Not for python and not for PSI-BLAST but my answer might help: instead of trying to parse your XML with Python, how about parsing it with XML+XSLT ? For example, I wrote the stylesheet below to generate an input for mongodb from BLAST:

Result:

xsltproc --novalid stylesheet.xsl ~/blast.xml

use blastdb;

query={
      def:"No definition line",
      len:670
,param: {
        expect:10,
        sc_match:2,
        sc_mismatch:-3,
        gap_open:5,
        gap_extend:2,
        filter:"L;m;",
        }

      };
db.queries.save(query);

hit={
query_id: query._id,
num:1,

id:"gi|118082669|ref|XM_416233.2|",

gi:118082669,

def:"PREDICTED: Gallus gallus similar to ubiquitous tetratricopeptide containing protein RoXaN; Rotavirus X asso
ciated non-structural protein (LOC417996), mRNA",
acn:"XM_416233",
len: 2868,
hsp:[
{
num:1,
bit_score:544.1,
score:602,
evalue:3.34611e-151,
query_from:92,
query_to:395,
hit_from:2378,
hit_to:2681,
query_frame:1,
hit_frame:1,
identity:303,
positive:303,
gaps:0,
align_len:304,
qseq:"TACTAGATATGCAGCAGACCTATGACATGTGGCTAAAGAAACACAATCCTGGGAAGCCTGGAGAGGGAACACCACTCACTTCGCGAGAAGGGGAGAAACAGATCCA
GATGCCCACTGACTATGCTGACATCATGATGGGCTACCACTGCTGGCTCTGCGGGAAGAACAGCAACAGCAAGAAGCAATGGCAGCAGCACATCCAGTCAGAGAAGCACAAG
GAGAAGGTCTTCACCTCAGACAGTGACTCCAGCTGCTGGAGCTATCGCTTCCCTATGGGCGAGTTCCAGCTCTGTGAAAGGTACCA",
midline:"|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||",
hseq:"TACTAGATATGCAGCAGACCTATGACATGTGGCTAAAGAAACACAATCCTGGGAAGCCTGGAGAGGGAACACCACTCACTTCGCGAGAAGGGGAGAAACAGATCCA
GATGCCCACTGACTATGCTGACATCATGATGGGCTACCACTGCTGGCTCTGCGGGAAGAACAGCAACAGCAAGAAGCAATGGCAGCAGCACATCCAGTCAGAGAAGCACAAG
GAGAAGGTCTTCACCTCAGACAGTGACTCCAGCTGCTGGAGCTATCGCTTCCCTATGGGCGAGTTCCAGCTCTGTGAAAGGTTCCA"
}
(...)
ADD COMMENT
0
Entering edit mode

Well I've never worked with XSLT so it would be easier if someone knows how to parse it right using biopython + it all has to be done automatically, but I'll keep it in mind. Thanks

ADD REPLY

Login before adding your answer.

Traffic: 1976 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6