Parsing Blast Output Biopython Error
13.4 years ago
Ankur ▴ 40

Hi, I have the following code:

 def runBLAST(self):
        print "Running BLAST .........."
        cmd=subprocess.Popen("blastp -db nr -query repeat.txt -out out.faa -evalue 0.001 -gapopen 11 -gapextend 1 -matrix BLOSUM62 -remote -outfmt 5",shell=True)
        cmd.communicate()[0]
        f1=open("out.faa")
        blast_records = NCBIXML.parse(f1)
        save_file = open("my_fasta_seq.fasta", 'w')
        for blast_record in blast_records[:10]:
            for alignment in blast_record.alignments:
                for hsp in alignment.hsps:
                    save_file.write('>%s\n' % (alignment.hseq,))
        save_file.close()
        f1.close()
        f2=open("my_fasta_seq.fasta")
        for record in SeqIO.parse(f2,"fasta"):
            f=open("tempBLAST1.txt","w")
            f.write(">"+"\n"+str(record.name)+"\n"+str(record.seq)+"\n")
            f.close()

I get a TypeError on the line "for blast_record in blast_records[:10]:" saying 'generator' object is not subscriptable. I am looking to get the top 10 BLAST hits (sequences).

python biopython blast error ncbi • 4.9k views
13.4 years ago

This is not a Biopython-specific problem but a general Python question, answered e.g. on StackOverflow. It might be that Biopython only parses the next result on demand; in that case you might be better off with:

for i, blast_record in enumerate(blast_records):
    if i == 10: break
    ...
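
An alternative to counting by hand is itertools.islice from the standard library, which takes the first N items from any iterator without indexing it. The sketch below is only illustrative: first_n and fake_blast_records are hypothetical names, and the stand-in generator is used in place of NCBIXML.parse(handle) so the example runs without a BLAST output file.

```python
from itertools import islice


def first_n(records, n):
    """Return a list with at most n items taken from any iterable."""
    # islice stops after n items, so the rest of the input is never read.
    return list(islice(records, n))


def fake_blast_records(count):
    """Hypothetical stand-in for NCBIXML.parse(handle): yields lazily."""
    for i in range(count):
        yield "record_%d" % i


# Take the first 10 records from a generator that could yield 25.
top = first_n(fake_blast_records(25), 10)
print(len(top))
```

With a real file, the same helper would presumably be called as first_n(NCBIXML.parse(f1), 10), assuming the parser returns an ordinary iterator as described in this thread.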

It's also a follow-up to the previous question and perhaps should have continued there instead. It's fine to edit your questions and discuss answers in the comments, rather than starting a new question for every variation of the same problem.


As Michael says, blast_records is a generator/iterator. You can loop over it or iterate explicitly by calling next(), but you cannot access records by index. This is a general design pattern for coping with very large files composed of multiple smaller records, and it is also used in the Biopython SeqIO parse function etc.
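
A small sketch of that behaviour, using a hypothetical read_records generator rather than a real BLAST parser: slicing raises the TypeError from the question, while next() and ordinary looping work.

```python
def read_records(lines):
    """Hypothetical lazy parser: yields one cleaned record per input line."""
    for line in lines:
        yield line.strip()


records = read_records(["hit1\n", "hit2\n", "hit3\n"])

# Generators do not support indexing or slicing.
try:
    records[:2]
    sliced = True
except TypeError:
    sliced = False

# next() pulls exactly one record; the remainder is still available.
first = next(records)
rest = list(records)

print(sliced, first, rest)
```

Note that a generator is consumed as it is read, so after the loop or list() call above the records cannot be iterated a second time without re-opening the source.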
