How can I make multiple taxID queries using qblast and store multiple blast handles using NCBIXML?
3.5 years ago

Hello everybody,

I'm new to Biopython, and to programming in general, but I am trying to write a small script that iterates through a dictionary, takes each taxID, and then BLASTs every sequence in my protein FASTA file against that organism. When I run this code without the iteration, with a single txid in the entrez_query argument, it usually works, but when I use the dictionary as in the script below, the final .txt file turns out to be empty.

Does anyone have an idea why? Any help is welcome!

A glimpse of my multiple_aa.faa:

>sp|Q9FW44|ADR1_ARATH Disease resistance protein ADR1 OS=Arabidopsis thaliana OX=3702 GN=ADR1 PE=2 SV=2
MASFIDLFAGDITTQLLKLLALVANTVYSCKGIAERLITMIRDVQPTIREIQYSGAELSN
HHQTQLGVFYEILEKARKLCEKVLRCNRWNLKHVYHANKMKDLEKQISRFLNSQILLFVL
AEVCHLRVNGDRIERNMDRLLTERNDSLSFPETMMEIETVSDPEIQTVLELGKKKVKEMM
FKFTDTHLFGISGMSGSGKTTLAIELSKDDDVRGLFKNKVLFLTVSRSPNFENLESCIRE
FLYDGVHQRKLVILDDVWTRESLDRLMSKIRGSTTLVVSRSKLADPRTTYNVELLKKDEA
MSLLCLCAFEQKSPPSPFNKYLVKQVVDECKGLPLSLKVLGASLKNKPERYWEGVVKRLL
RGEAADETHESRVFAHMEESLENLDPKIRDCFLDMGAFPEDKKIPLDLLTSVWVERHDID
EETAFSFVLRLADKNLLTIVNNPRFGDVHIGYYDVFVTQHDVLRDLALHMSNRVDVNRRE
RLLMPKTEPVLPREWEKNKDEPFDAKIVSLHTGEMDEMNWFDMDLPKAEVLILNFSSDNY
VLPPFIGKMSRLRVLVIINNGMSPARLHGFSIFANLAKLRSLWLKRVHVPELTSCTIPLK
NLHKIHLIFCKVKNSFVQTSFDISKIFPSLSDLTIDHCDDLLELKSIFGITSLNSLSITN
CPRILELPKNLSNVQSLERLRLYACPELISLPVEVCELPCLKYVDISQCVSLVSLPEKFG
KLGSLEKIDMRECSLLGLPSSVAALVSLRHVICDEETSSMWEMVKKVVPELCIEVAKKCF
TVDWLDD

>sp|Q9FKZ1|DRL42_ARATH Probable disease resistance protein At5g66900 OS=Arabidopsis thaliana OX=3702 GN=At5g66900 PE=3 SV=1
MNDWASLGIGSIGEAVFSKLLKVVIDEAKKFKAFKPLSKDLVSTMEILFPLTQKIDSMQK
ELDFGVKELKELRDTIERADVAVRKFPRVKWYEKSKYTRKIERINKDMLKFCQIDLQLLQ
HRNQLTLLGLTGNLVNSVDGLSKRMDLLSVPAPVFRDLCSVPKLDKVIVGLDWPLGELKK
RLLDDSVVTLVVSAPPGCGKTTLVSRLCDDPDIKGKFKHIFFNVVSNTPNFRVIVQNLLQ
HNGYNALTFENDSQAEVGLRKLLEELKENGPILLVLDDVWRGADSFLQKFQIKLPNYKIL
VTSRFDFPSFDSNYRLKPLEDDDARALLIHWASRPCNTSPDEYEDLLQKILKRCNGFPIV
IEVVGVSLKGRSLNTWKGQVESWSEGEKILGKPYPTVLECLQPSFDALDPNLKECFLDMG
SFLEDQKIRASVIIDMWVELYGKGSSILYMYLEDLASQNLLKLVPLGTNEHEDGFYNDFL
VTQHDILRELAICQSEFKENLERKRLNLEILENTFPDWCLNTINASLLSISTDDLFSSKW
LEMDCPNVEALVLNLSSSDYALPSFISGMKKLKVLTITNHGFYPARLSNFSCLSSLPNLK
RIRLEKVSITLLDIPQLQLSSLKKLSLVMCSFGEVFYDTEDIVVSNALSKLQEIDIDYCY
DLDELPYWISEIVSLKTLSITNCNKLSQLPEAIGNLSRLEVLRLCSSMNLSELPEATEGL
SNLRFLDISHCLGLRKLPQEIGKLQNLKKISMRKCSGCELPESVTNLENLEVKCDEETGL
LWERLKPKMRNLRVQEEEIEHNLNLLQMF

My script:

import time
from urllib.error import HTTPError

from Bio import Entrez, SeqIO
from Bio.Blast import NCBIWWW, NCBIXML

Entrez.email = '...@...'

# common name -> Entrez query used to restrict the BLAST search
dic_tx = {
    "nicotiana": '"(txid4097[ORGN])"',
    "grapevine": '"(txid:29760[ORGN])"',
    "almond": '"(txid:3755[ORGN])"',
    "apple": '"(txid:3750[ORGN])"',
    "citrus": '"(txid:2711[ORGN])"',
    "coffee": '"(txid:13443[ORGN])"',
    "olive": '"(txid:4146[ORGN])"',
}

for k, v in dic_tx.items():
    print(k)
    print(v)
    list_record_host = []
    for record in SeqIO.parse("multiple_aa.faa", format="fasta"):
        print(record.id)

        # online request; wait and retry once on an HTTP error
        try:
            result_handle = NCBIWWW.qblast("blastp", "nr", record.format("fasta"), entrez_query=v, hitlist_size=1)
            print(result_handle)
        except HTTPError:
            time.sleep(5)
            result_handle = NCBIWWW.qblast("blastp", "nr", record.format("fasta"), entrez_query=v, hitlist_size=1)

        # result handle stored in a list
        list_record_host.append(result_handle)

    # write all XML results for this organism into one file
    with open("%s.xml" % k, "w") as result_handle_list_host:
        for item in list_record_host:
            result_handle_list_host.write(item.read())

    # reopen the XML file and save the title of each hit
    reopen_result_handle = "%s.xml" % k
    with open(reopen_result_handle) as xml_handle, open("%s_NLR.txt" % k, 'w') as save_file:
        for blast_record in NCBIXML.parse(xml_handle):
            for alignment in blast_record.alignments:
                for hsp in alignment.hsps:
                    save_file.write('>%s\n' % (alignment.title,))
            # here possibly to output something to file, between each blast_record
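
One thing that may be worth checking: the dictionary values are not written consistently (txid4097 for nicotiana, but txid:29760 and so on for the rest), and each value also carries an extra pair of double quotes. As far as I know, the form usually shown for an organism filter is txid4097[ORGN], without a colon or surrounding quotes. A search that returns no hits yields BLAST records with empty alignments, which would leave the .txt file empty. Whether that is the actual cause here is an assumption. Below is a minimal sketch that rebuilds every query in one consistent form and writes each result to its own XML file instead of concatenating several XML documents. The names make_entrez_query and blast_record_to_file are hypothetical helpers, not part of Biopython.

```python
import re


def make_entrez_query(taxid):
    """Build an organism filter such as 'txid4097[ORGN]'.

    Accepts a bare number or a string that contains the taxID digits,
    e.g. '"(txid:29760[ORGN])"'. Hypothetical helper, not a Biopython call.
    """
    digits = re.sub(r"\D", "", str(taxid))  # keep only the digits
    if not digits:
        raise ValueError("no numeric taxID found in %r" % (taxid,))
    return "txid%s[ORGN]" % digits


def blast_record_to_file(record, query, out_path):
    """Run one online blastp search and save its XML to out_path.

    Assumes Biopython is installed and that NCBIWWW.qblast returns a
    text handle; it is imported here so the rest runs without it.
    """
    from Bio.Blast import NCBIWWW

    handle = NCBIWWW.qblast("blastp", "nr", record.format("fasta"),
                            entrez_query=query, hitlist_size=1)
    with open(out_path, "w") as out_handle:
        out_handle.write(handle.read())
    handle.close()


# Two entries copied from the question, normalised to one form.
dic_tx = {
    "nicotiana": '"(txid4097[ORGN])"',
    "grapevine": '"(txid:29760[ORGN])"',
}
queries = {name: make_entrez_query(value) for name, value in dic_tx.items()}
for name, query in queries.items():
    print(name, query)
```

With one XML file per organism and per sequence (for example "%s_%d.xml" % (name, index), using an index from enumerate), each file holds a single XML document that can be passed to NCBIXML.parse on its own, and an empty blast_record.alignments list then points to a search without hits rather than to a parsing problem.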
NCBIXML qblast python biopython