3.5 years ago
oseias.rf.junior
Hello everybody,
I'm new to Biopython, and to programming in general. I am trying to write a small script that iterates through a dictionary, takes each taxID, and then searches the protein sequence file against the organism with that taxID. When I run this code without the iteration, passing a single txid in the entrez_query argument, it usually works, but when I use the dictionary as in the script below, the final .txt file turns out to be empty.
Does anyone have an idea why? Any help is welcome!
A glimpse of my multiple_aa.faa:
>sp|Q9FW44|ADR1_ARATH Disease resistance protein ADR1 OS=Arabidopsis thaliana OX=3702 GN=ADR1 PE=2 SV=2
MASFIDLFAGDITTQLLKLLALVANTVYSCKGIAERLITMIRDVQPTIREIQYSGAELSN
HHQTQLGVFYEILEKARKLCEKVLRCNRWNLKHVYHANKMKDLEKQISRFLNSQILLFVL
AEVCHLRVNGDRIERNMDRLLTERNDSLSFPETMMEIETVSDPEIQTVLELGKKKVKEMM
FKFTDTHLFGISGMSGSGKTTLAIELSKDDDVRGLFKNKVLFLTVSRSPNFENLESCIRE
FLYDGVHQRKLVILDDVWTRESLDRLMSKIRGSTTLVVSRSKLADPRTTYNVELLKKDEA
MSLLCLCAFEQKSPPSPFNKYLVKQVVDECKGLPLSLKVLGASLKNKPERYWEGVVKRLL
RGEAADETHESRVFAHMEESLENLDPKIRDCFLDMGAFPEDKKIPLDLLTSVWVERHDID
EETAFSFVLRLADKNLLTIVNNPRFGDVHIGYYDVFVTQHDVLRDLALHMSNRVDVNRRE
RLLMPKTEPVLPREWEKNKDEPFDAKIVSLHTGEMDEMNWFDMDLPKAEVLILNFSSDNY
VLPPFIGKMSRLRVLVIINNGMSPARLHGFSIFANLAKLRSLWLKRVHVPELTSCTIPLK
NLHKIHLIFCKVKNSFVQTSFDISKIFPSLSDLTIDHCDDLLELKSIFGITSLNSLSITN
CPRILELPKNLSNVQSLERLRLYACPELISLPVEVCELPCLKYVDISQCVSLVSLPEKFG
KLGSLEKIDMRECSLLGLPSSVAALVSLRHVICDEETSSMWEMVKKVVPELCIEVAKKCF
TVDWLDD
>sp|Q9FKZ1|DRL42_ARATH Probable disease resistance protein At5g66900 OS=Arabidopsis thaliana OX=3702 GN=At5g66900 PE=3 SV=1
MNDWASLGIGSIGEAVFSKLLKVVIDEAKKFKAFKPLSKDLVSTMEILFPLTQKIDSMQK
ELDFGVKELKELRDTIERADVAVRKFPRVKWYEKSKYTRKIERINKDMLKFCQIDLQLLQ
HRNQLTLLGLTGNLVNSVDGLSKRMDLLSVPAPVFRDLCSVPKLDKVIVGLDWPLGELKK
RLLDDSVVTLVVSAPPGCGKTTLVSRLCDDPDIKGKFKHIFFNVVSNTPNFRVIVQNLLQ
HNGYNALTFENDSQAEVGLRKLLEELKENGPILLVLDDVWRGADSFLQKFQIKLPNYKIL
VTSRFDFPSFDSNYRLKPLEDDDARALLIHWASRPCNTSPDEYEDLLQKILKRCNGFPIV
IEVVGVSLKGRSLNTWKGQVESWSEGEKILGKPYPTVLECLQPSFDALDPNLKECFLDMG
SFLEDQKIRASVIIDMWVELYGKGSSILYMYLEDLASQNLLKLVPLGTNEHEDGFYNDFL
VTQHDILRELAICQSEFKENLERKRLNLEILENTFPDWCLNTINASLLSISTDDLFSSKW
LEMDCPNVEALVLNLSSSDYALPSFISGMKKLKVLTITNHGFYPARLSNFSCLSSLPNLK
RIRLEKVSITLLDIPQLQLSSLKKLSLVMCSFGEVFYDTEDIVVSNALSKLQEIDIDYCY
DLDELPYWISEIVSLKTLSITNCNKLSQLPEAIGNLSRLEVLRLCSSMNLSELPEATEGL
SNLRFLDISHCLGLRKLPQEIGKLQNLKKISMRKCSGCELPESVTNLENLEVKCDEETGL
LWERLKPKMRNLRVQEEEIEHNLNLLQMF
import time
from urllib.error import HTTPError

from Bio import Entrez, SeqIO
from Bio.Blast import NCBIWWW, NCBIXML

dic_tx = {
    "nicotiana": '"(txid4097[ORGN])"',
    "grapevine": '"(txid:29760[ORGN])"',
    "almond": '"(txid:3755[ORGN])"',
    "apple": '"(txid:3750[ORGN])"',
    "citrus": '"(txid:2711[ORGN])"',
    "coffee": '"(txid:13443[ORGN])"',
    "olive": '"(txid:4146[ORGN])"',
}

for k, v in dic_tx.items():
    print(k)
    print(v)
    Entrez.email = '...@...'
    list_record_host = []
    for record in SeqIO.parse("multiple_aa.faa", format="fasta"):
        print(record.id)
        # print(record.seq)
        # online request
        try:
            result_handle = NCBIWWW.qblast("blastp", "nr", record.format("fasta"), entrez_query=v, hitlist_size=1)
            print(result_handle)
        except HTTPError:
            # wait and retry once
            time.sleep(5)
            result_handle = NCBIWWW.qblast("blastp", "nr", record.format("fasta"), entrez_query=v, hitlist_size=1)
        # result handle stored in a list
        list_record_host.append(result_handle)
    # write all results for this organism to one XML file
    result_handle_list_host = open("%s.xml" % k, "w")
    for item in list_record_host:
        result_handle_list_host.write(item.read())
    result_handle_list_host.close()
    # result_handle_list_host
    reopen_result_handle = "%s.xml" % k
    blast_records = NCBIXML.parse(open(reopen_result_handle))
    save_file = open("%s_NLR.txt" % k, 'w')
    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                save_file.write('>%s\n' % (alignment.title,))
        # here possibly to output something to file, between each blast_record
    save_file.close()
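One thing that may be worth checking is the form of the values in dic_tx. Entrez organism filters are normally written as txid4097[ORGN], without a colon after txid and without extra double quotes inside the string, while most of the values above contain both. Below is a minimal sketch of how the query strings could be built from plain numeric IDs so that they all have the same form. The helper names build_entrez_query and normalize_entrez_query are hypothetical (they are not part of Biopython), and whether this is the actual cause of the empty output is an assumption that has not been tested against NCBI.

```python
import re


def build_entrez_query(taxid):
    """Hypothetical helper: format a taxonomy ID as an Entrez organism filter."""
    return "txid%d[ORGN]" % int(taxid)


def normalize_entrez_query(raw):
    """Hypothetical helper: extract the numeric ID from a string such as
    '"(txid:29760[ORGN])"' and rebuild it in the plain txidNNNN[ORGN] form."""
    match = re.search(r"txid:?\s*(\d+)", raw)
    if match is None:
        raise ValueError("no taxonomy ID found in %r" % raw)
    return build_entrez_query(match.group(1))


# Taxonomy IDs copied from the dictionary in the post above.
taxids = {
    "nicotiana": 4097,
    "grapevine": 29760,
    "almond": 3755,
    "apple": 3750,
    "citrus": 2711,
    "coffee": 13443,
    "olive": 4146,
}

# Same keys as dic_tx, but every value now has the same colon-free form.
queries = {name: build_entrez_query(taxid) for name, taxid in taxids.items()}

for name, query in queries.items():
    print(name, query)
```

Each value in queries could then be passed as entrez_query=... in the qblast call in place of v. Printing the strings first makes it easy to compare them with the single txid query that already works.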