Dear All,
I have a problem using another file as the input to a Python script. I have a text file, taxID.txt, that looks like this:
28181
1979370
342108
2032654
1437059
1288970
156889
451514
2032646
2032652
I have about 20,000 entries in this text file.
I also have a script that requires me to input the taxID manually. Can I pipe the entries from taxID.txt into taxids = open('taxID.txt','r') to make the script run?
The python script is below:
#!/usr/bin/python
import csv
from ete3 import NCBITaxa

ncbi = NCBITaxa()


def get_desired_ranks(taxid, desired_ranks):
    # Map each desired rank to the taxid found at that rank in the lineage
    lineage = ncbi.get_lineage(taxid)
    names = ncbi.get_taxid_translator(lineage)
    lineage2ranks = ncbi.get_rank(names)
    ranks2lineage = dict((rank, taxid) for (taxid, rank) in lineage2ranks.items())
    return {'{}_id'.format(rank): ranks2lineage.get(rank, '<not present>') for rank in desired_ranks}


if __name__ == '__main__':
    desired_ranks = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
    results = list()
    # Read one taxid per line from the text file described above
    with open('taxID.txt', 'r') as taxids:
        for line in taxids:
            taxid = line.strip()  # drop the trailing newline
            if not taxid:
                continue  # skip blank lines
            results.append(list())
            results[-1].append(str(taxid))
            ranks = get_desired_ranks(int(taxid), desired_ranks)
            for key, rank in ranks.items():
                if rank != '<not present>':
                    results[-1].append(list(ncbi.get_taxid_translator([rank]).values())[0])
                else:
                    results[-1].append(rank)

    #generate the header
    header = ['Original_query_taxid']
    header.extend(desired_ranks)
    print('\t'.join(header))

    #print the results
    for result in results:
        print('\t'.join(result))
I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in your toolbar when you compose or edit a post.
To me, it is unclear what exactly your problem is; please elaborate.
I don't understand what you mean here, especially the "pipe" part.
For what it's worth, I believe your code could be more efficient. You now add items to the results list without using it afterwards, except for iterating over the list and printing the items. What you could do would be to first print the header, and then rather than adding items to results just print those directly.
Dear WouterDeCoster,
I am trying to run a Python script whose input requires me to key in the entries manually. I happen to have a file that contains approximately 20,000 entries.
This is what I am supposed to enter manually in the python script:
Typing in 20,000 entries by hand would be insane. If I have a file, taxID.txt, how do I get its entries into the script?
Thank you!
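For what it's worth, here is a minimal sketch of reading the IDs from a file instead of typing them in. The helper name `read_taxids` is made up for illustration, and the sketch assumes one numeric taxID per line, as in the sample at the top of the thread; blank lines are skipped.

```python
def read_taxids(path):
    """Return the taxIDs stored in `path` as a list of ints, one ID per line."""
    taxids = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()  # drop the trailing newline and stray spaces
            if line:             # ignore blank lines
                taxids.append(int(line))
    return taxids
```

In the script, something like `for taxid in read_taxids('taxID.txt'):` could then take the place of looping over the raw file object, so each lookup receives a clean integer rather than a string ending in a newline.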
Hi Ming,
This is more of a pure programming question than a bioinformatics one. For future reference, questions like this are more appropriate at https://stackoverflow.com/
Cheers,
Wouter
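The efficiency suggestion made earlier in this thread, printing the header first and then each row as soon as it is computed rather than collecting everything in `results`, might look roughly like the sketch below. `print_lineage_table`, `fake_lookup` and the `lookup` argument are hypothetical names. `lookup` stands in for whatever turns one taxID into a dict of rank names (in the real script that would be the ete3 calls), so the sketch runs without the NCBI database.

```python
import sys

DESIRED_RANKS = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']


def print_lineage_table(taxids, lookup, ranks=DESIRED_RANKS, out=sys.stdout):
    """Write one tab-separated row per taxID as soon as it has been looked up."""
    # Header first, so no rows need to be held in memory.
    print('\t'.join(['Original_query_taxid'] + list(ranks)), file=out)
    for taxid in taxids:
        names = lookup(taxid)  # hypothetical: maps rank -> name for this taxID
        row = [str(taxid)] + [str(names.get(rank, '<not present>')) for rank in ranks]
        print('\t'.join(row), file=out)


def fake_lookup(taxid):
    """Toy stand-in for the real lookup; the values are made up, not NCBI data."""
    return {'genus': 'genus_of_{}'.format(taxid)}
```

With 20,000 IDs the saving is modest, but rows appear immediately, and a failure on one ID does not throw away the rows already printed.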
You might be interested in https://github.com/jrjhealey/PYlogeny, which is a WIP script that does essentially what you're doing.
@jrj.healey, thank you and will check it out!