How to parse entries from a text file into a Python script?
5.7 years ago
Ming ▴ 110

Dear All,

I have a problem using the contents of another file as input to a Python script. I have a text file, taxID.txt, that looks like this:

28181
1979370
342108
2032654
1437059
1288970
156889
451514
2032646
2032652

I have about 20,000 entries in this text file.

I also have a script that requires me to input the taxID manually. Can I pipe the entries from taxID.txt into taxids = open(‘taxID.txt.txt’,‘r’) to make the script run?

The python script is below:

#!/usr/bin/python

import csv
from ete3 import NCBITaxa

ncbi = NCBITaxa()

def get_desired_ranks(taxid, desired_ranks):
    lineage = ncbi.get_lineage(taxid)   
    names = ncbi.get_taxid_translator(lineage)
    lineage2ranks = ncbi.get_rank(names)
    ranks2lineage = dict((rank,taxid) for (taxid, rank) in lineage2ranks.items())
    return{'{}_id'.format(rank): ranks2lineage.get(rank, '<not present>') for rank in desired_ranks}

if __name__ == '__main__':
    taxids = open(‘taxID.txt.txt’,‘r’)
    desired_ranks = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
    results = list()
    for taxid in taxids:
        results.append(list())
        results[-1].append(str(taxid))
        ranks = get_desired_ranks(taxid, desired_ranks)
        for key, rank in ranks.items():
            if rank != '<not present>':
                results[-1].append(list(ncbi.get_taxid_translator([rank]).values())[0])
            else:
                results[-1].append(rank)

    #generate the header
    header = ['Original_query_taxid']
    header.extend(desired_ranks)
    print('\t'.join(header))

    #print the results
    for result in results:
        print('\t'.join(result))

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in the toolbar when you compose or edit a post.


I have problem using the input of another file in a python script.

To me, it is unclear what your problem exactly is, please elaborate.

Can I pipe the entries from taxID.txt into taxids = open(‘taxID.txt.txt’,‘r’) to make the script run?

I don't understand what you mean here, especially the "pipe" part.

For what it's worth, I believe your code could be more efficient. You now add items to the results list without using it afterwards, except for iterating over the list and printing the items. What you could do would be to first print the header, and then rather than adding items to results just print those directly.
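A minimal sketch of that suggestion, for illustration only: the header is written first, then each row is written as soon as it is resolved, so no results list is needed. The names print_lineage_table and fake_lookup are hypothetical; fake_lookup is a stand-in for the ete3-based lookup in the original script and returns made-up data so the example runs without the taxonomy database.

```python
import sys

def print_lineage_table(taxids, lookup, desired_ranks, out=None):
    """Write a header, then one tab-separated row per taxid, without storing rows."""
    out = sys.stdout if out is None else out
    out.write("\t".join(["Original_query_taxid"] + list(desired_ranks)) + "\n")
    for taxid in taxids:
        names = lookup(taxid)  # expected to return a dict: rank -> name
        row = [str(taxid)]
        row.extend(names.get(rank, "<not present>") for rank in desired_ranks)
        out.write("\t".join(row) + "\n")

# Hypothetical stand-in for the real ete3 lookup (an assumption, not ete3 API).
def fake_lookup(taxid):
    return {"genus": "Homo", "species": "Homo sapiens"}

print_lineage_table([9606], fake_lookup, ["genus", "species", "family"])
```

With 20,000 IDs this also means output starts appearing immediately instead of only after the last lookup has finished.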


Dear WouterDeCoster,

I am trying to run a Python script whose input has to be keyed in manually. I happen to have a file that contains approximately 20,000 entries.

This is what I am supposed to enter manually in the python script:

taxids = [1204725, 2162,  1300163, 420247]

Typing in 20,000 entries by hand would be insane. If I have a file, taxID.txt, how do I get its entries into the script?

Thank you!


Hi Ming,

This is more of a pure programming question than a bioinformatics one. For future reference, questions like this are more appropriate at https://stackoverflow.com/

Cheers,
Wouter


You might be interested in https://github.com/jrjhealey/PYlogeny, a WIP script that does essentially what you're doing.


@jrj.healey, thank you and will check it out!

5.7 years ago
import re
taxids = re.sub("\n"," ",open('taxID.txt.txt','r').read()).split(" ")
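One caveat with this one-liner, offered as a hedged aside: if the file ends with a newline (most text files do), splitting on a single space leaves an empty string at the end of the list, which would then be passed on as a taxid. The sketch below uses an in-memory string as a stand-in for the file contents to show the difference from a plain split().

```python
import re

# Stand-in for open('taxID.txt.txt').read(); note the trailing newline.
text = "28181\n1979370\n342108\n"

# The one-liner's approach: replace newlines with spaces, then split on " ".
with_sub = re.sub("\n", " ", text).split(" ")

# str.split() with no argument splits on any run of whitespace and
# drops empty strings at the ends, so no regex is needed.
plain_split = text.split()

print(with_sub)     # ['28181', '1979370', '342108', '']
print(plain_split)  # ['28181', '1979370', '342108']
```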

Thanks @mohammadhassanj,

but I get the following error:

File "/home/tanshiming/Scripts/python/blast-taxonomy.py", line 17
    taxids = re.sub("\n"," ",open(‘taxID.txt.txt’,‘r’).read()).split(" ")
                                       ^
SyntaxError: invalid character in identifier

Make sure the quotes are straight quotes (') rather than curly ones (‘ and ’).

5.7 years ago

taxids = open('taxID.txt.txt','r') returns a file object which is an iterator. If you iterate over this object you'll get the lines:

for line in taxids:
    print(line)

If you absolutely need these lines in a list (I don't think that's necessary for this script) you can use the readlines() method:

lines = taxids.readlines()

One problem is that each line will still have the newline character \n at the end, so you have to trim that off, e.g. using rstrip.

for taxid in taxids:
    results.append(list())
    results[-1].append(str(taxid.rstrip('\n')))

Note that the above can also be simplified to:

for taxid in taxids:
    results.append([str(taxid.rstrip('\n'))])
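If a plain list of IDs like the hand-typed taxids = [1204725, 2162, ...] is what the rest of the script expects, the pieces above can be combined roughly as follows. This is a sketch, not part of the original script: read_taxids is a hypothetical helper name, and it assumes every non-blank line holds exactly one integer ID.

```python
def read_taxids(lines):
    """Return taxids as ints from an iterable of lines, skipping blank lines."""
    taxids = []
    for line in lines:
        line = line.strip()  # removes the trailing '\n' and stray spaces
        if line:             # ignore empty lines, e.g. at the end of the file
            taxids.append(int(line))
    return taxids

# An open file yields lines exactly like this list does.
sample = ["28181\n", "1979370\n", "\n", "342108\n"]
print(read_taxids(sample))  # [28181, 1979370, 342108]
```

In the script itself, the same function could be given the file object directly, e.g. read_taxids(open('taxID.txt.txt')).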

@WouterDeCoster, thank you! It worked very well! :)


If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.


@Ming: Be aware that in your opening code example, you do not close() the taxids file. Consider switching to using a with open() block instead.
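A small sketch of what that could look like. To keep the example runnable anywhere, it first writes a temporary stand-in for taxID.txt (the three IDs are taken from the question); in the real script only the with block would be needed, pointed at the actual file.

```python
import os
import tempfile

# Write a small stand-in for taxID.txt so the example runs anywhere.
tmp = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
tmp.write("28181\n1979370\n342108\n")
tmp.close()

# The file is closed automatically when the with block exits,
# even if an exception is raised inside it.
with open(tmp.name) as handle:
    taxids = [line.strip() for line in handle if line.strip()]

print(taxids)         # ['28181', '1979370', '342108']
print(handle.closed)  # True

os.remove(tmp.name)   # clean up the temporary file
```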
