Question

How to parse entries from a text file into a python script?

5.9 years ago

Ming ▴ 110

Dear All,

I have a problem using the contents of another file as input to a Python script. I have a text file, taxID.txt, that looks like this:

I have about 20,000 entries in this text file.

I also have a script that requires me to enter the taxIDs manually. Can I pipe the entries from taxID.txt into taxids = open(‘taxID.txt.txt’,‘r’) to make the script run?

The Python script is below:

#!/usr/bin/python

import csv
from ete3 import NCBITaxa

ncbi = NCBITaxa()

def get_desired_ranks(taxid, desired_ranks):
    lineage = ncbi.get_lineage(taxid)   
    names = ncbi.get_taxid_translator(lineage)
    lineage2ranks = ncbi.get_rank(names)
    ranks2lineage = dict((rank,taxid) for (taxid, rank) in lineage2ranks.items())
    return{'{}_id'.format(rank): ranks2lineage.get(rank, '<not present>') for rank in desired_ranks}

if __name__ == '__main__':
    taxids = open(‘taxID.txt.txt’,‘r’)
    desired_ranks = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
    results = list()
    for taxid in taxids:
        results.append(list())
        results[-1].append(str(taxid))
        ranks = get_desired_ranks(taxid, desired_ranks)
        for key, rank in ranks.items():
            if rank != '<not present>':
                results[-1].append(list(ncbi.get_taxid_translator([rank]).values())[0])
            else:
                results[-1].append(rank)

    #generate the header
    header = ['Original_query_taxid']
    header.extend(desired_ranks)
    print('\t'.join(header))

    #print the results
    for result in results:
        print('\t'.join(result))

python • 2.1k views

ADD COMMENT • link updated 5.9 years ago by zx8754 12k • written 5.9 years ago by Ming ▴ 110

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in the toolbar when you compose or edit a post.

ADD REPLY • link 5.9 years ago by WouterDeCoster 47k

I have problem using the input of another file in a python script.

It is unclear to me what exactly your problem is; please elaborate.

Can I pipe the entries from taxID.txt into taxids = open(‘taxID.txt.txt’,‘r’) to make the script run?

I don't understand what you mean here, especially the "pipe" part.

For what it's worth, I believe your code could be more efficient. You currently accumulate every row in the results list, but the only thing you do with that list afterwards is iterate over it and print each row. You could instead print the header first and then print each row as soon as it is computed, rather than appending it to results.
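
The suggestion above could look roughly like the following sketch. It is not the original script: stream_rank_table and fake_lookup are hypothetical names, and fake_lookup is only a stand-in so the example runs without ete3 or the NCBI taxonomy database. In the real script, a function that maps a taxid to rank names would take its place.

```python
import sys


def stream_rank_table(taxids, lookup, desired_ranks, out=sys.stdout):
    """Write the header, then one tab-separated row per taxid as it is resolved."""
    out.write('\t'.join(['Original_query_taxid'] + list(desired_ranks)) + '\n')
    for taxid in taxids:
        # lookup is assumed to return a dict mapping rank -> name
        names = lookup(taxid, desired_ranks)
        row = [str(taxid)]
        row.extend(str(names.get(rank, '<not present>')) for rank in desired_ranks)
        out.write('\t'.join(row) + '\n')


def fake_lookup(taxid, desired_ranks):
    """Stand-in for a real taxonomy lookup; returns a made-up genus only."""
    return {'genus': 'genus_of_{}'.format(taxid)}


stream_rank_table([2162, 420247], fake_lookup, ['genus', 'species'])
```

Because nothing is stored between iterations, memory use stays flat no matter how many taxids are processed, and the first rows appear immediately.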

ADD REPLY • link 5.9 years ago by WouterDeCoster 47k

Dear WouterDeCoster,

I am trying to run a Python script whose input I currently have to key in manually. I happen to have a file that contains approximately 20,000 entries.

This is what I am supposed to enter manually in the python script:

taxids = [1204725, 2162,  1300163, 420247]

Entering 20,000 entries by hand would be insane. If I have a file, taxID.txt, how do I get its entries into the script?

Thank you!

ADD REPLY • link 5.9 years ago by Ming ▴ 110

Hi Ming,

This is more of a pure programming question than a bioinformatics one. For future reference, questions like this are more appropriate at https://stackoverflow.com/

Cheers,
Wouter

ADD REPLY • link 5.9 years ago by WouterDeCoster 47k

You might be interested in https://github.com/jrjhealey/PYlogeny, a work-in-progress script that does essentially what you're doing.

ADD REPLY • link 5.9 years ago by Joe 22k

@jrj.healey, thank you, I will check it out!

ADD REPLY • link 5.9 years ago by Ming ▴ 110

score 2 · Answer 1 · 2019-04-04

5.9 years ago

mohammadhassanj ▴ 260

import re
taxids = re.sub("\n"," ",open('taxID.txt.txt','r').read()).split(" ")
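
One caveat with the one-liner above: if the file ends with a newline, splitting on a single space leaves an empty string as the last element. A minimal alternative sketch, assuming the file holds one taxid per line (or whitespace-separated taxids), uses split() with no argument instead. The helper name read_taxids is hypothetical, and the temporary file exists only to make the example runnable on its own.

```python
import os
import tempfile


def read_taxids(path):
    """Return every whitespace-separated token in the file as a string."""
    with open(path, 'r') as handle:
        # split() with no argument splits on any run of whitespace and
        # never yields empty strings, so a trailing newline is harmless.
        return handle.read().split()


# Throwaway file holding the four IDs quoted earlier in the thread.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('1204725\n2162\n1300163\n420247\n')

taxids = read_taxids(tmp.name)
os.remove(tmp.name)
print(taxids)
```

The tokens are still strings here; convert them with int() if the downstream code expects integers.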

ADD COMMENT • link 4.4 years ago by mohammadhassanj ▴ 260

Thanks @mohammadhassanj,

but I get the following error:

File "/home/tanshiming/Scripts/python/blast-taxonomy.py", line 17
    taxids = re.sub("\n"," ",open(‘taxID.txt.txt’,‘r’).read()).split(" ")
                                       ^
SyntaxError: invalid character in identifier

ADD REPLY • link 5.9 years ago by Ming ▴ 110

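Make sure the quotes are straight ASCII quotes (') rather than typographic ones (‘ and ’). Python does not accept typographic quotes as string delimiters.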
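
If a script was pasted from a word processor or a web page and contains many such quotes, they can be replaced in bulk. The following is only a sketch: straighten_quotes and SMART_QUOTES are hypothetical names, and the function would also alter curly quotes that were placed inside string literals on purpose.

```python
# Map the typographic quotes that word processors insert to plain ASCII quotes.
SMART_QUOTES = str.maketrans({
    '\u2018': "'",  # left single quotation mark
    '\u2019': "'",  # right single quotation mark
    '\u201c': '"',  # left double quotation mark
    '\u201d': '"',  # right double quotation mark
})


def straighten_quotes(source):
    """Return source with curly quotes replaced by straight ones."""
    return source.translate(SMART_QUOTES)


broken = "taxids = open(\u2018taxID.txt.txt\u2019,\u2018r\u2019)"
fixed = straighten_quotes(broken)
print(fixed)
```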

ADD REPLY • link 5.9 years ago by WouterDeCoster 47k

score 2 · Answer 2 · 2019-04-04

5.9 years ago

WouterDeCoster 47k

taxids = open('taxID.txt.txt','r') returns a file object which is an iterator. If you iterate over this object you'll get the lines:

for line in taxids:
    print(line)

If you absolutely need these lines in a list (I don't think that's necessary for this script) you can use the readlines() method:

lines = taxids.readlines()

One problem is that each line will still have the newline character \n at the end, so you have to trim that off, e.g. using rstrip.

for taxid in taxids:
    results.append(list())
    results[-1].append(str(taxid.rstrip('\n')))

Note that the above can also be simplified to:

for taxid in taxids:
    results.append([str(taxid.rstrip('\n'))])

ADD COMMENT • link 5.9 years ago by WouterDeCoster 47k

@WouterDeCoster, thank you! It worked very well! :)

ADD REPLY • link 5.9 years ago by Ming ▴ 110

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.

ADD REPLY • link 5.9 years ago by Devon Ryan 105k

@Ming: Be aware that in your opening code example, you do not close() the taxids file. Consider switching to using a with open() block instead.
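
A minimal sketch of the with open() pattern follows. The temporary file is only there to make the example self-contained, and converting each line with int() is an assumption that every non-blank line holds a numeric taxid.

```python
import os
import tempfile

# Throwaway input file so the sketch runs on its own; note the blank line.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('1204725\n2162\n\n420247\n')

with open(tmp.name, 'r') as handle:
    # Skip blank lines; int() tolerates the trailing newline on each line.
    taxids = [int(line) for line in handle if line.strip()]

# The file is closed as soon as the with block exits, even if an error occurred.
print(handle.closed)
print(taxids)
os.remove(tmp.name)
```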

ADD REPLY • link 5.9 years ago by Joe 22k