Arranging header in nr database via Python (Unable to input multiple files)
7.8 years ago
AsoInfo ▴ 300

Hi,

I want to process several input files, one after another. But for some reason the script only reads the first file and ignores the others (sys.argv does not seem to work).

A small example of how I am using sys.argv:

import sys
from sys import argv

with open(sys.argv[1], 'r') as f:
    contents = f.read()
print (contents)

Even this code is not working.

The code is as follows:

 #!/usr/bin/env
import collections
data = collections.defaultdict(list)
import sys
from sys import argv

file=open("../names.dmp","r")
#original=open("test.txt","r")
write=open("write.txt","w")
missing=open("missing.txt","w")

for line in file:
    columns = line.strip().split("|")
    data[columns[0].strip()].append(columns[1].replace("\t", ""))

tax=0
name=""
none=0
count=1



with open(sys.argv[1], 'r') as original:
    for line in original:
        line=original.readline()
        if line.startswith(">"):
            try:
                chunk=line.rsplit('[', 1)[1]
                index = chunk[:-2]
                if len(index) <= 50:
                    for k,v in data.items():
                        if index in v:
                            tax=k
                            break
                        else:
                            continue
                    if tax != 0:
                        lne=original.readline()
                        for pine in original:
                            if pine.startswith("\n"):
                                break
                            else:
                                lne=lne+pine
                        finalline=line[:1] + tax + "|"+"gi_"+line[1:]+lne+"\n"
                        write.write(finalline)
                        tax=0
                else:
                    lne=original.readline()
                    for pine in original:
                        if pine.startswith("\n"):
                            break
                        else:
                            lne=lne+pine
                    missing.write(line[:1] + str(none) + "|"+"gi_"+line[1:]+lne+"\n")
                    tax=0
            except:
                lne=original.readline()
                for pine in original:
                    if pine.startswith("\n"):
                        break
                    else:
                        lne=lne+pine
                missing.write(line[:1] + str(none) + "|"+"gi_"+line[1:]+lne+"\n")
                tax=0

the code works fine when given input as a single file but it does not work if I use sys.argv.

Thanks!

Python nr database sys.argv • 1.7k views
7.8 years ago

I'm not entirely sure what you are trying to do, but your code could certainly use some improvements.

import sys
from sys import argv

Importing argv separately is pointless. You can just import sys and use sys.argv.

file=open("../names.dmp","r")

file is not a good variable name in Python: it is not a reserved keyword, but in Python 2 it is the name of the built-in file type, and your variable shadows it.

I see you are parsing a FASTA file manually, which is generally not advisable. I would suggest using Biopython's SeqIO for this job. It will make your code easier to read and leave less room for errors.
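
As a rough illustration of that suggestion, here is a minimal sketch using SeqIO.parse with the "fasta" format. It assumes Biopython is installed (it is a third-party package), and the function name iter_records is made up for this example; it is not taken from the original script.

```python
def iter_records(handle):
    """Yield (id, description, sequence string) for each FASTA record."""
    # Biopython is a third-party dependency (pip install biopython).
    # It is imported here, inside the function, so that this sketch can
    # still be loaded on a machine where Biopython is not installed.
    from Bio import SeqIO

    for record in SeqIO.parse(handle, "fasta"):
        # record.id is the first whitespace-delimited token of the header;
        # record.description is the whole header line without the ">".
        yield record.id, record.description, str(record.seq)
```

Calling iter_records on an open file (for example inside a with open(...) block) gives one tuple per sequence, with multi-line sequences already joined, so there is no need to track blank lines or call readline() by hand.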

the code works fine when given input as a single file but it does not work if I use sys.argv.

It's (to me) unclear how you use sys.argv and what exactly doesn't work.
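
If the goal is to handle every file named on the command line, note that sys.argv[1] is only the first argument. One possible approach is to loop over sys.argv[1:]. The sketch below assumes the script is called with the input files as arguments (for example python script.py a.fasta b.fasta, where script.py is a placeholder name); read_all is a hypothetical helper, not part of your code.

```python
import sys


def read_all(paths):
    """Return a dict mapping each path to the full contents of that file."""
    contents = {}
    for path in paths:
        # The with-block closes each file before the next one is opened.
        with open(path, "r") as handle:
            contents[path] = handle.read()
    return contents


if __name__ == "__main__":
    # sys.argv[0] is the script name; the input files start at index 1.
    for path, text in read_all(sys.argv[1:]).items():
        print(path, len(text))
```

In your script, the body of the with open(sys.argv[1], 'r') block would go inside such a loop, so that it runs once per input file.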

7.8 years ago

In addition to @wouter's remarks:

1) The shebang #!/usr/bin/env should probably be #!/usr/bin/env python.

2) write=open("write.txt","w") is a poor choice, because write is also the name of the file method used to write output. Same remark as for the usage of file. A safe way to name filehandles is to write them in capitals (e.g. OUTPUT=open("filename", "w")).

3) line.strip().split("|") is totally fine, but if your task is only to remove the whitespace/newline at the end of the line, then use rstrip() instead. See https://docs.python.org/2/library/stdtypes.html#str.rstrip

4) line=original.readline() is redundant: when you do for line in original you are already doing readline() on the implicit iterator, and mixing the two can cause lines to be skipped.

5) write.write(finalline) really?

6) except should be paired with the type of exception you expect, for example except KeyError. A bare except handles every exception the same way, which is probably not what you want.
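
Points 3, 4 and 6 can be sketched together as follows. This is only an illustration under assumptions: header_organisms is a made-up name, and the sketch assumes the organism name sits in the last [...] of the header, as the rsplit('[', 1) call in the question suggests.

```python
def header_organisms(lines):
    """Yield (header, organism or None) for every FASTA header in lines."""
    for line in lines:            # the for-loop already advances the iterator
        line = line.rstrip("\n")  # remove only the trailing newline
        if not line.startswith(">"):
            continue
        try:
            # rsplit returns a one-element list when the header has no "[",
            # so indexing [1] raises IndexError for such headers.
            organism = line.rsplit("[", 1)[1].rstrip("]")
        except IndexError:
            organism = None
        yield line, organism
```

Catching only IndexError means that any other problem (a typo in a variable name, for instance) still shows up as a traceback instead of silently sending the record to the "missing" output.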


Thanks for the comments!

I edited my script already based on the comments and now the script is working fine.


You can mark the comments as answers if they helped you out, so that other readers will see this thread as solved!


I moved our comments to answers, so they can get accepted.
