In this script I use genode GTF for mouse (last realase vM18)
At the beginning, I wanted to modify the mouse GTF annotation using a python script (modify some positions in chr12 annotation)
Of course I simplified my script to focus on my issue. I'll try to make it clear.
This is my script :
import os
import sys
import getopt
import os.path
import numpy as np
import pandas as pd
def usage():
print('Usage:\n')
print('\tpython ' +
sys.argv[0] + ' -g <genome type> -y <gtf file>')
print('\t\t-h or --help : display this help')
print('\t\t-g or --genome : will add the genome to the cytoband file name')
print('\t\t-y or --file_gtf : gtf file')
def main(argv):
genome = ""
file_gtf = ""
try:
opts, args = getopt.getopt(sys.argv[1:], 'g:y:', ['genome=', 'file_gtf=', 'help'])
except getopt.GetoptError:
usage()
sys.exit(2)
##############################OPTIONS##############################
for opt, arg in opts:
if opt in ('-h', '--help'):
usage()
sys.exit(2)
elif opt in ('-g', '--genome'):
genome = arg
elif opt in ('-y', '--file_gtf'):
file_gtf = arg
else:
print("Error : Bad option -> " + opt)
usage()
sys.exit(2)
##############################PROGRAM##############################
new_vcf_file = open(genome + ".gtf", "a")
with open(file_gtf, 'r') as vcf_f:
for line in vcf_f:
if line[0] != '#':
array_line = line.split("\t")
if array_line[0] == "chr12":
new_vcf_file.write(line)
else:
new_vcf_file.write(line)
else:
new_vcf_file.write(line)
new_vcf_file.close()
if __name__ == '__main__':
main(sys.argv[1:])
I know it does nothing because I simplified it. Just read and write lines.
I run my script with this command line :
python test_biostar.py -g testBiostar -y gencode.vM18.chr_patch_hapl_scaff.annotation.gtf && wc -l testBiostar.gtf && rm testBiostar.gtf
At the end of this command a new GTF file is generated (testBiostar.gtf). To show my issue I count the number of line of this file and I delete it to re-run easily.
In this script I do not delete lines from the gencode GTF, so I expect to have the same number of lines
17:32:15 [ubuntu16@ubuntu16-VirtualBox Data]$ wc -l gencode.vM18.chr_patch_hapl_scaff.annotation.gtf
1818071 gencode.vM18.chr_patch_hapl_scaff.annotation.gtf
Then I run my script multiple times :
17:31:21 [ubuntu16@ubuntu16-VirtualBox Data]$ python /media/sf_raid/Projects/PTCB/test_biostar.py -g testBiostar -y gencode.vM18.chr_patch_hapl_scaff.annotation.gtf && wc -l testBiostar.gtf && rm testBiostar.gtf
1754744 testBiostar.gtf
17:31:47 [ubuntu16@ubuntu16-VirtualBox Data]$ python /media/sf_raid/Projects/PTCB/test_biostar.py -g testBiostar -y gencode.vM18.chr_patch_hapl_scaff.annotation.gtf && wc -l testBiostar.gtf && rm testBiostar.gtf
1754735 testBiostar.gtf
Everytime I launch the script I got different results (sometimes I got a result I already got...)
EDIT : While typing I found the answer but I don't know why (computer science as its finest)
I opened the testBiostar.gtf result file and I found that some lines were cut and glue to others. My hints were on the read/write process... I do not know where exactly, but I replaced :
new_vcf_file = open(genome + ".gtf", "a")
To :
new_vcf_file = open(genome + ".gtf", "w")
17:54:10 [ubuntu16@ubuntu16-VirtualBox Data]$ python /media/sf_raid/Projects/PTCB/test_biostar.py -g testBiostar -y gencode.vM18.chr_patch_hapl_scaff.annotation.gtf && wc -l testBiostar.gtf && rm testBiostar.gtf
1818071 testBiostar.gtf
17:55:10 [ubuntu16@ubuntu16-VirtualBox Data]$ python /media/sf_raid/Projects/PTCB/test_biostar.py -g testBiostar -y gencode.vM18.chr_patch_hapl_scaff.annotation.gtf && wc -l testBiostar.gtf && rm testBiostar.gtf
1818071 testBiostar.gtf
My question is why it is working now ? I know the 'a' is for 'append to the end of the file' but after each run I delete this file
I bet it is a stupid answer... but I can not find it
Thanks
This have no relation to the answer, why you do not use python
argparse
? this will save you theusage()
function and help more in processing input args.Habits mostly, I learned python with
getopt
so I keep using it. I will give a try toargparse
thanksYou will not regret :D
I stopped counting how many time this happens to me ;)