2.0 years ago
mxlsherry1992
Dear all,
When I run a Python script, it reports the following error:
```
Traceback (most recent call last):
  File "replace_gff_gene_id.py", line 7, in <module>
    ab[lst1[0]] = lst1[1]
IndexError: list index out of range
```
This is the command I ran:
```shell
python replace_gff_gene_id.py Prunella_fulvescens_BGI_coress.seqid Prunella_fulvescens.homolog.gff > Prunella_fulvescens.modify.gff3
```
And here is the script, replace_gff_gene_id.py:
```python
import sys
ab = {}
with open(sys.argv[1]) as rename_file:  # two-column file: old ID, new ID
    for row in rename_file:
        lst1 = row.strip().split('\t')
        ab[lst1[0]] = lst1[1]  # old ID -> new ID
#print(ab)
with open(sys.argv[2]) as gff_file:
    for row in gff_file:
        for key, value in ab.items():  # replace every old ID found in the line
            row = row.replace(key, value).strip()
        print(row)
```
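If the mapping file might use spaces instead of tabs, or might end with a blank line, a slightly more defensive loader could help. This is only a sketch, not part of the original script; the function name `load_mapping` and its `sep` parameter are hypothetical. With `sep=None`, `str.split` splits on any run of whitespace, blank lines are skipped, and a line without exactly two fields raises an error naming that line instead of a bare IndexError.

```python
def load_mapping(lines, sep=None):
    """Build an old-ID -> new-ID dict from two-column lines (hypothetical helper).

    sep=None splits on any run of whitespace (tabs or spaces);
    pass sep='\t' to require tab-separated columns.
    """
    mapping = {}
    for line_no, row in enumerate(lines, start=1):
        row = row.strip()
        if not row:
            continue  # skip blank lines, e.g. an empty last line
        fields = row.split(sep)
        if len(fields) != 2:
            # Report the offending line instead of failing with IndexError
            raise ValueError(
                f"line {line_no}: expected 2 fields, got {len(fields)}: {row!r}"
            )
        old_id, new_id = fields
        mapping[old_id] = new_id
    return mapping
```

In the original script, `ab = load_mapping(rename_file)` could stand in for the loop inside the first `with` block.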
Here is an example of input file 1, Prunella_fulvescens_BGI_coress.seqid:
```
PRUFUL_R14685 Prunella_himalayana_BGI_1
PRUFUL_R05501 Prunella_himalayana_BGI_2
PRUFUL_R10205 Prunella_himalayana_BGI_3
PRUFUL_R07295 Prunella_himalayana_BGI_4
PRUFUL_R07296 Prunella_himalayana_BGI_5
PRUFUL_R10726 Prunella_himalayana_BGI_6
PRUFUL_R13095 Prunella_himalayana_BGI_7
PRUFUL_R13096 Prunella_himalayana_BGI_8
PRUFUL_R14411 Prunella_himalayana_BGI_9
PRUFUL_R07297 Prunella_himalayana_BGI_10
```
Here is an example of input file 2, Prunella_fulvescens.homolog.gff:
```
scaffold9610 GeneWise mRNA 732 962 54.88 - . ID=PRUFUL_R00001;Source=ENSTGUP00000017881-D17;Shift=0;MidStop=0;
scaffold9610 GeneWise CDS 732 962 . - 0 Parent=PRUFUL_R00001;Source=ENSTGUP00000017881-D17;
scaffold9610 GeneWise mRNA 2503 2764 74.71 - . ID=PRUFUL_R00002;Source=ENSTGUP00000018017-D16;Shift=1;2653-2656;MidStop=0;
scaffold9610 GeneWise CDS 2503 2652 . - 0 Parent=PRUFUL_R00002;Source=ENSTGUP00000018017-D16;
scaffold9610 GeneWise CDS 2657 2764 . - 0 Parent=PRUFUL_R00002;Source=ENSTGUP00000018017-D16;
scaffold9610 GeneWise mRNA 2081 2496 63.16 - . ID=PRUFUL_R00003;Source=ENSTGUP00000018035-D83;Shift=0;MidStop=0;
scaffold9610 GeneWise CDS 2081 2195 . - 1 Parent=PRUFUL_R00003;Source=ENSTGUP00000018035-D83;
scaffold9610 GeneWise CDS 2246 2496 . - 0 Parent=PRUFUL_R00003;Source=ENSTGUP00000018035-D83;
```
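In this GFF example the gene IDs sit in the `ID=` and `Parent=` attributes. As a hedged alternative to calling `str.replace` for every key on every line, one could look up only those attribute values in the dictionary. That does one dict lookup per attribute instead of scanning all keys, and never rewrites an ID just because it is a substring of a longer one. This sketch assumes the IDs appear only in `ID=`/`Parent=`; the names `ATTR_ID` and `rename_gff_line` are hypothetical.

```python
import re

# Value of an ID= or Parent= attribute, e.g. "ID=PRUFUL_R00001;".
# The lookbehind requires the key to start a field (after whitespace or
# at the line start) or follow ';', so a key such as "XID=" is not matched.
ATTR_ID = re.compile(r"(?<![^\s;])((?:ID|Parent)=)([^;\s]+)")


def rename_gff_line(line, mapping):
    """Rename whole ID/Parent values via mapping; unknown IDs are kept as-is."""
    return ATTR_ID.sub(
        lambda m: m.group(1) + mapping.get(m.group(2), m.group(2)),
        line.rstrip("\n"),
    )
```

Each GFF line would then be printed as `rename_gff_line(row, ab)` instead of going through the inner `for key, value in ab.items()` loop.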
I would really appreciate any help fixing this issue.
It's impossible to answer without at least an example of the input file.
Thanks for the reply! I have updated the post with examples of both input files.
Double-check that a tab is actually the field separator in the input file. The list index goes out of range because splitting on a tab yields a single field (only index 0 exists), so `lst1[1]` does not exist. The provided code itself has no problem.
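To check this, it helps to see how each line of the mapping file actually splits. Below is a minimal diagnostic sketch; the function name `field_counts` is hypothetical. For the first few lines it reports how many fields a tab split and a whitespace split give, plus `repr()` of the line so that real tabs show up as `\t`. A line with 1 tab field but 2 whitespace fields is space-separated. A blank line, such as an empty last line, also splits into a single field `['']` and triggers the same IndexError.

```python
def field_counts(lines, limit=5):
    """Return (line_no, tab_fields, whitespace_fields, repr_of_line) tuples
    for the first `limit` lines (hypothetical diagnostic helper)."""
    report = []
    for line_no, row in enumerate(lines, start=1):
        if line_no > limit:
            break
        row = row.rstrip("\n")
        report.append(
            (line_no, len(row.split("\t")), len(row.split()), repr(row))
        )
    return report
```

For example, pass it the open file handle of Prunella_fulvescens_BGI_coress.seqid and print each tuple; if every line shows a tab count of 1, the columns are not tab-separated.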