blast2gff IndexError: list index out of range
4.3 years ago

Hello everyone,

I have an issue with the mgkit tool blast2gff. I am trying to produce a GFF file from a BLAST result (in -outfmt 6 format). Here are the command and the error message:

blast2gff blastdb output_blast.out output_blast.gff

INFO - mgkit.workflow.blast2gff: Writing to file (output_blast.gff)
INFO - mgkit.io.blast: Reading BLAST results from file (output_blast.out)
Traceback (most recent call last):
  File "/cluster/home/usr/.conda/envs/mgkit/bin/blast2gff", line 11, in <module>
    sys.exit(main())
  File "/cluster/home/usr/.conda/envs/mgkit/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/cluster/home/usr/.conda/envs/mgkit/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/cluster/home/usr/.conda/envs/mgkit/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/cluster/home/usr/.conda/envs/mgkit/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/cluster/home/usr/.conda/envs/mgkit/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/cluster/home/usr/.conda/envs/mgkit/lib/python3.7/site-packages/mgkit/workflow/blast2gff.py", line 207, in convert_from_blastdb
    for annotation in iterator:
  File "/cluster/home/usr/.conda/envs/mgkit/lib/python3.7/site-packages/mgkit/io/blast.py", line 200, in parse_uniprot_blast
    value_funcs=value_funcs):
  File "/cluster/home/usr/.conda/envs/mgkit/lib/python3.7/site-packages/mgkit/io/blast.py", line 136, in parse_blast_tab
    for index, func in zip(ret_col, value_funcs)
  File "/cluster/home/usr/.conda/envs/mgkit/lib/python3.7/site-packages/mgkit/io/blast.py", line 136, in <genexpr>
    for index, func in zip(ret_col, value_funcs)
  File "/cluster/home/usr/.conda/envs/mgkit/lib/python3.7/site-packages/mgkit/workflow/blast2gff.py", line 192, in name_func
    return x.split(header_sep)[gene_index]
IndexError: list index out of range

Has anyone encountered the same issue?

blast2gff GFF BLAST Python blastn

Thanks for your answer, zorbax! I am using output format 6, and the default output looks like this:

ACmerged_contig_1002    Scaffolds_2465_pilon    91.255  263 21  2   551 811 43629   43367   1.05e-96    357

Following your advice, I tried these two formats:

ACmerged_contig_1002|Scaffolds_2465_pilon   91.255  263 21  2   551 811 43629   43367   1.05e-96    357

ACmerged_contig_1002|Scaffolds_2465_pilon|91.255|263|21|2|551|811|43629|43367|1.05e-96|357

Both give the same error message. Is this the format you meant, or did I miss something?
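A quick way to see why both attempts fail: outfmt 6 rows are tab-separated, and the subject ID is expected in the second column. The sketch below is plain Python (not mgkit's own parser) and assumes the first attempted line uses real tabs between the remaining fields. It shows that joining query and subject with | shifts every column left, so the "subject" column ends up holding the percent identity:

```python
# Plain-Python illustration (not mgkit's parser): outfmt 6 columns are tab-separated,
# and the second column is the subject ID that blast2gff later splits on '|'.
fields = ["ACmerged_contig_1002|Scaffolds_2465_pilon", "91.255", "263", "21", "2",
          "551", "811", "43629", "43367", "1.05e-96", "357"]
line = "\t".join(fields)

columns = line.split("\t")
subject = columns[1]             # whatever now sits in the subject position
print(len(columns), subject)     # 11 91.255
print(subject.split("|"))        # ['91.255']
```

Only 11 columns remain instead of 12, and the value in the subject position contains no |, so the same lookup fails again.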


The issue is likely the Scaffolds_2465_pilon field. As shown below, NCBI- and UniProt-style FASTA headers have identifiers separated by a | character, e.g. sp|O14830|PPE2_HUMAN. @zorbax is referring only to the second field (the hit, or subject) in the BLAST output, not to the whole line.
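The last traceback frame shows the failing expression, x.split(header_sep)[gene_index]. Below is a minimal re-creation of that expression. It assumes header_sep is | and gene_index is 1; the traceback does not show the real defaults. The function name gene_id_from_header is hypothetical.

```python
def gene_id_from_header(header, header_sep="|", gene_index=1):
    """Hypothetical re-creation of the name_func expression in the traceback.

    The defaults header_sep='|' and gene_index=1 are assumptions for illustration.
    """
    return header.split(header_sep)[gene_index]

# UniProt-style header: split gives ['sp', 'O14830', 'PPE2_HUMAN'], index 1 exists
print(gene_id_from_header("sp|O14830|PPE2_HUMAN"))  # O14830

# Assembly-style header without '|': split gives a one-element list,
# so index 1 does not exist and Python raises IndexError
try:
    gene_id_from_header("Scaffolds_2465_pilon")
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range
```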

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.


It works! Thanks, genomax. Apologies for the incorrect use of the Answer button.


Great! You can accept @zorbax's answer (green check mark) to provide closure to the thread.


Note: there is actually an option to specify a different separator for the header: https://mgkit.readthedocs.io/en/0.4.1/scripts/blast2gff.html#cmdoption-blast2gff-blastdb-s

But it does not work with a tab, for some reason...

blast2gff blastdb --header-sep '\t' output_blast.out output_blast.gff
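There are two likely reasons a tab separator has no effect. This assumes mgkit applies --header-sep only to the already-extracted subject field, as the name_func frame in the traceback suggests, and does not unescape the argument. First, in a POSIX shell, '\t' inside single quotes is passed as two characters, a backslash and a t, not as a tab. Second, the row has already been split into columns on tabs, so the subject field cannot contain a tab anyway. A small Python illustration:

```python
# What a POSIX shell passes for --header-sep '\t': a backslash followed by 't'
sep_from_shell = "\\t"
print(len(sep_from_shell))  # 2

# The OP's row; in the real file the columns are separated by tab characters
row = "\t".join(["ACmerged_contig_1002", "Scaffolds_2465_pilon", "91.255", "263", "21", "2",
                 "551", "811", "43629", "43367", "1.05e-96", "357"])
subject = row.split("\t")[1]  # columns are split on real tabs first

print(subject.split(sep_from_shell))  # ['Scaffolds_2465_pilon']
print(subject.split("\t"))            # ['Scaffolds_2465_pilon']
# Either way the header stays one element, so an index of 1 is still out of range
```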

I am having the same issue. I even removed the lines with '|' in the names, but the error is still the same.

A few lines from my BLAST file:

P05522.1        tig00054011_1   55.195  154     30      2       26      141     4514526 4514984 5.85e-41        161
P05522.1        tig00000001_1   100.000 111     0       0       384     494     3401148 3400816 8.57e-78        204
P05522.1        tig00000001_1   100.000 56      0       0       330     385     3401396 3401229 8.57e-78        110

Please help!!
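Removing the lines that contain '|' likely makes things worse. If the gene index points past the first field, every subject ID needs at least one separator, and tig00054011_1 has none. One possible workaround is to prefix each subject ID with a dummy field before running blast2gff. This sketch assumes the default | separator and a gene index of 1, and neither is confirmed here. The helper name add_subject_prefix and the prefix x are hypothetical.

```python
import io

def add_subject_prefix(in_handle, out_handle, prefix="x", sep="|"):
    """Hypothetical helper: rewrite column 2 (sseqid) of outfmt 6 rows as
    '<prefix><sep><sseqid>', so splitting on sep and taking index 1
    gives back the original subject ID. Blank lines are skipped."""
    for line in in_handle:
        if not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        cols[1] = prefix + sep + cols[1]
        out_handle.write("\t".join(cols) + "\n")

blast_rows = (
    "P05522.1\ttig00054011_1\t55.195\t154\t30\t2\t26\t141\t4514526\t4514984\t5.85e-41\t161\n"
    "P05522.1\ttig00000001_1\t100.000\t111\t0\t0\t384\t494\t3401148\t3400816\t8.57e-78\t204\n"
)
out = io.StringIO()
add_subject_prefix(io.StringIO(blast_rows), out)
for line in out.getvalue().splitlines():
    sseqid = line.split("\t")[1]
    print(sseqid, "->", sseqid.split("|")[1])
# x|tig00054011_1 -> tig00054011_1
# x|tig00000001_1 -> tig00000001_1
```

For real files, pass handles opened with open() instead of StringIO, then run blast2gff on the rewritten file.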


An IndexError is raised when you try to access an index that does not exist in a sequence (e.g. a list or tuple). The Python documentation defines when this exception is raised:

Raised when a sequence subscript is out of range.

Here is a Python split() example to illustrate:

data = "one%two%three%four%five"
numbers = data.split('%')

The list numbers has 5 elements and indexing starts at 0, so the last element has index 4. If you subscript with an index higher than 4, the Python interpreter raises an IndexError because there is no element at that index.
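Continuing the split() example as a minimal runnable sketch: index 4 is the last valid position and index 5 raises IndexError. Checking len() first is one simple way to avoid the exception.

```python
data = "one%two%three%four%five"
numbers = data.split("%")

print(len(numbers))   # 5
print(numbers[4])     # five (last valid index)

try:
    numbers[5]        # one past the end
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range

# Guarded access: only subscript if the position exists
index = 5
value = numbers[index] if index < len(numbers) else None
print(value)          # None
```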

4.3 years ago
zorbax

Check the headers of your sequences (subject or query); they should contain the default separator, |. I tried a BLAST output in format 6 with the following line, and it works:

c0_seq1  sp|O14830|PPE2_HUMAN 68.421 76 24 0 229 2 437 512 1.86e-34 126
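For reference, the twelve default outfmt 6 columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The sketch below maps the working line above onto them, assuming the columns are tab-separated in the real file. It shows that the |-separated header sits in sseqid, the field the earlier replies point to.

```python
# Default column order of BLAST+ -outfmt 6
OUTFMT6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                   "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

values = ["c0_seq1", "sp|O14830|PPE2_HUMAN", "68.421", "76", "24", "0",
          "229", "2", "437", "512", "1.86e-34", "126"]
line = "\t".join(values)

record = dict(zip(OUTFMT6_COLUMNS, line.split("\t")))
print(record["sseqid"])             # sp|O14830|PPE2_HUMAN
print(record["sseqid"].split("|"))  # ['sp', 'O14830', 'PPE2_HUMAN']
```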