Hi everybody,
I have to convert the GenBank file of my virus genome to a GTF file for the alignment of my reads with STAR. I tried different tools that I found through a Google search, but unfortunately none of them worked.
I tried a Perl script called genbank2gtf_mRNA.pl from genebank2gtf and a Python script called gb2gtf.py from lpryszcz. With the Perl script and a custom chromosome file I get an empty GTF file, and when I try to run the Python script I get the error: "command didn't found".
Can someone maybe recommend a converter tool or platform which I could use?
Also, the GenBank file contains splice site annotations from Geneious. Does the conversion carry these coordinates over?
Any help is greatly appreciated :)
Edit: added a data example and the tools I struggled with.
My GenBank data looks something like this:
> FEATURES Location/Qualifiers
> repeat_region 552..634
> /vntifkey="34"
> /label=U5
> CDS 5830..8414
> /vntifkey="4"
> /label=TAT
> /note="HIV-1 tat protein"
> splicing_signal 5934..5935
> /vntifkey="38"
> /label=SA4c
> splicing_signal 5952..5953
> /vntifkey="38"
> /label=SA4a
> splicing_signal 5958..5959
> /vntifkey="38"
> /label=SA4b
> CDS 8787..9407
> /vntifkey="4"
> /label=NEF
> /note="HIV-1 nef protein"
> splicing_signal 5775..5776
> /vntifkey="38"
> /label=SA3
> splicing_signal 6045..6046
> /vntifkey="38"
> /label=SD4
> CDS 5969..8643
> /vntifkey="4"
> /label=REV
> /note="HIV-1 rev protein"
> splicing_signal 5974..5975
> /vntifkey="38"
> /label=SA5
> CDS 2085..5096
> /vntifkey="4"
> /label=POL
> /note="HIV-1 pol polyprotein; (NH2-terminus uncertain)"
> CDS 5041..5619
> /vntifkey="4"
> /label=VIF
> /note="HIV-1 vif protein"
> CDS 5559..5849
> /vntifkey="4"
> /label=VPR
> /note="HIV-1 vpr protein"
> repeat_region 9529..9626
> /vntifkey="34"
> /label=R
> /note="HIV-1 R repeat 3' copy"
> CDS 6061..6306
> /vntifkey="4"
> /label=VPU
> /note="HIV-1 vpu protein"
> CDS 6221..8785
> /vntifkey="4"
> /label=ENV
> /note="HIV-1 envelope polyprotein"
> splicing_signal 6602..6603
> /vntifkey="38"
> /label=(SA6)
> splicing_signal 6720..6721
> /vntifkey="38"
> /label=(SD5)
> /note="Mutation von GT in anderen Isolaten zu AT"
> splicing_signal 8367..8368
> /vntifkey="38"
> /label=SA7
> LTR 1..634
> /vntifkey="19"
> /label=5'_LTR
> /note="HIV-1 5' LTR"
> repeat_region 454..551
> /vntifkey="34"
> /label=R
> /note="HIV-1 R repeat 5' copy"
> intron 744..5776
> /vntifkey="15"
> /label=TAT/REV/NEF_I
> /note="HIV-1 tat, rev, nef mRNA intron 1"
> misc_feature 5743..5744
> /vntifkey="21"
> /label=JNCTN_NY5/LAV
> /note="HIV-1 isolate NY5 DNA end/HIV-1 isolate LAV DNA start"
> intron 6045..8368
> /vntifkey="15"
> /label=TAT_II
> /note="HIV-1 tat cds intron 2"
> CDS 790..2292
> /vntifkey="4"
> /label=GAG
> /note="HIV-1 gag polyprotein"
> insertion_seq 1186..1186
> /vntifkey="14"
> /label=p17/p24
> splicing_signal 4963..4964
> /vntifkey="38"
> /label=SD2
> splicing_signal 744..745
> /vntifkey="38"
> /label=SD1
> splicing_signal 4911..4912
> /vntifkey="38"
> /label=SA1
> splicing_signal 5464..5465
> /vntifkey="38"
> /label=SD3
> polyA_signal 9602..9607
> /vntifkey="25"
> /label=POLY_A
> /note="HIV-1 mRNA polyadenlyation signal"
> splicing_signal 5388..5389
> /vntifkey="38"
> /label=SA2
> intron 6045..8368
> /vntifkey="15"
> /label=REV_II
> /note="HIV-1 rev cds intron 2"
> intron 6045..8368
> /vntifkey="15"
> /label=TAT/REV/NEF_II
> /note="HIV-1 tat, rev, nef mRNA intron 2"
> LTR 9076..9709
> /vntifkey="19"
> /label=3'_LTR
> /note="HIV-1 3' LTR"
> misc_feature 5059..5060
> /vntifkey="21"
> /label=SD2b
> splicing_signal 8335..8336
> /vntifkey="38"
> /label=SA7a
> splicing_signal 8440..8441
> /vntifkey="38"
> /label=SA7b
> splicing_signal 4541..4542
> /vntifkey="38"
> /label=SA1a
> splicing_signal 4722..4723
> /vntifkey="38"
> /label=SD1a
> BASE COUNT 3423 a 1756 c 2364 g 2166 t
> ORIGIN
> 1 tggaa....
Hello,
it would be useful if you told us which tools you've already tested and what the problem with them was. Otherwise you may get the same recommendations here.
fin swimmer
Oh yes, thank you, that makes sense. I will add it to the question.
I did a Google search and found genbank2gtf. I have never tried it myself, but you could explore it.
Thank you. I forgot to mention in the question which tools I had already failed to use. I tried that tool, but somehow I cannot execute it.
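If none of the ready-made scripts run, a feature table as simple as the one in the question can be converted by hand. Below is a minimal sketch in plain Python, not a tested converter. The function names `parse_features` and `to_gtf` and the sequence name `HIV1` are made up for illustration, and it assumes every location is a plain `start..end` or `complement(start..end)` (no `join(...)`). It writes each CDS as a single `exon` line because STAR, by default, reads the lines whose third column is `exon` and groups them by `transcript_id`. GenBank and GTF coordinates are both 1-based and inclusive, so the numbers carry over unchanged. A CDS that spans an annotated intron (TAT and REV in the example) would still have to be split into separate exon lines by hand.

```python
import re

# Feature line, e.g. "     CDS             5830..8414" or "... complement(10..20)".
FEATURE_RE = re.compile(r"^\s*([\w']+)\s+(complement\()?(\d+)\.\.(\d+)\)?\s*$")
# Qualifier line, e.g. '                     /label=TAT'.
QUALIFIER_RE = re.compile(r"^\s*/(\w+)=(.*)$")


def parse_features(lines):
    """Collect simple features and their qualifiers from GenBank feature-table lines."""
    features = []
    for line in lines:
        feature = FEATURE_RE.match(line)
        if feature:
            features.append({
                "key": feature.group(1),
                "start": int(feature.group(3)),
                "end": int(feature.group(4)),
                "strand": "-" if feature.group(2) else "+",
                "qualifiers": {},
            })
            continue
        qualifier = QUALIFIER_RE.match(line)
        if qualifier and features:
            value = qualifier.group(2).strip().strip('"')
            # Keep the first value if a qualifier is repeated.
            features[-1]["qualifiers"].setdefault(qualifier.group(1), value)
    return features


def to_gtf(features, seqname, keep=("CDS",), source="genbank"):
    """Turn the selected features into GTF 'exon' lines (one exon per feature)."""
    rows = []
    for f in features:
        if f["key"] not in keep:
            continue
        # Use /label as the identifier; fall back to key and start position.
        name = f["qualifiers"].get("label", f"{f['key']}_{f['start']}")
        attrs = f'gene_id "{name}"; transcript_id "{name}";'
        rows.append("\t".join([
            seqname, source, "exon",
            str(f["start"]), str(f["end"]),
            ".", f["strand"], ".", attrs,
        ]))
    return rows


if __name__ == "__main__":
    sample = [
        "     CDS             5830..8414",
        '                     /vntifkey="4"',
        "                     /label=TAT",
        '                     /note="HIV-1 tat protein"',
        "     splicing_signal 5934..5935",
        '                     /vntifkey="38"',
        "                     /label=SA4c",
    ]
    for row in to_gtf(parse_features(sample), seqname="HIV1"):
        print(row)
```

The `seqname` has to match the sequence name in the FASTA file used to build the STAR index, otherwise the annotation will not be matched to the genome.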
It is possible to download viral genomes in GFF format from GenBank. Have you tried that?
PS: an example of the genome of interest would be useful.
Thank you! I know that, but I need the file to be in GTF format for the mapping. I could download it as GFF3, but I also struggled with the available GFF3-to-GTF converters.
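For the GFF3 route, the main difference from GTF is the ninth column: GFF3 uses `ID=...;Parent=...`, while GTF expects `gene_id "..."; transcript_id "...";`. The sketch below shows one way to rewrite that column in plain Python. It is an illustration under assumptions, not a replacement for a dedicated converter: the function names are hypothetical, only `exon` and `CDS` lines with a `Parent` attribute are kept, and the parent of each transcript is taken as its gene. As far as I know, gffread can also do this conversion with its `-T` option, and the STAR manual describes using GFF3 directly by setting `--sjdbGTFtagExonParentTranscript Parent`, so please check those before relying on a hand-written script.

```python
def parse_gff3_attributes(field):
    """Split a GFF3 attribute column ('ID=x;Parent=y') into a dict."""
    attrs = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff3_to_gtf(lines, keep=("exon", "CDS")):
    """Rewrite selected GFF3 lines as GTF lines with gene_id and transcript_id."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            records.append((cols, parse_gff3_attributes(cols[8])))

    # First pass: remember the parent of every feature that has one,
    # so a transcript ID can be traced back to its gene ID.
    parent_of = {}
    for cols, attrs in records:
        if "ID" in attrs and "Parent" in attrs:
            parent_of[attrs["ID"]] = attrs["Parent"].split(",")[0]

    # Second pass: emit one GTF line per (feature, parent transcript) pair.
    out = []
    for cols, attrs in records:
        if cols[2] not in keep or "Parent" not in attrs:
            continue
        for transcript_id in attrs["Parent"].split(","):
            # If the parent has no parent of its own, reuse it as the gene ID.
            gene_id = parent_of.get(transcript_id, transcript_id)
            gtf_attrs = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
            out.append("\t".join(cols[:8] + [gtf_attrs]))
    return out


if __name__ == "__main__":
    example = [
        "##gff-version 3",
        "chr\tsrc\tgene\t1\t100\t.\t+\t.\tID=gene1",
        "chr\tsrc\tmRNA\t1\t100\t.\t+\t.\tID=rna1;Parent=gene1",
        "chr\tsrc\texon\t1\t50\t.\t+\t.\tID=exon1;Parent=rna1",
    ]
    print("\n".join(gff3_to_gtf(example)))
```

Viral GFF3 files often attach a CDS directly to a gene without an mRNA level; in that case this sketch uses the gene ID for both `gene_id` and `transcript_id`.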