Hi! I downloaded a viral genome (Hazara virus) from NCBI Virus. When I opened the folder, there was no GTF file, only a GBFF file. So I converted the GBFF file to GFF, and then the GFF file to GTF. It seems that the 3rd column of the GTF file has no exon lines, only CDS and transcript lines.
Therefore, when I ran STAR indexing and mapping, I used --sjdbGTFfeatureExon CDS in my scripts. Indexing finished without errors, and the output files have non-zero sizes except for the .tab files (in particular sjdbList.out.tab and sjdbList.fromGTF.out.tab), which are 0 bytes. So when I ran mapping after indexing, it seemed that the genome index was incompatible, since those .tab files are empty.
Could you help me run STAR indexing and mapping successfully? Does the viral genome annotation simply lack exons?
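A possible explanation for the empty files, offered as an assumption rather than a confirmed diagnosis: STAR's sjdbList files list splice junctions (introns), which it derives from the gaps between consecutive exons of the same transcript. If every Hazara virus transcript is a single unspliced block, as I would expect for this virus, there are simply no junctions to write, and 0-byte files would be the expected result rather than a sign of failure on their own. The sketch below imitates that logic in plain Python so you can check your own GTF; it is not STAR's code, and the names in the demo (seg_S, N, X) are hypothetical.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_junctions(gtf_lines, feature="exon"):
    """Return the set of introns implied by a GTF: gaps between consecutive
    `feature` records (e.g. "exon", or "CDS" as with --sjdbGTFfeatureExon CDS)
    that share a transcript_id. Rough imitation, not STAR's implementation."""
    blocks = defaultdict(list)  # (seqname, strand, transcript_id) -> [(start, end)]
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue
        match = TRANSCRIPT_ID.search(cols[8])
        if match is None:
            continue  # records without transcript_id cannot be grouped into transcripts
        blocks[(cols[0], cols[6], match.group(1))].append((int(cols[3]), int(cols[4])))

    introns = set()
    for (seqname, strand, _tid), coords in blocks.items():
        coords.sort()
        for (_, prev_end), (next_start, _) in zip(coords, coords[1:]):
            if next_start > prev_end + 1:  # a real gap between two blocks
                introns.add((seqname, prev_end + 1, next_start - 1, strand))
    return introns


# Hypothetical single-block (unspliced) transcripts, as in a CDS-only viral GTF.
example = [
    'seg_S\tGenBank\tCDS\t50\t1500\t.\t+\t0\tgene_id "N"; transcript_id "N.1";',
    'seg_S\tGenBank\tCDS\t1600\t1800\t.\t+\t0\tgene_id "X"; transcript_id "X.1";',
]
print(len(gtf_junctions(example, feature="CDS")))  # 0
```

If this reports 0 junctions for your GTF, the empty sjdbList files are consistent with an unspliced genome, and an "incompatible" genome message at mapping time may have another cause; one thing worth checking is that the same STAR version was used for genomeGenerate and for alignment.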
If no proper annotation is available as a GTF, is there a transcriptome FASTA available? Then you could quantify with something like salmon, possibly even using the genome as a decoy; see the salmon docs for details. At minimum, salmon needs a transcriptome FASTA. It can use a genome as a decoy, meaning that for any potential transcript alignment it checks whether there is a better match in the genome, so that reads from DNA contamination are not incorrectly quantified.
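To make the decoy idea concrete, here is a minimal sketch of preparing salmon's decoy-aware inputs, assuming you had a transcript FASTA: the combined "gentrome" FASTA lists the transcripts first and the genome after them, and a separate decoys file names the genome sequences, one per line. The function names, file names (gentrome.fa, decoys.txt, salmon_index) and toy sequences are my own placeholders; check the salmon documentation for the current indexing options.

```python
def build_gentrome(transcripts_fasta: str, genome_fasta: str):
    """Return (gentrome_text, decoy_names) for a decoy-aware salmon index.

    Transcripts come first and the genome is appended after them, because the
    decoy sequences are expected at the end of the combined FASTA.
    """
    decoy_names = [
        line[1:].split()[0]  # sequence name = header text up to the first whitespace
        for line in genome_fasta.splitlines()
        if line.startswith(">") and line[1:].strip()
    ]
    gentrome = transcripts_fasta.rstrip("\n") + "\n" + genome_fasta.rstrip("\n") + "\n"
    return gentrome, decoy_names


def write_salmon_inputs(gentrome, decoy_names,
                        fasta_path="gentrome.fa", decoys_path="decoys.txt"):
    """Write the combined FASTA and the one-name-per-line decoys file."""
    with open(fasta_path, "w") as fh:
        fh.write(gentrome)
    with open(decoys_path, "w") as fh:
        fh.write("\n".join(decoy_names) + "\n")
    # Afterwards, e.g.: salmon index -t gentrome.fa -d decoys.txt -i salmon_index


# Hypothetical toy sequences, only to show the shape of the output.
tx = ">tx1 hypothetical transcript\nACGTACGT\n"
genome = ">seg_S hypothetical segment\nACGTACGTACGT\n>seg_M\nGGCCGGCC\n"
gentrome, decoys = build_gentrome(tx, genome)
print(decoys)  # ['seg_S', 'seg_M']
```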
No transcriptome is available; I'm working with a virus genome and need to perform indexing and mapping with STAR. Is there a way to obtain a GTF file with exon information, so that STAR can read it during mapping?
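If the annotation cannot be replaced, one workaround (my assumption, not something suggested in this thread) is to copy each CDS record as an exon record, so that STAR can be run with its default --sjdbGTFfeatureExon exon. Note that this gives STAR the same exon model as --sjdbGTFfeatureExon CDS, so it will not create splice junctions, and the resulting "exons" cover only the coding regions. A minimal sketch; the demo record is hypothetical:

```python
def add_exon_lines(gtf_lines, source_feature="CDS"):
    """Yield every GTF line unchanged, plus an 'exon' copy after each source_feature record.

    The copies keep seqname, coordinates, strand and attributes (gene_id /
    transcript_id), which STAR uses to group exons into transcripts and genes.
    """
    for line in gtf_lines:
        line = line.rstrip("\n")
        yield line
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) == 9 and cols[2] == source_feature:
            exon = list(cols)
            exon[2] = "exon"
            exon[7] = "."  # the frame column only applies to CDS/codon records
            yield "\t".join(exon)


demo = ['seg_S\tGenBank\tCDS\t50\t1500\t.\t+\t0\tgene_id "N"; transcript_id "N.1";']
converted = list(add_exon_lines(demo))
for out_line in converted:
    print(out_line)
```

Write the output lines to a new file and pass that file to --sjdbGTFfile when running --runMode genomeGenerate.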
You can find the GTF file here: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/831/085/GCF_002831085.1_ASM283108v1/
Be sure to get the corresponding genome sequence file as well (so everything matches).
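A quick way to confirm that the GTF and the genome FASTA match is to check that every name in GTF column 1 appears as a sequence name in the FASTA. STAR takes each sequence name from the first word of the FASTA header line, so the names must agree exactly (a version suffix such as ".1" counts). A sketch in plain Python; the names in the demo are hypothetical:

```python
def unmatched_seqnames(gtf_lines, fasta_lines):
    """Return GTF seqnames (column 1) that have no FASTA sequence of the same name.

    Sequence names are taken from the FASTA header up to the first whitespace.
    """
    fasta_names = {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].strip()
    }
    gtf_names = {
        line.split("\t", 1)[0]
        for line in gtf_lines
        if line.strip() and not line.startswith("#")
    }
    return sorted(gtf_names - fasta_names)


# Hypothetical names: the GTF says "seqA" but the FASTA header says "seqA.1".
gtf = ['seqA\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g"; transcript_id "t";']
fasta = [">seqA.1 hypothetical segment", "ACGT"]
print(unmatched_seqnames(gtf, fasta))  # ['seqA']
```

An empty list means every annotated sequence name has a matching sequence in the FASTA.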
Thank you. But I think the one you provided is from RefSeq (as the “GCF…” prefix in the file name shows), whereas I used the genome from GenBank, whose files are named “GCA…”.
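The simplest fix is probably to download the genome FASTA from the same GCF folder so that annotation and sequence match. If you want to keep the GenBank (GCA) genome instead, one option, sketched below under assumptions, is to rename the sequence names in the RefSeq GTF to their GenBank accessions using the *_assembly_report.txt that NCBI assembly folders normally contain. I'm assuming its usual tab-separated layout, with a commented header line naming the GenBank-Accn, Relationship and RefSeq-Accn columns, and that Relationship "=" means the two copies are identical; please verify this against the actual file. The accessions in the demo (GB_seg1, RS_seg1) are hypothetical placeholders.

```python
def refseq_to_genbank(report_lines):
    """Map RefSeq-Accn -> GenBank-Accn from an NCBI *_assembly_report.txt.

    Assumes the usual layout: '#' comment lines, one of which is the
    tab-separated column header, followed by tab-separated data rows.
    """
    header, mapping = None, {}
    for raw in report_lines:
        line = raw.rstrip("\r\n")
        if line.startswith("#"):
            if "RefSeq-Accn" in line and "GenBank-Accn" in line:
                header = line.lstrip("# ").split("\t")
            continue
        if not line or header is None:
            continue
        row = dict(zip(header, line.split("\t")))
        refseq, genbank = row.get("RefSeq-Accn"), row.get("GenBank-Accn")
        # Assumption: Relationship "=" marks the two copies as identical sequences,
        # so the GTF coordinates stay valid after renaming.
        if row.get("Relationship") == "=" and refseq and genbank:
            mapping[refseq] = genbank
    return mapping


def rename_seqnames(gtf_lines, mapping):
    """Yield GTF lines with column 1 renamed via `mapping` (unknown names kept)."""
    for line in gtf_lines:
        if line.startswith("#") or "\t" not in line:
            yield line
            continue
        seqname, rest = line.split("\t", 1)
        yield mapping.get(seqname, seqname) + "\t" + rest


# Hypothetical report and GTF lines, only to show the shape of the data.
report = [
    "# Assembly name:  hypothetical",
    "# Sequence-Name\tSequence-Role\tAssigned-Molecule\tAssigned-Molecule-Location/Type"
    "\tGenBank-Accn\tRelationship\tRefSeq-Accn\tAssembly-Unit\tSequence-Length\tUCSC-style-name",
    "S\tassembled-molecule\tS\tSegment\tGB_seg1\t=\tRS_seg1\tPrimary Assembly\t1000\tna",
]
mapping = refseq_to_genbank(report)
gtf = ['RS_seg1\tRefSeq\tCDS\t50\t900\t.\t+\t0\tgene_id "N"; transcript_id "N.1";']
renamed = list(rename_seqnames(gtf, mapping))
print(mapping)  # {'RS_seg1': 'GB_seg1'}
```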