I have an incomplete GTF file with lines such as:
chr1 hg38_ct_UserTrack_3545 exon 94353 94355 2109 + . gene_id "R2_66"; transcript_id "R2_66_1";
This describes an exon. All the lines in my incomplete GTF file describe an exon or CDS.
Question:
I want to fix my GTF file. For instance, I need to do something like:
chrI hg38_ct_UserTrack_3545 gene 6790136 6808198 . + . gene_id "R1_102";
and
chrI hg38_ct_UserTrack_3545 transcript 6790136 6808198 . + . transcript_id "R1_102";
I would like to add annotation at the transcript and gene level. What's the best way?
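I think you can work out each gene's region on the genome from your GTF file itself. Ignore the UTR regions. For every gene, collect the exons that share the same gene_id: the gene start is the start of the first exon and the gene end is the end of the last exon, i.e. the smallest start and the largest end among those exons. The same approach, grouped by transcript_id, should give you the transcript lines.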
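The approach above can be sketched in Python. This is only a minimal illustration, not a tested tool: the function name `add_gene_and_transcript_lines` is made up for this example, and it assumes the file is tab-separated with nine columns, that every line carries both `gene_id` and `transcript_id`, and that a given `gene_id` occurs on a single chromosome and strand.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def add_gene_and_transcript_lines(lines):
    """Return the input lines plus inferred gene and transcript lines.

    Assumes tab-separated GTF with gene_id and transcript_id on every line.
    """
    genes = {}        # gene_id -> [chrom, source, start, end, strand]
    transcripts = {}  # (gene_id, transcript_id) -> [chrom, source, start, end, strand]
    children = {}     # (gene_id, transcript_id) -> original exon/CDS lines

    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom, source, _feature, start, end, _score, strand, _frame, attrs = \
            line.split("\t")[:9]
        start, end = int(start), int(end)
        attr = dict(ATTR_RE.findall(attrs))
        gid, tid = attr["gene_id"], attr["transcript_id"]

        # Widen the span of the gene and of the transcript to cover this feature.
        for table, key in ((genes, gid), (transcripts, (gid, tid))):
            if key not in table:
                table[key] = [chrom, source, start, end, strand]
            else:
                table[key][2] = min(table[key][2], start)
                table[key][3] = max(table[key][3], end)
        children.setdefault((gid, tid), []).append(line)

    out = []
    for gid, (chrom, source, start, end, strand) in genes.items():
        out.append("\t".join([chrom, source, "gene", str(start), str(end),
                              ".", strand, ".", f'gene_id "{gid}";']))
        for (g, tid), (c, s, t_start, t_end, t_strand) in transcripts.items():
            if g != gid:
                continue
            out.append("\t".join([c, s, "transcript", str(t_start), str(t_end),
                                  ".", t_strand, ".",
                                  f'gene_id "{gid}"; transcript_id "{tid}";']))
            out.extend(children[(g, tid)])
    return out
```

To use it, read the incomplete file with `open(path)` and write the returned lines, joined with newlines, to a new file. The output keeps each gene line followed by its transcript lines and their original exon/CDS lines; note that the spans only cover what the exon and CDS lines cover, so anything missing from the input stays missing.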