Hi Guys,
I am trying to find more information about the features in the Ensembl GTF file, but I don't know where they are documented. I am using the GRCh38 (hg38) GTF file from Ensembl, and I am interested in column 3 (feature). More specifically, I would like to know the exact definitions of transcript and gene. Does the transcript include the UTRs? Does the gene include the UTRs? What about introns?
It seems that there are 9 possible features:
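Since the documentation is hard to find, one option is to check the structure directly in the file. Below is a minimal Python sketch (the names `parse_gtf` and `check_transcripts` are illustrative, not part of any library). It tests two assumptions: that each `transcript` row runs exactly from the start of its first exon to the end of its last exon, which would mean it spans the introns and UTRs, and that every `five_prime_utr`/`three_prime_utr` row lies inside one of that transcript's exons. It returns the IDs of transcripts that break either assumption, so an empty list suggests both hold for your file.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Matches one `key "value"` pair in GTF column 9 (the attributes column).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf(lines: Iterable[str]) -> Iterator[Tuple[str, int, int, Dict[str, str]]]:
    """Yield (feature, start, end, attributes) for each tab-separated GTF record."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines such as "#!genome-build ..."
        f = line.rstrip("\n").split("\t")
        if len(f) < 9:
            continue  # not a complete GTF record
        # Repeated keys (e.g. several "tag" entries) keep only the last value.
        attrs = dict(ATTR_RE.findall(f[8]))
        yield f[2], int(f[3]), int(f[4]), attrs


def check_transcripts(lines: Iterable[str]) -> List[str]:
    """Return sorted IDs of transcripts whose row does not span first..last exon,
    or that have a UTR row not contained in any of their own exons."""
    tx_span: Dict[str, Tuple[int, int]] = {}
    exons: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    utrs: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for feature, start, end, attrs in parse_gtf(lines):
        tid = attrs.get("transcript_id")
        if tid is None:
            continue  # gene rows carry no transcript_id
        if feature == "transcript":
            tx_span[tid] = (start, end)
        elif feature == "exon":
            exons[tid].append((start, end))
        elif feature in ("five_prime_utr", "three_prime_utr"):
            utrs[tid].append((start, end))

    bad = []
    for tid, (t_start, t_end) in tx_span.items():
        ex = exons.get(tid, [])
        # Does the transcript row run from the first exon start to the last exon end?
        spans_exons = bool(ex) and (t_start, t_end) == (
            min(s for s, _ in ex),
            max(e for _, e in ex),
        )
        # Is every UTR interval fully inside at least one exon of this transcript?
        utr_in_exon = all(
            any(s >= es and e <= ee for es, ee in ex) for s, e in utrs.get(tid, [])
        )
        if not (spans_exons and utr_in_exon):
            bad.append(tid)
    return sorted(bad)
```

For example, `check_transcripts(open("Homo_sapiens.GRCh38.87.gtf"))` runs the check over the whole file.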
awk -F'\t' '!/^#/ {print $3}' Homo_sapiens.GRCh38.87.gtf | sort | uniq
CDS
exon
five_prime_utr
gene
Selenocysteine
start_codon
stop_codon
three_prime_utr
transcript
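For comparison, here is a rough Python equivalent of the awk pipeline above that also counts the rows per feature type (similar to `sort | uniq -c`) and can read a gzip-compressed GTF directly. It assumes a tab-separated file and skips comment/header lines starting with `#`; the function names are illustrative.

```python
import gzip
import sys
from collections import Counter
from typing import Iterable


def count_features(lines: Iterable[str]) -> Counter:
    """Count column-3 (feature) values in GTF lines, skipping header/comment lines."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # header lines would otherwise add a bogus/empty feature
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        counts[fields[2]] += 1
    return counts


def open_gtf(path: str):
    """Open a plain or gzip-compressed GTF file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_features.py Homo_sapiens.GRCh38.87.gtf
    with open_gtf(sys.argv[1]) as handle:
        for feature, n in sorted(count_features(handle).items()):
            print(f"{feature}\t{n}")
```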
As I said, I am especially interested in the difference between gene and transcript. If someone could give me the definitions or point me to where they are documented, I would really appreciate it. Thanks.
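To test the gene side of the question empirically, this minimal sketch (the name `check_genes` is hypothetical) assumes that a `gene` row should span from the earliest start to the latest end of that gene's `transcript` rows, and reports `True` or `False` per `gene_id`. It only extracts `gene_id` from column 9, so it is not a general GTF attribute parser.

```python
import re
from typing import Dict, Iterable, List, Tuple

# \b prevents matching inside a longer key that merely ends in "gene_id".
GENE_ID_RE = re.compile(r'\bgene_id "([^"]+)"')


def check_genes(lines: Iterable[str]) -> Dict[str, bool]:
    """For each gene row, report whether it spans exactly from the earliest
    transcript start to the latest transcript end of the same gene_id."""
    gene_span: Dict[str, Tuple[int, int]] = {}
    tx_spans: Dict[str, List[Tuple[int, int]]] = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] not in ("gene", "transcript"):
            continue  # only gene and transcript rows matter here
        m = GENE_ID_RE.search(f[8])
        if m is None:
            continue
        gid, start, end = m.group(1), int(f[3]), int(f[4])
        if f[2] == "gene":
            gene_span[gid] = (start, end)
        else:
            tx_spans.setdefault(gid, []).append((start, end))

    result: Dict[str, bool] = {}
    for gid, span in gene_span.items():
        tx = tx_spans.get(gid, [])
        # A gene with no transcript rows is reported as False.
        result[gid] = bool(tx) and span == (min(s for s, _ in tx), max(e for _, e in tx))
    return result
```

A result where every value is `True` would indicate that, in your file, the gene row is simply the union of its transcripts' extents.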
Thanks, guys, for your input. Maybe I should have mentioned that I have a PhD in genetics/genomics and have worked as a bioinformatician for years. Judging by the answers, that probably wasn't clear.
This wasn't a newbie question about how basic genetics works; I just couldn't find the definitions and criteria used to build the GTF annotation file. Maybe that is because some of the genes are annotated manually; I don't know.
Thanks anyway!
Hi b.nota,
This is an old thread, but I'm wondering whether you ever found a good reference that answers your question. I'm not a total newbie either, but I could use some help with the annotation.
Thank you.
Nothing more than the answers herein, I'm afraid.