How to separate protein-coding and non-coding genes in a GTF file
6.5 years ago
Vasu ▴ 790

Hi,

In a GTF file I see a "gene_type" attribute with many different values, listed below. Which of these values count as non-coding, which as protein-coding, and which as lncRNA?

3prime_overlapping_ncRNA
IG_C_gene
IG_C_pseudogene
IG_D_gene
IG_J_gene
IG_J_pseudogene
IG_V_gene
IG_V_pseudogene
IG_pseudogene
Mt_rRNA
Mt_tRNA
TEC
TR_C_gene
TR_D_gene
TR_J_gene
TR_J_pseudogene
TR_V_gene
TR_V_pseudogene
antisense_RNA
bidirectional_promoter_lncRNA
lincRNA
macro_lncRNA
miRNA
misc_RNA
non_coding
polymorphic_pseudogene
processed_pseudogene
processed_transcript
protein_coding
pseudogene
rRNA
ribozyme
sRNA
scRNA
scaRNA
sense_intronic
sense_overlapping
snRNA
snoRNA
transcribed_processed_pseudogene
transcribed_unitary_pseudogene
transcribed_unprocessed_pseudogene
translated_processed_pseudogene
unitary_pseudogene
unprocessed_pseudogene
vaultRNA

I see the gene_type protein_coding. Is that the only type that covers protein-coding genes, or should I also consider other gene_type values? Which values count as non-coding? And which as lncRNA?

RNA-Seq gtf proteincoding noncoding lncrna • 3.1k views
6.5 years ago

If you are only interested in protein-coding genes, then yes, you can filter on the protein_coding type alone.

Here is an explanation of the different biotypes found in Ensembl, as suggested by i.sudbery: https://www.gencodegenes.org/gencode_biotypes.html
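
As a rough illustration of that filtering step, here is a minimal Python sketch. It assumes a GENCODE-style GTF where the ninth column carries an attribute written as gene_type "..."; Ensembl's own GTF files name this attribute gene_biotype, so the `attribute` parameter would need changing there. The function names and the toy records are made up for the example and do not come from any library or real annotation.

```python
import re

def gene_type_of(gtf_line, attribute="gene_type"):
    """Return the value of `attribute` on one GTF line, or None if absent."""
    if gtf_line.startswith("#"):
        return None  # header / comment line
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return None  # not a complete GTF record
    # Attributes look like: gene_id "G1"; gene_type "protein_coding"; ...
    match = re.search(r'\b' + re.escape(attribute) + r' "([^"]+)"', fields[8])
    return match.group(1) if match else None

def filter_gtf_lines(lines, wanted_types, attribute="gene_type"):
    """Yield only the records whose attribute value is in wanted_types."""
    for line in lines:
        if gene_type_of(line, attribute) in wanted_types:
            yield line

# Toy records (hypothetical IDs and coordinates), just to show the call.
example = [
    '##description: toy example\n',
    'chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\tgene_id "G1"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "A";\n',
    'chr1\tHAVANA\tgene\t300\t400\t.\t+\t.\tgene_id "G2"; gene_type "protein_coding"; gene_name "B";\n',
]
coding = list(filter_gtf_lines(example, {"protein_coding"}))
```

For a real file, passing an open file handle as `lines` and writing the yielded lines to a new file should work the same way. Which other values to put in `wanted_types` for non-coding or lncRNA genes is best taken from the biotype page linked above rather than guessed.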


I might be wrong, but I think VEGA is now retired, and the up-to-date reference for the biotypes found in recent (GENCODE-based) Ensembl builds is https://www.gencodegenes.org/gencode_biotypes.html


After looking at the Ensembl website, I see you are right. I have edited my answer.
