Question

GTF/GFF for non-coding RNA

2

8.5 years ago

pm2012 ▴ 140

Hello

Can anyone tell me where I can find a GTF/GFF file for all human long non-coding RNAs, preferably with Ensembl annotations?

Thanks

GFF GTF Ensembl Annotation long non-coding • 11k views

ADD COMMENT • link updated 8.0 years ago by lironyoffe • 0 • written 8.5 years ago by pm2012 ▴ 140

0

If you are looking for Ensembl annotations, why not get them from Ensembl?

ADD REPLY • link 8.5 years ago by igor 13k

0

I didn't find one there containing only the long non-coding RNA annotation.

ADD REPLY • link 8.5 years ago by pm2012 ▴ 140

3

Download the full GTF and then filter by "gene_biotype". Every feature line carries that attribute, so a simple grep command is enough.

Available gene biotypes: http://www.gencodegenes.org/gencode_biotypes.html

Non-coding info: http://useast.ensembl.org/info/genome/genebuild/ncrna.html
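
If you would rather not rely on grep matching anywhere in the line, here is a minimal Python sketch of the same filter. It assumes an Ensembl-style GTF in which the ninth column holds `key "value";` pairs; the function name `filter_gtf_by_biotype` and the example records are made up for illustration. GENCODE files name the attribute `gene_type` instead, hence the `key` parameter.

```python
import re

# Matches key "value" pairs in the 9th GTF column, e.g. gene_biotype "lincRNA";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def filter_gtf_by_biotype(lines, wanted, key="gene_biotype"):
    """Yield GTF lines whose `key` attribute is in `wanted`.

    Header lines (starting with '#') are passed through unchanged.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        if attrs.get(key) in wanted:
            yield line


# Made-up records, only to show the call; real input would be an open file.
example = [
    '#!genome-build GRCh38\n',
    '1\thavana\tgene\t100\t200\t.\t+\t.\tgene_id "G1"; gene_biotype "lincRNA";\n',
    '1\thavana\tgene\t300\t400\t.\t-\t.\tgene_id "G2"; gene_biotype "protein_coding";\n',
]
kept = list(filter_gtf_by_biotype(example, {"lincRNA"}))
```

With a real file you would pass the open file handle as `lines` and write the yielded lines to a new GTF. Which biotype names to put in `wanted` depends on the release you downloaded, so check the biotype list above against your file.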

ADD REPLY • link 8.5 years ago by igor 13k

0

Hi, Igor,

I followed your instructions and downloaded the hg19_gtf, but there is no "gene_biotype" column in the GTF (screen capture link: https://drive.google.com/file/d/1OkpcDF_u2-yzAKlg8s46vIVOi8pc4AZ1/view?usp=sharing)

ADD REPLY • link 4.6 years ago by xiaoleiusc ▴ 140

0

The GTF you have is not from GENCODE or Ensembl. GTF files from other sources may not have a gene_biotype field.
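
A quick way to see what a given GTF actually provides is to list the attribute keys in its ninth column. This is only a sketch: `attribute_key_counts` is a hypothetical helper, and the two records are invented examples of a file that carries nothing beyond `gene_id` and `transcript_id`.

```python
import re
from collections import Counter

# Captures the key of each key "value" pair in the attribute column.
ATTR_KEY_RE = re.compile(r'(\S+) "[^"]*"')


def attribute_key_counts(lines):
    """Count, for each attribute key, how many records contain it."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        # set() so a key repeated within one record is counted once
        counts.update(set(ATTR_KEY_RE.findall(fields[8])))
    return counts


# Invented records without any biotype attribute.
example = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "A"; transcript_id "A.1";\n',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "A"; transcript_id "A.1";\n',
]
counts = attribute_key_counts(example)
has_biotype = "gene_biotype" in counts or "gene_type" in counts
```

If neither `gene_biotype` nor `gene_type` shows up among the keys, filtering by biotype cannot work on that file.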

ADD REPLY • link 4.6 years ago by igor 13k

0

Would it be valid to use a gene_biotype-specific GTF file for quantification after the reads have been aligned to the reference genome?

ADD REPLY • link 4.4 years ago by Arindam Ghosh ▴ 530

0

You probably want to use the full GTF. You should probably be using the full transcriptome for normalization anyway. I wouldn't throw away useful information unless you are removing just some problematic biotypes.

ADD REPLY • link 4.4 years ago by igor 13k

score 1 · Answer 1 · 2016-06-06

1

8.5 years ago

WouterDeCoster 47k

You could try this one: http://www.lncipedia.org/download (I haven't used it myself, but will in a few weeks).

ADD COMMENT • link 8.5 years ago by WouterDeCoster 47k

score 1 · Answer 2 · 2016-06-06

1

8.5 years ago

igor 13k

Some options:

GENCODE - http://www.gencodegenes.org/
MiTranscriptome - http://mitranscriptome.org/

ADD COMMENT • link 8.5 years ago by igor 13k

score 1 · Answer 3 · 2016-06-07

1

8.5 years ago

Emily 24k

The GTF of all Ensembl genes, including all coding and non-coding biotypes, is on the FTP site.

ADD COMMENT • link 8.5 years ago by Emily 24k

score 1 · Answer 4 · 2016-06-07

The advantage of the GENCODE link provided by @igor is that you can download a separate GTF containing only long non-coding RNAs. That GTF is a subset of the file at the link provided by @Emily_Ensembl. If you go with the full Ensembl file, you need to separate the long non-coding RNAs from the rest (protein-coding or otherwise). In that case, focus on the biotype 'lincRNA'. See "Annotation of ncRNAs" for more details on these biotypes. You can also check the number of lincRNAs in the current human assembly and gene annotation release from Ensembl.
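
To check the counts in the file you actually downloaded, rather than on the website, you can tally the biotypes of the gene records. This is a rough sketch assuming Ensembl-style `gene_biotype "..."` attributes and `gene` in the third column; `count_gene_biotypes` and the sample records are hypothetical.

```python
import re
from collections import Counter

# Pulls the value out of gene_biotype "..." in the attribute column.
BIOTYPE_RE = re.compile(r'gene_biotype "([^"]*)"')


def count_gene_biotypes(lines):
    """Count gene records per gene_biotype; other feature types are ignored."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # count each gene once, not its transcripts or exons
        match = BIOTYPE_RE.search(fields[8])
        counts[match.group(1) if match else "(missing)"] += 1
    return counts


# Invented records, only to show the call.
example = [
    '1\thavana\tgene\t100\t200\t.\t+\t.\tgene_id "G1"; gene_biotype "lincRNA";\n',
    '1\thavana\ttranscript\t100\t200\t.\t+\t.\tgene_id "G1"; gene_biotype "lincRNA";\n',
    '1\thavana\tgene\t300\t400\t.\t-\t.\tgene_id "G2"; gene_biotype "protein_coding";\n',
    '2\thavana\tgene\t500\t600\t.\t+\t.\tgene_id "G3"; gene_biotype "lincRNA";\n',
]
biotype_counts = count_gene_biotypes(example)
```

The tally also shows which biotype names your release uses, which is worth confirming before filtering on 'lincRNA'.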