Hello
Can anyone tell me where I can find a GTF/GFF file for all human long non-coding RNAs, preferably with Ensembl annotations?
Thanks
You could try this one: http://www.lncipedia.org/download (I haven't used it myself, but will in a few weeks).
Some options:
GENCODE - http://www.gencodegenes.org/
MiTranscriptome - http://mitranscriptome.org/
The good thing about the GENCODE link provided by @igor is that you can get a separate GTF containing only long non-coding RNAs. This GTF is a subset of the file in the link provided by @Emily_Ensembl. If you go with the latter, you need to separate the long non-coding RNAs from the rest (protein-coding or otherwise). In that case, focus on the biotype 'lincRNA'. Check Annotation of ncRNAs for more details on those biotypes. You can also check the number of lincRNAs found in the current human assembly and gene annotation versions from Ensembl.
If you are looking for Ensembl annotations, why not get them from Ensembl?
I didn't find one just with long non-coding RNA annotation there.
Download the full GTF and then filter by "gene_biotype". Each line contains it, so you can do it with a simple grep command.
Available gene biotypes: http://www.gencodegenes.org/gencode_biotypes.html
Non-coding info: http://useast.ensembl.org/info/genome/genebuild/ncrna.html
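As a rough illustration of the filtering step described above, here is a minimal Python sketch (not the grep command itself). It assumes a standard 9-column, tab-separated GTF with quoted attribute values. The function names `parse_attributes` and `filter_gtf_lines` are made up for this example. Note that Ensembl GTFs call the attribute `gene_biotype`, while GENCODE GTFs call it `gene_type`, so the key is a parameter. Biotype names can also differ between releases (for example 'lincRNA' versus 'lncRNA'), so check the biotype list for the release you downloaded.

```python
import re

# Matches quoted GTF attribute pairs such as: gene_biotype "lincRNA";
_ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(attr_field):
    """Return the quoted key/value pairs of a GTF attribute column as a dict.

    If a key occurs more than once (e.g. repeated "tag" entries), the last
    value wins; that is fine for single-valued keys like gene_biotype.
    """
    return dict(_ATTR_RE.findall(attr_field))


def filter_gtf_lines(lines, biotypes, key="gene_biotype"):
    """Yield header lines plus records whose `key` attribute is in `biotypes`.

    `lines` is any iterable of GTF lines (an open file works).
    Use key="gene_type" for GENCODE files (assumption: 9-column GTF).
    """
    wanted = set(biotypes)
    for line in lines:
        if line.startswith("#"):
            yield line  # keep header/comment lines as-is
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed records
        if parse_attributes(fields[8]).get(key) in wanted:
            yield line
```

To use it on a real file, open the downloaded GTF, pass the file object as `lines`, and write the yielded lines to a new file. If nothing but header lines comes out, the file probably lacks the attribute (or uses a different name for it).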
Hi, Igor,
I followed your instructions and downloaded the hg19_gtf, but there is no "gene_biotype" column in the GTF (screen capture link: https://drive.google.com/file/d/1OkpcDF_u2-yzAKlg8s46vIVOi8pc4AZ1/view?usp=sharing)
The GTF you have is not from GENCODE or Ensembl. GTF files from other sources may not have a gene_biotype field.
Would it be valid to use the gene_biotype-specific GTF file for quantification after reads have been aligned to the reference genome?
You probably want to use the full GTF. You should probably be using the full transcriptome for normalization anyway. I wouldn't throw away useful information unless you are removing just some problematic biotypes.