Dear all!
I am desperately trying to find a way to get a list of all the lncRNAs annotated in UCSC (UCSC Genes).
Each lncRNA gene in UCSC databases is marked as a lncRNA, but to my knowledge there is no separate table/file available for download.
Surely plenty of people working in the lncRNA field have faced this and have a simple answer, or could even share the annotation file?
Thank you very very much!!
Alexandra
Why not just download the GTF file and filter it with awk for the entries annotated as lncRNAs? For example:
awk -F'\t' '$2 == "lincRNA"' original.gtf > filtered.gtf
This works for the Ensembl annotation; if UCSC doesn't use the same format, you can always use grep instead.
Dear Devon Ryan,
Thanks a lot for your reply!
The problem is that the GTF file I get from UCSC doesn't contain much information. (You are right that Ensembl provides a nice annotation; I used grep on the GENCODE annotation files to split them into snoRNAs, etc.)
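For reference, the grep/awk approach can be made a bit more robust by reading the biotype from the attribute column instead of matching raw text. Below is a minimal sketch, assuming GENCODE stores the biotype as gene_type, newer Ensembl GTFs as gene_biotype, and (as far as I know) older Ensembl GTFs in column 2, which is what the awk one-liner relies on. The set LNCRNA_TYPES is an assumption: the label names changed between releases, so check them against your own file.

```python
import re
from collections import defaultdict

# key "value" pairs in GTF column 9, e.g. gene_type "lincRNA";
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')

# Assumed lncRNA biotype labels; the exact names differ between
# GENCODE/Ensembl releases, so check them against your file.
LNCRNA_TYPES = {"lincRNA", "lncRNA", "antisense", "sense_intronic", "sense_overlapping"}


def parse_attributes(field):
    """Turn GTF column 9 into a dict; unquoted values (e.g. level 2) are ignored."""
    attrs = {}
    for key, value in ATTR_RE.findall(field):
        attrs.setdefault(key, value)  # keep the first value of repeated keys
    return attrs


def records(lines):
    """Yield (biotype, line) for every data line of a GTF file."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = parse_attributes(cols[8])
        # GENCODE: gene_type; newer Ensembl: gene_biotype;
        # older Ensembl GTFs: biotype in column 2 (what the awk one-liner uses).
        yield attrs.get("gene_type") or attrs.get("gene_biotype") or cols[1], line


def group_by_biotype(lines):
    """Split a GTF into per-biotype lists (lincRNA, snoRNA, ...)."""
    groups = defaultdict(list)
    for biotype, line in records(lines):
        groups[biotype].append(line)
    return groups


def filter_by_biotype(lines, types=LNCRNA_TYPES):
    """Keep only lines whose biotype is in `types`, in input order."""
    return [line for biotype, line in records(lines) if biotype in types]
```

Usage would be something like `with open("annotation.gtf") as fh: kept = filter_by_biotype(fh)`, where the file name is a placeholder. Filtering on the gene-level type keeps every transcript of a lncRNA gene; in GENCODE you could switch to transcript_type for a transcript-level filter.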
However, when I download from the UCSC Table Browser (group: Genes and Gene Prediction Tracks; track: UCSC Genes; table: knownGene), it looks like this:
Do you know whether I should download a different table from the UCSC Table Browser?
Thanks!
Alexandra
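If knownGene is all you have, one crude workaround is to take non-coding transcripts of at least 200 nt (the usual lncRNA length cutoff). This is a minimal sketch, not UCSC's own lncRNA labelling: it assumes the hg19 knownGene column order (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, proteinID, alignID) and relies on the genePred convention, as I understand it, that non-coding transcripts have cdsStart equal to cdsEnd. The result will still contain non-lncRNA transcripts, so treat it as a candidate list.

```python
from typing import List, NamedTuple, Tuple


class Transcript(NamedTuple):
    name: str
    chrom: str
    strand: str
    exons: List[Tuple[int, int]]  # 0-based, half-open, as stored by UCSC
    coding: bool


def parse_known_gene(line):
    """Parse one knownGene (genePred) row from a Table Browser 'all fields' export."""
    cols = line.rstrip("\n").split("\t")
    name, chrom, strand = cols[0], cols[1], cols[2]
    cds_start, cds_end = int(cols[5]), int(cols[6])
    # exonStarts/exonEnds are comma-separated lists with a trailing comma
    starts = [int(x) for x in cols[8].split(",") if x]
    ends = [int(x) for x in cols[9].split(",") if x]
    # Assumed convention: non-coding transcripts have cdsStart == cdsEnd
    return Transcript(name, chrom, strand, list(zip(starts, ends)), cds_start != cds_end)


def noncoding_candidates(lines, min_len=200):
    """Non-coding transcripts with spliced length >= min_len (a crude lncRNA proxy)."""
    hits = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the '#name chrom ...' header line
        tx = parse_known_gene(line)
        spliced_len = sum(end - start for start, end in tx.exons)
        if not tx.coding and spliced_len >= min_len:
            hits.append(tx)
    return hits
```

The length is computed over exons only (spliced length), since an intron-spanning genomic length would let short multi-exon transcripts through.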
I don't know whether UCSC has one available for the current human assembly (hg38). You can download the track for hg19, if that's what you're using (it's the "lincRNA Transcripts" track).
Yes, I use hg19, but "lincRNA Transcripts" is the track built from the Cabili et al. (2011) lncRNAs.
It is not "tidy", and it differs from the lncRNAs shown in the UCSC Genes track.
I actually thought that UCSC performs some validation before including transcripts in UCSC Genes.
Sorry for so many details. :)
Alexandra
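To see how far the "lincRNA Transcripts" set and the UCSC Genes lncRNAs actually diverge, a first pass is a strand-aware presence/absence check: which transcripts in one set share at least one overlapping exon with a transcript in the other. A minimal sketch, assuming both sets are already loaded as (name, chrom, strand, exons) tuples on the same assembly with 0-based half-open exon coordinates (GTF/GFF are 1-based closed, so subtract 1 from each start when reading those). It compares every pair per chromosome and strand, which is fine for tens of thousands of transcripts but not optimised.

```python
from collections import defaultdict


def exons_overlap(exons_a, exons_b):
    """True if any exon in exons_a overlaps any exon in exons_b (half-open intervals)."""
    return any(s1 < e2 and s2 < e1 for s1, e1 in exons_a for s2, e2 in exons_b)


def split_presence(set_a, set_b):
    """Split names in set_a into (shared with set_b, only in set_a).

    Each transcript is a (name, chrom, strand, exons) tuple, exons as (start, end).
    """
    index = defaultdict(list)
    for _name, chrom, strand, exons in set_b:
        index[(chrom, strand)].append(exons)
    shared, only_a = [], []
    for name, chrom, strand, exons in set_a:
        hit = any(exons_overlap(exons, other) for other in index[(chrom, strand)])
        (shared if hit else only_a).append(name)
    return shared, only_a
```

Running it in both directions (A against B, then B against A) gives the transcripts missing from each annotation.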
Yet another reason to use the Ensembl annotation :)
Are you specifically interested in UCSC annotation?
Yes, I would like to check my list of lncRNAs against all public annotations. I have seen examples where lncRNAs have different exon models in the RefSeq, UCSC and GENCODE annotations, or are present in one and missing from another.
That is why I would like to get the UCSC lncRNA annotation. But I have almost given up and am thinking about just using the Cabili list instead, although it has quite a few artifacts.
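For the "different exon models" part, a common way to compare two overlapping transcripts is by their intron chain: if the introns match exactly, the models agree apart from their start and end positions. A minimal sketch, assuming exons are given as (start, end) tuples in the same coordinate system for both annotations; the three category names are made up for illustration. Dedicated tools such as gffcompare do this comparison more thoroughly across whole annotation files.

```python
def intron_chain(exons):
    """Introns as (end of exon i, start of exon i+1), with exons sorted by start."""
    ordered = sorted(exons)
    return tuple((e1, s2) for (_, e1), (s2, _) in zip(ordered, ordered[1:]))


def compare_models(exons_a, exons_b):
    """Classify two overlapping transcript models (hypothetical category names).

    "identical"          every exon matches
    "same_intron_chain"  introns match, only the transcript ends differ
    "different"          anything else, including single-exon models with different ends
    """
    if sorted(exons_a) == sorted(exons_b):
        return "identical"
    chain_a, chain_b = intron_chain(exons_a), intron_chain(exons_b)
    if chain_a and chain_a == chain_b:
        return "same_intron_chain"
    return "different"
```

Single-exon transcripts have no introns to compare, so the sketch only calls them a match when their exons are identical.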