UCSC genes annotation of long non-coding RNAs in human
10.5 years ago
sasa_k ▴ 10

Dear everyone!

I am desperately trying to find a way to get a list of all the lncRNAs annotated in UCSC (UCSC Genes).

Each lncRNA gene in UCSC databases is marked as a lncRNA, but to my knowledge there is no separate table/file available for download.

Surely plenty of people working in the lncRNA field have faced this and have a simple answer, or could maybe even share the annotation file?

Thank you very very much!!

alexandra

UCSC lncRNAs annotation RNA-Seq • 8.4k views

Why not just download the GTF file and filter it with awk for the entries annotated as lncRNAs? Something like awk '{if($2=="lincRNA") print $0}' original.gtf > filtered.gtf works for the Ensembl annotation, and you can always use grep if UCSC doesn't use the same format.
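
The awk filter suggested above can also be sketched in Python, which is easier to extend. This is only a minimal sketch, assuming a tab-separated GTF whose second column holds the biotype (as the awk command assumes for Ensembl). The function name `filter_gtf_by_source`, the label set and the example records are hypothetical.

```python
# Assumed biotype labels; check which ones your GTF actually uses.
LNC_BIOTYPES = {"lincRNA", "lncRNA"}


def filter_gtf_by_source(lines, biotypes=LNC_BIOTYPES):
    """Yield GTF lines whose second column is one of `biotypes`."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and fields[1] in biotypes:
            yield line


# Made-up records, for illustration only.
example_gtf = [
    "#!hypothetical header\n",
    'chr1\tlincRNA\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\tprotein_coding\texon\t300\t400\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n',
]
kept = list(filter_gtf_by_source(example_gtf))
```

With a real file you would pass an open file handle instead of the list and write the yielded lines to a new GTF.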


Dear Devon Ryan,

Thanks a lot for your reply!

The problem is that the GTF file I get from UCSC doesn't contain much information. (You are right that Ensembl gives a nice annotation, and I used grep on the GENCODE annotation files to subdivide them into snoRNAs etc.)

However, when I download from the UCSC Table Browser (group: Genes and Gene Prediction Tracks, track: UCSC Genes, table: knownGene), the output looks like this:

chr1    hg19_knownGene  exon    11874   12227   0.000000        +       .       gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
chr1    hg19_knownGene  exon    12613   12721   0.000000        +       .       gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
chr1    hg19_knownGene  exon    13221   14409   0.000000        +       .       gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";

Do you maybe know if I should download a different table from the UCSC Browser?

Thanks!
Alexandra
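
One way to confirm what such a GTF actually carries is to tally the attribute keys in its ninth column. The sketch below is illustrative only: it assumes tab-separated records and only counts attributes with quoted values, and the helper name `attribute_keys` is hypothetical. The sample lines are the three knownGene records quoted above.

```python
import re
from collections import Counter

# Matches attributes of the form: key "value"
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def attribute_keys(gtf_lines):
    """Count how often each quoted attribute key occurs in column 9."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        for key, _value in ATTR_RE.findall(fields[8]):
            counts[key] += 1
    return counts


sample = [
    'chr1\thg19_knownGene\texon\t11874\t12227\t0.000000\t+\t.\tgene_id "uc001aaa.3"; transcript_id "uc001aaa.3";\n',
    'chr1\thg19_knownGene\texon\t12613\t12721\t0.000000\t+\t.\tgene_id "uc001aaa.3"; transcript_id "uc001aaa.3";\n',
    'chr1\thg19_knownGene\texon\t13221\t14409\t0.000000\t+\t.\tgene_id "uc001aaa.3"; transcript_id "uc001aaa.3";\n',
]
key_counts = attribute_keys(sample)
```

If only `gene_id` and `transcript_id` show up, the file has no biotype field to filter on, which matches the problem described here.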


I don't know if UCSC has one available for the current human annotation (hg38). You can download the track for hg19, if that's what you're using (it's the "lincRNA Transcripts" track).


Yes, I use hg19, but the "lincRNA Transcripts" track is built from the lncRNAs of Cabili et al., 2011.

It is not "tidy" and differs from the lncRNAs shown in the UCSC Genes track.

And I actually thought that UCSC does some validation before including transcripts in UCSC Genes.

Sorry for so many details. :)

Alexandra


Yet another reason to use the Ensembl annotation :)


Are you specifically interested in UCSC annotation?


Yes, I would like to check my list of lncRNAs against all public annotations. I have seen examples where lncRNAs differ in their exon models between the RefSeq, UCSC and GENCODE annotations, or are missing from one and present in another.

That is why I would like to get the UCSC lncRNA annotation. But I have almost given up, and am thinking about just using the Cabili list instead, although it has quite a few artifacts.
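
The kind of cross-annotation check described here can be sketched as a comparison of exon intervals. This is a rough illustration, assuming both transcripts are on the same chromosome and strand and use closed 1-based coordinates as in GTF. The function name `compare_exon_models` and the "other annotation" model are made up; only the first exon list is taken from the knownGene records quoted earlier in the thread.

```python
def compare_exon_models(exons_a, exons_b):
    """Classify two exon lists as 'identical', 'overlapping' or 'disjoint'.

    Each exon is a (start, end) tuple with closed coordinates.
    """
    a = sorted(set(exons_a))
    b = sorted(set(exons_b))
    if a == b:
        return "identical"
    for start_a, end_a in a:
        for start_b, end_b in b:
            # Two closed intervals overlap if each starts before the other ends.
            if start_a <= end_b and start_b <= end_a:
                return "overlapping"
    return "disjoint"


# Exons of uc001aaa.3 as quoted in the thread.
ucsc_model = [(11874, 12227), (12613, 12721), (13221, 14409)]
# Hypothetical model from another annotation with a shifted first exon start.
other_model = [(11869, 12227), (12613, 12721), (13221, 14409)]

result = compare_exon_models(ucsc_model, other_model)
```

The quadratic loop is fine for single transcripts; for genome-wide comparisons an interval tool such as bedtools is the more usual choice.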

8.0 years ago
Shicheng Guo ★ 9.5k

Go to https://www.gencodegenes.org/releases/19.html

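wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.long_noncoding_RNAs.gtf.gz
# The download is a gzip-compressed GTF, not a tarball, so use gunzip rather than tar
gunzip gencode.v19.long_noncoding_RNAs.gtf.gz
# Skip header comments; print chrom, 0-based start, end and gene_id as tab-separated BED
awk 'BEGIN{OFS="\t"} !/^#/ {print $1,$4-1,$5,$10}' gencode.v19.long_noncoding_RNAs.gtf > lncRNA.hg19.bed
# Strip the quotes and semicolon left around the gene_id
perl -p -i -e "s/[\";]//g" lncRNA.hg19.bed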
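
The same GTF-to-BED conversion can be sketched in Python. This is a minimal sketch under stated assumptions: the input is a tab-separated GTF with a quoted `gene_id` attribute, only `gene` records are kept so that each gene appears once, and the start is shifted to the 0-based convention that BED uses. The function name `gtf_to_bed` and the example record (including its gene ID) are hypothetical.

```python
import re

GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def gtf_to_bed(gtf_lines, feature="gene"):
    """Yield BED6 lines (chrom, start0, end, name, score, strand) for one feature type."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature:
            continue
        match = GENE_ID_RE.search(fields[8])
        if match is None:
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        # GTF starts are 1-based; BED starts are 0-based. The end stays the same.
        yield "\t".join([chrom, str(start - 1), str(end), match.group(1), "0", strand])


# Made-up record, for illustration only.
example = [
    "##hypothetical header\n",
    'chr1\tHAVANA\tgene\t1000\t2000\t.\t+\t.\tgene_id "ENSG_EXAMPLE.1"; gene_type "lincRNA";\n',
    'chr1\tHAVANA\texon\t1000\t1200\t.\t+\t.\tgene_id "ENSG_EXAMPLE.1"; gene_type "lincRNA";\n',
]
bed_lines = list(gtf_to_bed(example))
```

Passing `feature="exon"` instead would emit one BED line per exon, which is closer to what a per-record conversion produces.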

I hope OP either found the data or stopped searching by now ;-)


;-), 2.5 years ago. I really hope he found it. Or else, what an awful day!
