Where To Download Genome Annotation Including Exon, Intron, UTR, Intergenic Information
10.8 years ago
liran0921 ▴ 150

Hi Everyone,

This might be a simple question, but it has been bothering me. I have some small RNAs that have been mapped to the genome, and I want to find out where they fall in the genome (exon, intron, UTR, intergenic). I would like to use a genome annotation that contains this information. I tried Ensembl and the UCSC Genome Browser, but failed to get what I want. Can anybody give me some instructions? Many thanks!


What kind of small RNA are you talking about? UCSC and Ensembl have annotation information for a lot of non-coding RNAs.


I have some novel miRNAs that don't have any annotation information, so I want to overlap them with the gene annotation of the genome.


1) You can try downloading the .gff3 file from miRBase (http://www.mirbase.org/ftp.shtml) for your species of interest. It has coordinates for most of the known miRNAs.

2) You can download the gene annotation information using the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start)

Excuse me if this is not the answer to your question. I may not have fully understood what you want.


Thanks. I tried the UCSC Table Browser, but the output GTF file contains only exon coordinates. How can I extract the coordinates of intronic, UTR and intergenic regions?

10.8 years ago
Mitch Bekritsky ★ 1.3k

If you want to get annotations for every exon/intron/UTR in a reference genome, you can use the UCSC Table Browser.

Here's how to get it done:

  1. Pick your reference genome under clade/genome/assembly
  2. Make sure the group is "Genes and Gene Predictions"
  3. Choose your preferred track (I like to rely on RefSeq and CCDS)
  4. Choose the table that gives gene information (e.g. for RefSeq, the table you want is refGene)
  5. Select your region or the entire genome to get coordinates for
  6. Select BED format as your output format
  7. Name your output file
  8. Click "get output"

On the next page, you will get the option to get coordinates only for all exons, coding exons, introns, 5' UTRs, or 3' UTRs (plus flanking sequence if you want). You can download these coordinates however you'd like (I prefer having one file for each genomic feature type), then overlap your mapped sequences to the genomic features using bedtools' intersect.
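To make the overlap step concrete, here is a minimal pure-Python sketch of what an intersect between mapped sequences and per-feature BED files reports. It is only an illustration, not how bedtools is implemented: it assumes BED-style 0-based, half-open coordinates, and the function names (`parse_bed`, `annotate`) and toy coordinates are hypothetical. For real data, bedtools will be much faster.

```python
from collections import defaultdict

def parse_bed(lines):
    """Parse BED lines into {chrom: [(start, end, name), ...]} (0-based, half-open)."""
    features = defaultdict(list)
    for line in lines:
        # Skip blank lines and the header lines the Table Browser may emit.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else "."
        features[chrom].append((start, end, name))
    return features

def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end

def annotate(reads, feature_sets):
    """Return {read_name: sorted list of feature types the read overlaps}."""
    result = {}
    for chrom, intervals in reads.items():
        for start, end, name in intervals:
            hits = set()
            for ftype, feats in feature_sets.items():
                if any(overlaps(start, end, fs, fe)
                       for fs, fe, _ in feats.get(chrom, [])):
                    hits.add(ftype)
            result[name] = sorted(hits)
    return result

# Toy data (made up): one file per feature type, as suggested above.
exons = parse_bed(["chr1\t100\t200\texon1", "chr1\t400\t500\texon2"])
introns = parse_bed(["chr1\t200\t400\tintron1"])
reads = parse_bed([
    "chr1\t150\t172\tsRNA_a",   # inside exon1
    "chr1\t250\t272\tsRNA_b",   # inside intron1
    "chr1\t190\t212\tsRNA_c",   # spans the exon1/intron1 boundary
    "chr1\t900\t922\tsRNA_d",   # overlaps nothing
])
annotation = annotate(reads, {"exon": exons, "intron": introns})
print(annotation)
```

A sequence that spans a feature boundary, like `sRNA_c` here, is reported under both feature types, so you may want to decide on a priority order (for example exon over intron) before counting.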

To find intergenic regions, you can create a merged BED file of all exons, introns and UTR sequences and look for mapped sequences that overlap NONE of those features using bedtools intersect with the -v option.
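The intergenic logic can be sketched the same way: merge all genic intervals, then keep only the mapped sequences that overlap none of them, which is the effect of the -v option. This is a rough illustration under the assumption of 0-based, half-open coordinates; the helper names and toy intervals are hypothetical, and the merge treats book-ended intervals as one region.

```python
def merge_intervals(intervals):
    """Merge overlapping or book-ended (start, end) intervals into sorted, disjoint ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]

def intergenic_reads(reads, genic):
    """Return names of reads that overlap no genic interval."""
    merged = {chrom: merge_intervals(ivs) for chrom, ivs in genic.items()}
    out = []
    for chrom, start, end, name in reads:
        hit = any(start < g_end and g_start < end
                  for g_start, g_end in merged.get(chrom, []))
        if not hit:
            out.append(name)
    return out

# Toy data (made up): exons, introns and UTRs pooled per chromosome.
genic = {"chr1": [(100, 200), (200, 400), (400, 500), (150, 180)]}
reads = [
    ("chr1", 150, 172, "sRNA_a"),  # genic
    ("chr1", 900, 922, "sRNA_d"),  # beyond every annotated feature
    ("chr2", 10, 32, "sRNA_e"),    # chromosome with no annotation at all
]
intergenic = intergenic_reads(reads, genic)
print(intergenic)
```

Note that anything on a chromosome or scaffold absent from the annotation is counted as intergenic by this logic, which may or may not be what you want.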

If you're curious about other ways to use bedtools to analyze your mapped sequences, I've found this site to have the best documentation.


Thanks for your answer. It's very helpful!


My pleasure! I'm glad I was able to help.


Thanks for the answer, but after clicking "get output" I cannot see the option for getting intronic and/or UTR coordinates, only the one for exons. Can you share a screenshot, please?


Hi Richard,

Sorry for the late reply -- holidays and all. Here is a link to the screenshot of the page I get after I follow steps 1-8 above (sorry it's not embedded here...I've tried embedding images hosted on Dropbox and Google Drive in tiff, jpeg, and png formats, to no avail...). As you can see, the intron option is right there. My guess is you may not have selected the correct output format on the table browser page. Can you screenshot it? Or link to a screen shot?


Can I do this with Ensembl? I used the Ensembl GRCm38 annotation to align my RNA-seq data, and now I want to summarize counts (with HTSeq) over both introns and exons. The Ensembl GTF that I got seems to have transcript and gene features. Does gene = coordinates of exons + introns? Or is it just exons?
