Why do the GENCODE GTF and the UCSC Table Browser BED file of all exons have different exon annotations?
8.2 years ago
tiago211287 ★ 1.5k

I downloaded all exons in BED format from the UCSC Table Browser, using the hg19 reference genome, and noticed when comparing them with the GTF from GENCODE that the two show different results.

GENCODE has some exons that the BED file doesn't, and vice versa.

For example:

enter image description here

genome ucsc bed gtf exon • 2.7k views
8.2 years ago
Sinji ★ 3.2k

They are two different annotations, curated differently, so there are bound to be differences. I would wait for someone to give you a real answer, since I've never been completely sure what the differences are. I've always just understood it as "yeah, we did the annotation differently; sometimes they are vastly different, but hey, that's okay".

8.2 years ago
Denise CS ★ 5.2k

Yes, @Sinji is right: different groups annotate in different ways, hence the possible differences when comparing two (or more) gene sets. The differences tend to be in the number of alternatively spliced transcripts annotated by group A versus group B. One group may annotate more transcripts, showing different exons or splice junctions. The length of the UTRs and the accuracy of the annotation of pseudogene loci can also vary. Looking at the RefSeq Gene track, RefSeq does seem to be reasonably well matched to the gencode.v19.annotation.bed track; that is, the exons are more or less the same. On the other hand, the target.exons.bed track seems to be the outlier in the image.

I just wonder why you are loading the BED file into IGV when this comparison is already available in both the Ensembl and UCSC browsers. In the latter you can confirm a pretty long exon around 1:14387-16825, which is not seen in the other annotations. See the zoomed-in view of this long exon on the UCSC browser.

Please note that back then (with GRCh37, i.e. hg19), UCSC used to show a track called UCSC genes. Now, with GRCh38, the track shown is 'All GENCODE V24' and the long exon is no longer there. I think the long exon was annotated based on mRNAs AX747611, AK092583 and AX747611, which have not been used to annotate GRCh38. GENCODE is the gene set in Ensembl, which in turn contains the Ensembl annotation plus the HAVANA manual annotation from the Sanger Institute.
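If you want to list exactly which exons differ rather than eyeball them in a browser, something along these lines might help. It is only a rough sketch: the function names (`gtf_exons`, `bed_exons`, `compare_exons`) and the toy coordinates are made up for illustration, and it assumes a BED file with one exon per line (strand in column 6) and a standard 9-column GTF. Keep in mind that BED starts are 0-based while GTF starts are 1-based, so a naive comparison of raw coordinates would report every exon as different.

```python
import io


def gtf_exons(handle):
    """Yield (chrom, start, end, strand) for GTF exon features (1-based, inclusive)."""
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # skip gene, transcript, CDS, UTR ... records
        yield (fields[0], int(fields[3]), int(fields[4]), fields[6])


def bed_exons(handle):
    """Yield (chrom, start, end, strand) from BED, converted to 1-based inclusive."""
    for line in handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        strand = fields[5] if len(fields) > 5 else "."
        # BED start is 0-based, end is exclusive: add 1 to start, keep end as is.
        yield (fields[0], int(fields[1]) + 1, int(fields[2]), strand)


def compare_exons(gtf_handle, bed_handle):
    """Return the exons shared by both files and those unique to each one."""
    gtf = set(gtf_exons(gtf_handle))  # a set collapses exons reused by several transcripts
    bed = set(bed_exons(bed_handle))
    return {"shared": gtf & bed, "gtf_only": gtf - bed, "bed_only": bed - gtf}


# Toy data (hypothetical coordinates, not taken from any real annotation).
GTF_TEXT = (
    'chr1\tHAVANA\tgene\t100\t500\t.\t+\t.\tgene_id "G1";\n'
    'chr1\tHAVANA\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
    'chr1\tHAVANA\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
    'chr1\tHAVANA\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T2";\n'
)
BED_TEXT = (
    'track name="toy exons"\n'
    "chr1\t99\t200\tex1\t0\t+\n"
    "chr1\t299\t450\tex2\t0\t+\n"
)

result = compare_exons(io.StringIO(GTF_TEXT), io.StringIO(BED_TEXT))
for key in ("shared", "gtf_only", "bed_only"):
    print(key, sorted(result[key]))
```

For real files you would pass open file handles instead of the `io.StringIO` objects. This only catches exact coordinate matches; exons that overlap but have different boundaries (longer UTRs, alternative splice sites) end up in both the `gtf_only` and `bed_only` lists, which is usually what you want to inspect anyway.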
