Genes can be transcribed in different ways, so one gene can give rise to several isoforms (alternative transcripts). For the purposes of doing analyses, it can be useful to pick one isoform from all that are available for a gene.
One isoform can be selected as "canonical" based on experimental evidence collected in a database called APPRIS, or based on transcript length or other criteria. When you look at GFF3 files on the GENCODE site, for instance, you may see some entries tagged with appris_* prefixes, which denote the grade of evidence for "canonical-ity".
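If you want to pull those APPRIS-tagged transcripts out of a GFF3 file yourself, here is a rough sketch. It assumes GENCODE-style GFF3 where each transcript feature has an ID= attribute and a comma-separated tag= attribute (for example tag=basic,appris_principal_1). The function name appris_principal is hypothetical, and the exact tag vocabulary may differ between GENCODE releases.

```shell
# Hypothetical helper: for every "transcript" feature whose attributes mention
# appris_principal, print the transcript ID and its appris_* tags.
# Assumes GENCODE-style GFF3 on stdin: tab-separated, attributes in column 9
# as key=value pairs separated by ';', with tags in a comma-separated tag= list.
appris_principal() {
  awk -F'\t' '
    $3 == "transcript" && $9 ~ /appris_principal/ {
      id = ""; tags = ""
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        split(attrs[i], kv, "=")
        if (kv[1] == "ID") id = kv[2]
        if (kv[1] == "tag") tags = kv[2]
      }
      # keep only the appris_* entries from the comma-separated tag list
      m = split(tags, t, ",")
      appris = ""
      for (j = 1; j <= m; j++)
        if (t[j] ~ /^appris_/) appris = appris (appris == "" ? "" : ",") t[j]
      print id "\t" appris
    }'
}
```

You would feed it a decompressed annotation, e.g. gunzip -c gencode.annotation.gff3.gz | appris_principal, where the file name is a placeholder for whichever GENCODE release you downloaded.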
In the UCSC browser, the canonical isoform (which may, for example, have been determined experimentally to be the most highly expressed of the alternative transcripts) is labeled with inverted text, giving you a visual cue that it is canonical. The labels of the other isoforms are unadorned.
Whether you should work with the canonical gene annotation rather than with all isoforms depends on your experiment.
Internally, UCSC keeps a table called knownCanonical that is used to label such isoforms. This table is available for direct inspection through the goldenPath download area for various assemblies, e.g. for hg38.
In hg38, as an example, the XIST gene has an isoform called ENST00000429829.6 which is labeled as canonical and sits at chrX:73820655-73852723. UCSC tables store the start coordinate zero-based (with an exclusive end), so the browser view shows the same interval one-based, as chrX:73820656-73852723.
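To translate a table row into the position string the browser shows, you only need to add 1 to the start; the end stays the same. A minimal sketch, assuming tab-separated input whose first three columns are chrom, chromStart, chromEnd (as in knownCanonical.txt); the function name is hypothetical:

```shell
# Convert a UCSC table row (0-based start, exclusive end) into the
# 1-based, fully closed "chrom:start-end" string the browser displays.
to_browser_position() {
  awk -F'\t' '{ printf "%s:%d-%d\n", $1, $2 + 1, $3 }'
}

printf 'chrX\t73820655\t73852723\n' | to_browser_position
# prints chrX:73820656-73852723
```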
You can grab the knownCanonical table and verify that this transcript is there:
% wget -qO- "http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/knownCanonical.txt.gz" | gunzip -c | grep ENST00000429829.6
chrX 73820655 73852723 28961 ENST00000429829.6 ENSG00000229807.12
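The grep above needs the exact, versioned transcript ID. If you start from a gene instead, a sketch like the following looks up its canonical transcript. It assumes the column layout seen in the rows above (column 5 is the transcript, column 6 is the versioned Ensembl gene ID) and ignores the version suffix on the gene; the function name is hypothetical.

```shell
# Print the canonical transcript(s) for an Ensembl gene ID.
# $1: gene ID, with or without a version suffix (e.g. ENSG00000229807)
# $2: path to an uncompressed knownCanonical.txt
canonical_for_gene() {
  awk -F'\t' -v gene="${1%%.*}" '
    { g = $6; sub(/\.[0-9]+$/, "", g) }   # drop a ".12"-style version suffix
    g == gene { print $5 }                # column 5 holds the transcript ID
  ' "$2"
}
```

For example, after saving the table with wget -qO- "http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/knownCanonical.txt.gz" | gunzip -c > knownCanonical.txt, running canonical_for_gene ENSG00000229807 knownCanonical.txt should print the XIST transcript, as long as the table still matches the row above.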
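If you run the same wget/grep pipeline for the canonical-labeled MYC isoform in hg38, you should see a similar result (other assemblies have their own knownCanonical table, which may use different identifiers):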
% wget -qO- "http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/knownCanonical.txt.gz" | gunzip -c | grep ENST00000621592.8
chr8 127736230 127742951 7390 ENST00000621592.8 ENSG00000136997.21
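When you have many transcript IDs to check, one pass over the table is cheaper than one grep per ID. A minimal sketch, assuming a saved, uncompressed knownCanonical.txt with transcripts in column 5 and a plain file of transcript IDs, one per line; the function name is hypothetical. It compares full versioned IDs exactly, so ENST00000621592.8 and a different version of the same transcript count as different.

```shell
# Report whether each transcript ID is listed in knownCanonical.
# $1: path to an uncompressed knownCanonical.txt
# $2: file with one transcript ID per line
check_canonical() {
  awk -F'\t' '
    NR == FNR { canon[$5] = 1; next }   # first file: remember canonical IDs
    { print $1 "\t" (($1 in canon) ? "canonical" : "not canonical") }
  ' "$1" "$2"
}
```

The exact match (rather than grep's regular-expression match, where the "." in an ID matches any character) avoids accidental hits on similar-looking IDs.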
Hi,
The difference is related to the different transcripts of the same gene. If you hover over an exon, the name of the transcript it belongs to will appear, probably starting with ENST.