12.6 years ago
Ttnguyen
Any ideas why recent releases of Ensembl human genes (e.g. 66 & 67) do not provide canonical transcripts?
I have run the following code using the release 67 Perl API (http://www.ensembl.org/info/docs/api/index.html):
#!/usr/bin/env perl
# Check EnsEMBL Transcripts
# Coded by Steve Moss (gawbul [at] gmail [dot] com)
# http://about.me/gawbul

# make things easier
use strict;
use warnings;

# import modules
use Bio::EnsEMBL::Registry;

# set up registry
my $registry = 'Bio::EnsEMBL::Registry';

# connect to EnsEMBL
$registry->load_registry_from_db(-host => "ensembldb.ensembl.org",
                                 -user => "anonymous");

# get gene adaptor object from registry for human core
my $gene_adaptor = $registry->get_adaptor("Human", "Core", "Gene");

# get list of gene stable IDs and remember the total for the progress counter
my $gene_ids = $gene_adaptor->list_stable_ids();
my $total = scalar(@{$gene_ids});

# counters
my $count = 0;
my $defined_count = 0;
my $undefined_count = 0;

print "Processing $total gene IDs...\n";

# flush STDOUT immediately so the progress counter updates in place
$| = 1;

# traverse gene IDs (iterating leaves the ID list intact)
foreach my $gene_id (@{$gene_ids}) {
    # get gene object
    my $gene = $gene_adaptor->fetch_by_stable_id($gene_id);
    # get canonical transcript (undef if none is set)
    my $canonical_transcript = $gene->canonical_transcript();
    # check defined
    if (defined $canonical_transcript) {
        $defined_count++;
    }
    else {
        $undefined_count++;
    }
    $count++;
    # let user know progress as processed/total
    print "[$count/$total]\r";
}

# let the user know
print "\n$defined_count defined & $undefined_count undefined in $count.\n";
print "...done!\n";
and I get the following output:
w232-244:Code stevemoss$ perl check_canonical_transcripts.pl
Processing 56478 gene IDs...
[56478/0]
56478 defined & 0 undefined in 56478
...done!
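To check the counting and progress logic without connecting to ensembldb.ensembl.org, the loop can be pulled out into a subroutine that takes a lookup code reference. This is only a minimal sketch: `tally_canonical` and the `GENE_*`/`TRANSCRIPT_*` identifiers are hypothetical, and the in-memory hash merely stands in for `fetch_by_stable_id(...)->canonical_transcript()`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Count genes with and without a canonical transcript.
# $gene_ids: array ref of gene stable IDs (left unmodified).
# $lookup:   code ref returning the canonical transcript (or undef) for an ID.
sub tally_canonical {
    my ($gene_ids, $lookup) = @_;
    my $total = scalar @{$gene_ids};
    my ($defined_count, $undefined_count, $count) = (0, 0, 0);
    for my $gene_id (@{$gene_ids}) {
        if (defined $lookup->($gene_id)) {
            $defined_count++;
        }
        else {
            $undefined_count++;
        }
        $count++;
        # progress goes to STDERR so it does not mix with the results
        print STDERR "[$count/$total]\r";
    }
    print STDERR "\n";
    return ($defined_count, $undefined_count);
}

# Hypothetical in-memory data standing in for the Ensembl core database.
my %fake_canonical = (
    GENE_A => 'TRANSCRIPT_A1',
    GENE_B => undef,
    GENE_C => 'TRANSCRIPT_C2',
);
my ($defined, $undefined) =
    tally_canonical([sort keys %fake_canonical], sub { $fake_canonical{ $_[0] } });
print "$defined defined & $undefined undefined in ", $defined + $undefined, ".\n";

# With the Ensembl API (as in the script above) the lookup might be (not run here):
# sub { my $gene = $gene_adaptor->fetch_by_stable_id($_[0]);
#       return $gene ? $gene->canonical_transcript() : undef }
```

Passing the lookup in as a code reference keeps the counting independent of the database, so it can be exercised on toy data before a full run over every human gene.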
I haven't come across this myself! Not wanting to sound doubtful, but what makes you think that? Have you run some code that returned undef for the canonical transcripts?
I was just guessing that the concept of a canonical transcript is no longer considered useful.
Canonical transcripts are just the name we give to the transcripts used to build the gene trees. As such, they still exist in Ensembl. It would help if you could describe where you used to find them. Are you talking about the Perl API, the FTP files, the website or BioMart?
I was using BioMart to find them, but from Steve's reply it seems I should get the canonical transcripts through the API instead.
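To get a BioMart-style table out of the API, one option is to write one gene/canonical-transcript pair per line. This is a minimal sketch under assumptions: `canonical_tsv`, the `NA` placeholder for genes without a canonical transcript, and the `GENE_*`/`TRANSCRIPT_*` IDs are all hypothetical. Only the commented wiring uses Ensembl API calls (`get_adaptor`, `fetch_by_stable_id`, `canonical_transcript`, `stable_id`), and it is not executed here.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Build tab-separated "gene_id<TAB>canonical_transcript_id" lines.
# $lookup returns the canonical transcript stable ID for a gene, or undef.
# Genes without one get "NA" so every gene still appears in the table.
sub canonical_tsv {
    my ($gene_ids, $lookup) = @_;
    my @lines = ("gene_id\tcanonical_transcript_id");
    for my $gene_id (@{$gene_ids}) {
        my $transcript_id = $lookup->($gene_id);
        push @lines, join("\t", $gene_id, defined $transcript_id ? $transcript_id : 'NA');
    }
    return @lines;
}

# Hypothetical data standing in for the Ensembl core database.
my %fake = (GENE_A => 'TRANSCRIPT_A1', GENE_B => undef);
print "$_\n" for canonical_tsv([sort keys %fake], sub { $fake{ $_[0] } });

# With the Ensembl Perl API, the lookup might be wired up like this (not run here):
# my $gene_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Human', 'Core', 'Gene');
# my $lookup = sub {
#     my $gene = $gene_adaptor->fetch_by_stable_id($_[0]) or return undef;
#     my $canonical = $gene->canonical_transcript() or return undef;
#     return $canonical->stable_id();
# };
```

Redirecting the output to a file gives something that can be loaded much like a BioMart export.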