I was wondering if anyone could give me advice on acquiring all introns for human GRCh37 from Ensembl? I can get the introns for human GRCh38 through the Perl API using the following code:
use strict;
use warnings;
use Bio::EnsEMBL::Registry;
use Bio::EnsEMBL::Utils::SeqDumper;

my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
    -host       => 'ensembldb.ensembl.org',
    -port       => 5306,
    -user       => 'anonymous',
    -passwd     => undef,
    -db_version => '84',
);

my $gene_adapter = $registry->get_adaptor('human', 'Core', 'Gene');
my $dumper = Bio::EnsEMBL::Utils::SeqDumper->new();

while (my $gene_id = shift(@{$gene_adapter->list_stable_ids()})) {
    my $gene = $gene_adapter->fetch_by_stable_id($gene_id);
    my $canonical_transcript = $gene->canonical_transcript();
    while (my $intron = shift(@{$canonical_transcript->get_all_Introns()})) {
        $dumper->dump($intron->feature_Slice(), 'FASTA', 'introns.fasta');
    }
}
However, when I try changing -db_version to a release that uses GRCh37, such as release 64, I get the following error:
For homo_sapiens_core_64_37 there is a difference in the software release (84) and the database release (64). You should update one of these to ensure that your script does not crash.
DBD::mysql::st execute failed: Unknown column 'stable_id' in 'field list' at /home/.../src/ensembl/modules/Bio/EnsEMBL/DBSQL/BaseAdaptor.pm line 312.
-------------------- EXCEPTION --------------------
MSG: Detected an error whilst executing SQL 'SELECT `stable_id` FROM `gene`': DBD::mysql::st execute failed: Unknown column 'stable_id' in 'field list' at /home/.../src/ensembl/modules/Bio/EnsEMBL/DBSQL/BaseAdaptor.pm line 312.
STACK Bio::EnsEMBL::DBSQL::BaseAdaptor::_list_dbIDs /home/.../src/ensembl/modules/Bio/EnsEMBL/DBSQL/BaseAdaptor.pm:313
STACK Bio::EnsEMBL::DBSQL::GeneAdaptor::list_stable_ids /home/.../src/ensembl/modules/Bio/EnsEMBL/DBSQL/GeneAdaptor.pm:152
STACK toplevel get_introns.pl:19
Date (localtime) = Thu Jul 14 16:04:48 2016
Ensembl API version = 84
---------------------------------------------------
Does anyone have advice on this, or know of an easier way to get the introns? Thank you!
Denise is correct about using port 3337; however, she is wrong about the API itself. You can use the same API to access GRCh37 as you use for GRCh38.
So I am using the same code as I posted above, but with port 3337 and homo_sapiens_core_84_37 as the db. I am using the newest API. However, now I'm getting the following error:
I also tried replacing 'human' with 'homo sapiens', with no luck. Any thoughts?
Oh, ignore my previous post. If I use port 3337 with no db version specified, I get introns from GRCh37. Thank you both so much for your help!
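For anyone finding this later, here is a minimal sketch of the connection that worked, assuming the Ensembl Perl API is installed and the public server is reachable. The only changes from the original script are the port (3337) and the dropped -db_version. The coordinate-system check at the end is my own addition for confirming the assembly, not something from the posts above.

```perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

# Port 3337 is the GRCh37 server; with no -db_version given,
# the registry picks the databases matching the installed API release.
$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -port => 3337,
    -user => 'anonymous',
);

my $gene_adaptor = $registry->get_adaptor('human', 'Core', 'Gene');

# Sanity check (an addition, not from the thread): ask the core database
# for the default version of its chromosome coordinate system.
my $cs_adaptor = $registry->get_adaptor('human', 'Core', 'CoordSystem');
my $cs = $cs_adaptor->fetch_by_name('chromosome');
print "Assembly: ", $cs->version(), "\n";
```

If this prints a GRCh38 assembly name, the script is probably still pointing at the default port.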
I am having an additional issue now. My code seems to hang on the first canonical transcript's intron. It prints it multiple times and then hangs.
Here is the updated code:
Any help is appreciated.
Changing the while loops to foreach seemed to do the trick.
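My understanding of why this helps (hedged, I have not traced it through the API source in detail): list_stable_ids() and get_all_Introns() appear to build and return a new array reference on every call. With `while (my $x = shift(@{ ... }))` the getter is called again on each pass through the loop condition, so shift keeps removing the first element of a fresh array and the loop never reaches the end. The pure-Perl sketch below reproduces that behaviour with a hypothetical stand-in, get_items(), so it runs without the Ensembl API.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an API getter such as get_all_Introns():
# it returns a NEW array reference every time it is called.
sub get_items {
    return [ 'intron_1', 'intron_2', 'intron_3' ];
}

# Problem pattern: the getter is re-evaluated in the loop condition, so
# shift always sees a fresh array and returns its first element forever.
my @seen_while;
while (my $item = shift(@{ get_items() })) {
    push @seen_while, $item;
    last if @seen_while >= 5;    # safety cap so this demo terminates
}

# Fixed pattern: foreach evaluates the getter once and walks that list.
my @seen_foreach;
foreach my $item (@{ get_items() }) {
    push @seen_foreach, $item;
}

print "while:   @seen_while\n";      # intron_1 repeated five times
print "foreach: @seen_foreach\n";    # intron_1 intron_2 intron_3
```

In the original script the equivalent change is `foreach my $gene_id (@{ $gene_adapter->list_stable_ids() })` for the outer loop and `foreach my $intron (@{ $canonical_transcript->get_all_Introns() })` for the inner one.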