How To Generate A Complete Human Chromosome From Ensembl With The Bioperl Ensembl API
13.4 years ago
Simon ▴ 20

Hi, I would like to build an EMBL-formatted entry for each human chromosome from Ensembl, using the associated Perl API. Is this possible? I was able to retrieve sequence data and position information for genes and transcripts, but I did not find any way to generate an entry in EMBL format for a whole chromosome...

Has anyone already written a script to do this?

Thanks, Simon

ensembl bioperl • 3.5k views
13.4 years ago
Akk ▴ 210

Here's an example of dumping the slice overlapping the human BRCA2 gene in EMBL format:

use strict;
use warnings;

use Bio::EnsEMBL::Registry;
use Bio::EnsEMBL::Utils::SeqDumper;

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db( '-host' => 'ensembldb.ensembl.org',
                                  '-port' => '5306',
                                  '-user' => 'anonymous',
                                  '-db_version' => 63 );

my $sa = $registry->get_adaptor( 'Human', 'Core', 'Slice' );

# Get slice overlapping BRCA2 gene:
my $slice =
  $sa->fetch_by_region( 'Chromosome', '13', 32889611, 32973805 );

my $seqdumper = Bio::EnsEMBL::Utils::SeqDumper->new();

# Disable uninteresting feature types:
$seqdumper->disable_feature_type('estgene');
$seqdumper->disable_feature_type('genscan');
$seqdumper->disable_feature_type('repeat');
$seqdumper->disable_feature_type('similarity');
$seqdumper->disable_feature_type('variation');
$seqdumper->disable_feature_type('vegagene');

# Dump to stdout:
$seqdumper->dump( $slice, 'EMBL' );

Cheers, Andreas (Ensembl/Core)
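
For a whole chromosome rather than a single gene region, the same approach should work if the start and end coordinates are left out of fetch_by_region, which then returns a slice spanning the entire sequence region. The sketch below is an adaptation of the script above, not something posted in the thread. The output file name is made up, and passing a file name as the third argument to dump is assumed to behave as in the SeqDumper documentation (some releases append to an existing file, hence the unlink).

```perl
use strict;
use warnings;

use Bio::EnsEMBL::Registry;
use Bio::EnsEMBL::Utils::SeqDumper;

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db( '-host'       => 'ensembldb.ensembl.org',
                                  '-port'       => '5306',
                                  '-user'       => 'anonymous',
                                  '-db_version' => 63 );

my $sa = $registry->get_adaptor( 'Human', 'Core', 'Slice' );

# No start/end given: the slice covers the whole of chromosome 13.
my $slice = $sa->fetch_by_region( 'Chromosome', '13' );

# Hypothetical output file name; change as needed.
my $outfile = 'chromosome_13.embl';

# Some API releases append to an existing file, so start clean.
unlink $outfile if -e $outfile;

my $seqdumper = Bio::EnsEMBL::Utils::SeqDumper->new();

# Write the EMBL entry to the file instead of stdout.
$seqdumper->dump( $slice, 'EMBL', $outfile );
```

Expect this to be slow and the file to be large for a full chromosome; disabling feature types as in the script above should reduce both.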

BTW, while putting this together I noticed a small (tiny, but fatal) error in the documentation of the SeqDumper module in our API and fixed it (will be in release 64). Thanks for turning my eyes in its direction!

13.4 years ago

The EMBL format dumps for Ensembl are here: ftp://ftp.ensembl.org/pub/current_embl/homo_sapiens/README

In fact I was using these data previously, but I had to write a bunch of scripts to reconstruct whole chromosome sequences from them, which was not the most efficient way to work, I suppose. This is why I was looking for other ways to get a whole sequence. Thanks for your help anyway!

The whole chromosomes are available in the current_fasta directory, but I think what you are looking for is something in between whole-chromosome FASTA and EMBL dumps, right?

13.4 years ago

Why do you need to do this via the Ensembl API? Files like these are available for download from EMBL-Bank directly at: http://www.ebi.ac.uk/genomes/eukaryota.html

e.g. chromosome 22 in version GRCh37 can be downloaded in EMBL format here (n.b. this is a 10 Mb file)

I will do that.

Initially I was thinking of using Ensembl data (with the Ensembl gene info, etc.), but these data are fine as well.

Thanks for your help!

Simon

13.4 years ago

Using the Ensembl API, there is a module called Bio::EnsEMBL::Utils::SeqDumper which, if given a whole chromosome Slice object, can write this data out.

If you need more explanation or a more complete script, just ask.
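
A minimal sketch of that idea, looping over every human chromosome and writing one EMBL file per chromosome. It assumes that fetch_all('chromosome') on the slice adaptor returns the reference chromosomes for the release you connect to. The connection settings, the file naming scheme and the choice of disabled feature types are illustrative only.

```perl
use strict;
use warnings;

use Bio::EnsEMBL::Registry;
use Bio::EnsEMBL::Utils::SeqDumper;

my $registry = 'Bio::EnsEMBL::Registry';

# Public Ensembl server; adjust the release number to the one you need.
$registry->load_registry_from_db( '-host'       => 'ensembldb.ensembl.org',
                                  '-port'       => '5306',
                                  '-user'       => 'anonymous',
                                  '-db_version' => 63 );

my $sa = $registry->get_adaptor( 'Human', 'Core', 'Slice' );

my $seqdumper = Bio::EnsEMBL::Utils::SeqDumper->new();

# Optional: skip bulky feature types to keep the files smaller.
$seqdumper->disable_feature_type('repeat');
$seqdumper->disable_feature_type('similarity');
$seqdumper->disable_feature_type('variation');

# One whole-chromosome slice per entry in the 'chromosome' coordinate system.
foreach my $slice ( @{ $sa->fetch_all('chromosome') } ) {

    # Hypothetical naming scheme: chromosome_1.embl, chromosome_X.embl, ...
    my $outfile = sprintf 'chromosome_%s.embl', $slice->seq_region_name();

    # Some API releases append to an existing file, so start clean.
    unlink $outfile if -e $outfile;

    print STDERR 'Dumping ', $slice->name(), " to $outfile\n";
    $seqdumper->dump( $slice, 'EMBL', $outfile );
}
```

Each dump pulls all enabled features for the chromosome from the remote database, so a full run is likely to take a long time; testing with a single small chromosome first is probably wise.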
