Hi!
I am doing exactly the same thing, I tried different approaches but the easiest and most effective, in my opinion, is the query with API ensembl. If you didn't do it yet, try to have a look at it (http://www.ensembl.org/info/docs/index.html) and related tutorials (http://www.ebi.ac.uk/training/online/course/ensembl-filmed-api-workshop).
Here there is the script that Magali from EnsEMBL suggested me (I am working on API ensembl 67 and I'm using Perl 5.18.2 on a Mac).
my $registry = Bio::EnsEMBL::Registry->load_
registry_from_db(
-host => '[ensembldb.ensembl.org](http://ensembldb.ensembl.org)',
-user => 'anonymous',
-port => '3306'
);
my $transcript_adaptor = $registry->get_adaptor('human', 'core', 'Transcript');
my $stable_id = 'ENST00000380152';
my $transcript = $transcript_adaptor->fetch_by_stable_id($stable_id);
# Only get exons within the coding region
my $exons = $transcript->get_all_translateable_Exons();
foreach my $exon (@$exons) {
# Print the genomic coordinates for each exon
print "Exon " . $exon->stable_id . ":" . $exon->start . "-" . $exon->end. "\t";
my @pep_coords = $transcript->genomic2pep($exon->start, $exon->end, $exon->strand);
foreach my $pep (@pep_coords) {
# Print the protein coordinates for each exon
print $pep->start() . "-" . $pep->end() . "\n";
}
}
my $translation = $transcript->translation;
# Check if there is a translation
if ($translation) {
my $pfs = $translation->get_all_ProteinFeatures();
# Display all protein features
foreach my $pf (@$pfs) {
print $pf->hseqname . ":" . $pf->start . "-" . $pf->end . "\n";
}
}
If you only have exon coordinates to start with, you will need to create a slice for each set of coordinates, then retrieve transcripts overlapping that slice and use the process described above.
my $slice_adaptor = $registry->get_adaptor('human', 'core', 'Slice');
my $slice = $slice_adaptor->fetch_by_region('chromosome', $chromosome, $exon_start, $exon_end);
my $transcripts = $slice->get_all_Transcripts();
(of course write everything straight, in a textfile). You can easily modify it to read from a textfile containing your transcript IDs (and I guess also for the exon coordinates, but I haven't try yet) and to obtain output file instead of printing on your console, and if you need the chromosome info also in the exon output have a look at the example script in the lecture "Genes, Transcripts, Exons and translations" (http://www.ebi.ac.uk/training/online/sites/ebi.ac.uk.training.online/files/user/1317/documents/core5.pdf).
Hope that it might be helpful. Let me know if you're having trouble in modifying the script to read from files and to write new files, I might share my trial scripts (but I have to warn you I started using Perl just this Monday)
Good luck!