Hello everyone, I am new to the Perl API for Ensembl, and new to Perl as well, but after some reading on both I managed to write a short script that extracts the protein and cDNA sequences of the canonical transcript for a given list of Ensembl gene IDs. I have pasted my code below:
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;
use Data::Dumper;
my $registry = "Bio::EnsEMBL::Registry";
$registry->load_registry_from_db( -host =>'ensembldb.ensembl.org', -user => 'anonymous' );
my $gene_adaptor = $registry->get_adaptor( 'Human', 'Core', 'Gene' );
my @gene_ids = ("ENSG00000001629", "ENSG00000001630", "ENSG00000001631", "ENSG00000002016");
foreach my $gene_id (@gene_ids) {
    my $gene    = $gene_adaptor->fetch_by_stable_id($gene_id);
    my $cdsseq  = $gene->canonical_transcript()->translateable_seq();
    my $protseq = $gene->canonical_transcript()->translate()->seq();
    print $gene_id, "_cDNA\n", $cdsseq, "\n", $gene_id, "_protein\n", $protseq, "\n";
}
What I really want, though, is to get all the translatable transcripts (those with a protein sequence) for each gene, with the Ensembl protein ID and transcript ID in the header (ENSG..............|ENSP...................|ENST...............\n peptide sequence), written to one FASTA file, and similarly the cDNA sequences written to a second FASTA file. Also, my script breaks when a gene has no translatable sequence. How do I set up an if clause for that? Thank you for your help!!
Abhishek
Thank you Nolwenn Lavielle!! It looks simple; I just didn't know what condition to use. I am still hoping somebody can show me how to get all the translatable transcripts for each gene.
Small fix: you need to set $gene first, since it is used to get the value for $cdsseq:
I didn't notice this detail. Thanks for your fix!
Thank you for the correction!
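On the still-open question of listing every translatable transcript per gene, here is a minimal sketch rather than a tested solution. It assumes the same public server and species as the script above. The output file names `peptides.fa` and `cds.fa` are hypothetical, and the header layout follows the ENSG|ENSP|ENST pattern asked for. It loops over `get_all_Transcripts()` and skips any transcript whose `translation()` is undefined, which is also the condition that keeps the script from dying on non-coding transcripts.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous',
);
my $gene_adaptor = $registry->get_adaptor('Human', 'Core', 'Gene');

my @gene_ids = ('ENSG00000001629', 'ENSG00000001630');

# Hypothetical output file names; change them as needed.
open my $pep_fh, '>', 'peptides.fa' or die "Cannot write peptides.fa: $!";
open my $cds_fh, '>', 'cds.fa'      or die "Cannot write cds.fa: $!";

foreach my $gene_id (@gene_ids) {
    my $gene = $gene_adaptor->fetch_by_stable_id($gene_id);
    unless (defined $gene) {
        warn "No gene found for $gene_id, skipping\n";
        next;
    }

    foreach my $transcript (@{ $gene->get_all_Transcripts() }) {
        # translation() is undef for transcripts without a protein product.
        my $translation = $transcript->translation();
        next unless defined $translation;

        # Header layout: gene ID | protein ID | transcript ID
        my $header = join '|',
            $gene->stable_id(),
            $translation->stable_id(),
            $transcript->stable_id();

        print {$pep_fh} ">$header\n", $transcript->translate()->seq(), "\n";

        # translateable_seq() is the coding sequence only (no UTRs).
        # Use spliced_seq() instead if the full cDNA is wanted.
        print {$cds_fh} ">$header\n", $transcript->translateable_seq(), "\n";
    }
}

close $pep_fh or die "Cannot close peptides.fa: $!";
close $cds_fh or die "Cannot close cds.fa: $!";
```

Sequences are written on a single line here. If wrapped FASTA is preferred, Bio::SeqIO from BioPerl could be used for output instead of plain `print`.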
One more question: is it possible to extract the start and end positions of the translatable cDNA sequence? I tried, but I could only get the start and end positions of the whole transcript, which may include untranslated regions as well.
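A possible approach, sketched under the same assumptions as the script at the top of the thread: the Transcript object has `coding_region_start()` and `coding_region_end()`, which give the coding region in the coordinates of the slice the transcript sits on, and `cdna_coding_start()` and `cdna_coding_end()`, which give the same region as positions within the spliced cDNA. All four return undef when the transcript has no translation, so the example checks for that first. The tab-separated output layout is just an illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous',
);
my $gene_adaptor = $registry->get_adaptor('Human', 'Core', 'Gene');

my @gene_ids = ('ENSG00000001629', 'ENSG00000001630');

foreach my $gene_id (@gene_ids) {
    my $gene = $gene_adaptor->fetch_by_stable_id($gene_id);
    next unless defined $gene;

    my $transcript = $gene->canonical_transcript();

    # Skip non-coding transcripts; the coding coordinates would be undef.
    next unless defined $transcript->translation();

    # Genomic (slice) coordinates: start <= end whatever the strand.
    my $genomic_start = $transcript->coding_region_start();
    my $genomic_end   = $transcript->coding_region_end();

    # Positions counted along the spliced cDNA, starting at 1.
    my $cdna_start = $transcript->cdna_coding_start();
    my $cdna_end   = $transcript->cdna_coding_end();

    printf "%s\t%s\tgenomic:%d-%d\tcdna:%d-%d\n",
        $gene->stable_id(), $transcript->stable_id(),
        $genomic_start, $genomic_end,
        $cdna_start, $cdna_end;
}
```

Note that the genomic span returned here covers introns too, so its length will normally be larger than the length of the string from `translateable_seq()`, while the cDNA span should match it.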