Question

exon pfam domain overlap

3

Entering edit mode

10.2 years ago

Duff ▴ 670

Hi Biostars

I have a number of exon from various genes that are high confidence for differential splicing. I have used biomaRt (Ensembl) to capture positional information about these exons in their parent genes and transcripts. I have also used the refLocsToLocalLocs function from the bioconductor package VariantAnnotation to capture the ranges these exon span in the protein (see Hg19 Coordinate To Cds Position). Incidentally if anyone knows how to do this with the mapCoords function instead I'd be delighted to know; refLocsToLocalLocs is deprecated apparently.

Now I would like to examine (broadly) the functional consequences of the alternate splicing on the proteins. My first thought was to use pfam to capture positional information on domains in each protein, convert this to a GRanges object and usefindOverlaps to examine which exons overlapped domains.

So, does anyone know of a good way to capture this information from pfam (or other protein databases)?

Or does anyone have a suggestion for an alternate strategy? My knowledge of protein databases and motif searching is not so great.

Thanks,
duff

protein R exon • 3.0k views

ADD COMMENT • link updated 2.5 years ago by Ram 45k • written 10.2 years ago by Duff ▴ 670

Ram · Answer 1 · 2015-05-21

Hi!

I am doing exactly the same thing, I tried different approaches but the easiest and most effective, in my opinion, is the query with API ensembl. If you didn't do it yet, try to have a look at it (http://www.ensembl.org/info/docs/index.html) and related tutorials (http://www.ebi.ac.uk/training/online/course/ensembl-filmed-api-workshop).

Here there is the script that Magali from EnsEMBL suggested me (I am working on API ensembl 67 and I'm using Perl 5.18.2 on a Mac).

my $registry = Bio::EnsEMBL::Registry->load_

registry_from_db(
-host => '[ensembldb.ensembl.org](http://ensembldb.ensembl.org)',
-user => 'anonymous',
-port => '3306'
);

my $transcript_adaptor = $registry->get_adaptor('human', 'core', 'Transcript');
my $stable_id = 'ENST00000380152';
my $transcript = $transcript_adaptor->fetch_by_stable_id($stable_id);

# Only get exons within the coding region
my $exons = $transcript->get_all_translateable_Exons();
foreach my $exon (@$exons) {
  # Print the genomic coordinates for each exon
  print "Exon " . $exon->stable_id . ":" . $exon->start . "-" . $exon->end. "\t";
  my @pep_coords = $transcript->genomic2pep($exon->start, $exon->end, $exon->strand);
  foreach my $pep (@pep_coords) {
    # Print the protein coordinates for each exon
    print $pep->start() . "-" . $pep->end() . "\n";
  }
}

my $translation = $transcript->translation;
# Check if there is a translation
if ($translation) {
  my $pfs = $translation->get_all_ProteinFeatures();
  # Display all protein features
  foreach my $pf (@$pfs) {
    print $pf->hseqname . ":" .  $pf->start . "-" . $pf->end . "\n";
  }
}

If you only have exon coordinates to start with, you will need to create a slice for each set of coordinates, then retrieve transcripts overlapping that slice and use the process described above.

my $slice_adaptor = $registry->get_adaptor('human', 'core', 'Slice');
my $slice = $slice_adaptor->fetch_by_region('chromosome', $chromosome, $exon_start, $exon_end);
my $transcripts = $slice->get_all_Transcripts();

(of course write everything straight, in a textfile). You can easily modify it to read from a textfile containing your transcript IDs (and I guess also for the exon coordinates, but I haven't try yet) and to obtain output file instead of printing on your console, and if you need the chromosome info also in the exon output have a look at the example script in the lecture "Genes, Transcripts, Exons and translations" (http://www.ebi.ac.uk/training/online/sites/ebi.ac.uk.training.online/files/user/1317/documents/core5.pdf).

Hope that it might be helpful. Let me know if you're having trouble in modifying the script to read from files and to write new files, I might share my trial scripts (but I have to warn you I started using Perl just this Monday)

Good luck!