I am trying to retrieve the paralogues and gene family members of a given input gene. For example, if my input is TMEM110, I would like all the TMEM genes that are paralogues and members of its gene family (e.g. TMEM*).
Currently, the script below returns the gene family members for all species rather than just human. When I change 'Multi' to 'Human', the script breaks. It also outputs protein IDs, but I would like it to return Ensembl gene IDs like the input (ENSG00000139618).
Any help would be appreciated!
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

## Load the registry automatically
my $reg = "Bio::EnsEMBL::Registry";
$reg->load_registry_from_url('mysql://anonymous@ensembldb.ensembl.org');

## Get the compara genemember adaptor
my $gene_member_adaptor = $reg->get_adaptor("Multi", "compara", "GeneMember");

## Get the compara family adaptor
my $family_adaptor = $reg->get_adaptor("Multi", "compara", "Family");

## Get the compara member
my $gene_member = $gene_member_adaptor->fetch_by_source_stable_id("ENSEMBLGENE", "ENSG00000139618");

## Get all the families
my $all_families = $family_adaptor->fetch_all_by_Member($gene_member);

## For each family
foreach my $this_family (@{$all_families}) {
    print $this_family->description(), " (description score = ", $this_family->description_score(), ")\n";

    ## Print the members in this family
    my $all_members = $this_family->get_all_Members();
    foreach my $this_member (@{$all_members}) {
        print $this_member->source_name(), " ", $this_member->stable_id(), " (", $this_member->taxon()->name(), ")\n";
    }
    print "\n";
}
Is there a way of getting similarly named genes? For example, TMEM110 has one paralogue, but there are many TMEM genes. Whilst I could use a regular expression for this example, for my actual application I don't want to write a separate regular expression for every gene, i.e.:
COL1A1 -> COL*
KLF11 -> KLF*
etc., depending on what the user inputs into the script (which will be many genes, one after another).
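To illustrate the kind of mapping I mean, here is a minimal sketch in plain Perl. It is not part of the Ensembl API: `family_prefix` is a hypothetical helper, and it assumes the "family" prefix is simply the leading run of letters in the gene symbol, which is only a naming heuristic and says nothing about real homology.

```perl
use strict;
use warnings;

# Hypothetical helper (not an Ensembl API call): guess a family prefix
# by taking the leading run of letters in a gene symbol.
# Assumption: the symbol starts with the family name, e.g. TMEM110 -> TMEM.
sub family_prefix {
    my ($symbol) = @_;
    return $symbol =~ /^([A-Za-z]+)/ ? uc($1) : undef;
}

# The three example symbols from the question
for my $symbol (qw(TMEM110 COL1A1 KLF11)) {
    my $prefix = family_prefix($symbol);
    print "$symbol -> ", (defined $prefix ? "$prefix*" : "(no prefix)"), "\n";
}
```

This one generic pattern covers the three examples, but it would give unhelpful prefixes for symbols such as C1orf112 (just "C"), so it is probably not a substitute for a proper family or paralogue lookup.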