Gene Coordinates For Gene Family Members From The Mouse Genome
12.7 years ago
Anima Mundi ★ 2.9k

Hello,

How can I extract the coordinates (e.g. in BED format) of all the genes in a given gene family from the mouse genome? I would like to start from the Ensembl mm9 version.

bed ensembl gene coordinates mouse • 3.7k views
12.7 years ago

If you are not afraid of a little Perl, you can use the Ensembl API for this. When you say you want all the mouse genes in a given family, do you mean the Ensembl families or the Ensembl GeneTrees? Families are clusters of Ensembl and UniProt proteins, whereas GeneTrees are phylogenetic trees built using all Ensembl genes. See http://www.ensembl.org/info/docs/compara/family.html and http://www.ensembl.org/info/docs/compara/homology_method.html for a description of both pipelines.

If you want the coordinates for the Ensembl families, these few lines of code will do the job:

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $url = 'mysql://anonymous@ensembldb.ensembl.org';
my $gene_stable_id = "ENSMUSG00000056602";
my $species_name = "mus_musculus";

# Load the registry from the public Ensembl database server
my $reg = "Bio::EnsEMBL::Registry";
$reg->load_registry_from_url($url);
my $compara_dba = $reg->get_DBAdaptor("Multi", "compara");

my $genome_db_adaptor = $compara_dba->get_GenomeDBAdaptor();
my $member_adaptor = $compara_dba->get_MemberAdaptor();
my $family_adaptor = $compara_dba->get_FamilyAdaptor();

my $genome_db = $genome_db_adaptor->fetch_by_registry_name($species_name);

# Find the families that contain the query gene
my $member = $member_adaptor->fetch_by_source_stable_id("ENSEMBLGENE", $gene_stable_id);
my $families = $family_adaptor->fetch_all_by_Member($member);

foreach my $family (@$families) {
  foreach my $member (@{$family->get_all_Members}) {
    # Keep only Ensembl genes from the species of interest
    next if ($member->source_name ne "ENSEMBLGENE");
    next if ($member->genome_db ne $genome_db);
    # BED starts are 0-based, hence chr_start-1
    print join("\t", 'chr'.$member->chr_name, ($member->chr_start-1),
        $member->chr_end, $member->stable_id, ".",
        ($member->chr_strand==1 ? "+" : "-")), "\n";
  }
}

If you want the genes from the Ensembl GeneTrees, use this bit of code instead:

[...]
# $protein_tree_adaptor must be defined in the setup elided above
my $genome_db = $genome_db_adaptor->fetch_by_registry_name($species_name);

my $member = $member_adaptor->fetch_by_source_stable_id("ENSEMBLGENE", $gene_stable_id);

my $gene_tree = $protein_tree_adaptor->fetch_by_Member_root_id($member);
foreach my $leaf (@{$gene_tree->get_all_leaves}) {
  # Keep only the leaves that belong to the species of interest
  next if (!$leaf->genome_db_id or $leaf->genome_db ne $genome_db);
  my $member = $leaf->gene_member;
  # BED starts are 0-based, hence chr_start-1
  print join("\t", 'chr'.$member->chr_name, ($member->chr_start-1),
      $member->chr_end, $member->stable_id, ".",
      ($member->chr_strand==1 ? "+" : "-")), "\n";
}
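Both snippets subtract 1 from chr_start because Ensembl coordinates are 1-based and inclusive, whereas BED uses a 0-based start and an exclusive end. The sketch below isolates that conversion so it can be checked without a database connection. The helper name to_bed_line and the example coordinates are made up for illustration and are not part of the Ensembl API; like the snippets above, it treats any strand other than 1 as "-".

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): turn a 1-based,
# inclusive interval into one tab-separated BED line.
sub to_bed_line {
    my ($chr, $start, $end, $name, $strand) = @_;
    die "start must be >= 1 and <= end\n" if $start < 1 || $start > $end;
    # Only the start shifts: a 1-based inclusive end has the same value
    # as a 0-based exclusive end.
    return join("\t", "chr$chr", $start - 1, $end, $name, ".",
        ($strand == 1 ? "+" : "-"));
}

# Made-up coordinates, for illustration only.
print to_bed_line("11", 1000, 2000, "example_gene", -1), "\n";
```

A feature covering a single base, e.g. positions 1 to 1, becomes the BED interval 0 to 1, so the BED length (end minus start) still equals the number of bases.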

To install the Ensembl Perl API, follow the instructions at http://www.ensembl.org/info/docs/api/api_installation.html

You can also start from an external identifier or name. Note that the method fetch_all_by_external_name can return more than one gene, as there is no guarantee that the name is unique.


Hello, thanks for the valuable and detailed answer. What I cannot understand (it is probably a silly point, as I am a beginner in scripting) is how I could get the IDs of the families I am interested in. For example, where could I retrieve the ID for the UBF/HMG family?


You can use external identifiers or names. Note that the method fetch_all_by_external_name can return more than one gene, as there is no guarantee that the name is unique.

my $gene_adaptor = $reg->get_adaptor("mouse", "core", "Gene");
my $genes = $gene_adaptor->fetch_all_by_external_name("Ubtf");

foreach my $this_gene (@$genes) {
  my $member = $member_adaptor->fetch_by_source_stable_id(
        "ENSEMBLGENE", $this_gene->stable_id);
  [...]
}

I have edited the answer to show how to get an Ensembl stable ID from an external name or identifier.


Perfect, thanks again.
