Hello,
How could I extract the coordinates (e.g. in BED format) of all the genes in a given gene family from the mouse genome? I would like to start from the Ensembl mm9 version.
If you are not afraid of a little Perl, you can use the Ensembl API for this. When you say you want all the mouse genes in a given family, do you mean the Ensembl families or the Ensembl GeneTrees? Families are clusters of Ensembl and UniProt proteins, while GeneTrees are phylogenetic trees built using all Ensembl genes. See http://www.ensembl.org/info/docs/compara/family.html and http://www.ensembl.org/info/docs/compara/homology_method.html for a description of both pipelines.
If you want to get the coordinates for the Ensembl families, these few lines of code will do the job:

```perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $url            = 'mysql://anonymous@ensembldb.ensembl.org';
my $gene_stable_id = "ENSMUSG00000056602";
my $species_name   = "mus_musculus";

my $reg = "Bio::EnsEMBL::Registry";
$reg->load_registry_from_url($url);
my $compara_dba = $reg->get_DBAdaptor("Multi", "compara");

my $genome_db_adaptor = $compara_dba->get_GenomeDBAdaptor();
my $member_adaptor    = $compara_dba->get_MemberAdaptor();
my $family_adaptor    = $compara_dba->get_FamilyAdaptor();

my $genome_db = $genome_db_adaptor->fetch_by_registry_name($species_name);

# Look up the query gene and every family it belongs to
my $member   = $member_adaptor->fetch_by_source_stable_id("ENSEMBLGENE", $gene_stable_id);
my $families = $family_adaptor->fetch_all_by_Member($member);

foreach my $family (@$families) {
  foreach my $fam_member (@{$family->get_all_Members}) {
    # Families also contain UniProt proteins and other species: keep mouse Ensembl genes only
    next if ($fam_member->source_name ne "ENSEMBLGENE");
    next if ($fam_member->genome_db ne $genome_db);
    # BED starts are 0-based, so shift the 1-based Ensembl start by one
    print join("\t", 'chr'.$fam_member->chr_name, ($fam_member->chr_start - 1), $fam_member->chr_end,
        $fam_member->stable_id, ".", $fam_member->chr_strand == 1 ? "+" : "-"), "\n";
  }
}
```
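The `chr_start - 1` in the print statement is there because Ensembl coordinates are 1-based and inclusive, while BED is 0-based with an exclusive end, so only the start moves. Here is a minimal, self-contained sketch of that conversion that needs no database connection. The helper name `to_bed_line` and the gene `GENE_X` are hypothetical, used only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: format one gene as a BED6 line.
# $start/$end are 1-based and inclusive (Ensembl convention);
# BED wants a 0-based start and an exclusive end, so only the start shifts.
sub to_bed_line {
  my ($chr, $start, $end, $name, $strand) = @_;
  return join("\t", 'chr' . $chr, $start - 1, $end, $name, ".", $strand == 1 ? "+" : "-");
}

# A made-up gene covering bases 100..200 on the reverse strand of chromosome 11
print to_bed_line("11", 100, 200, "GENE_X", -1), "\n";
```

The BED interval 99..200 still spans 101 bases, exactly the 1-based range 100..200.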
If you want the genes from the Ensembl GeneTrees, use this bit of code instead:

```perl
[...]
# Not defined in the setup above; assumed to come from the Compara DBAdaptor (older API)
my $protein_tree_adaptor = $compara_dba->get_ProteinTreeAdaptor();
my $genome_db = $genome_db_adaptor->fetch_by_registry_name($species_name);

my $member    = $member_adaptor->fetch_by_source_stable_id("ENSEMBLGENE", $gene_stable_id);
my $gene_tree = $protein_tree_adaptor->fetch_by_Member_root_id($member);

foreach my $leaf (@{$gene_tree->get_all_leaves}) {
  # Skip leaves that belong to other species
  next if (!$leaf->genome_db_id or $leaf->genome_db ne $genome_db);
  my $gene_member = $leaf->gene_member;
  print join("\t", 'chr'.$gene_member->chr_name, ($gene_member->chr_start - 1), $gene_member->chr_end,
      $gene_member->stable_id, ".", $gene_member->chr_strand == 1 ? "+" : "-"), "\n";
}
```
To install the Ensembl Perl API, follow the instructions at http://www.ensembl.org/info/docs/api/api_installation.html.
Hello, thanks for the valuable and detailed answer. What I cannot understand (probably a silly point, as I am a beginner in scripting) is how to get the IDs of the families I am interested in. For example, where could I retrieve the ID for the UBF/HMG family?
You can use external identifiers or names. Note that the method (fetch_all_by_external_name) can potentially return more than one gene, as there is no guarantee that the name is unique.
```perl
# Look up mouse genes by an external name (here the gene symbol Ubtf)
my $gene_adaptor = $reg->get_adaptor("mouse", "core", "Gene");
my $genes = $gene_adaptor->fetch_all_by_external_name("Ubtf");

foreach my $this_gene (@$genes) {
  my $member = $member_adaptor->fetch_by_source_stable_id("ENSEMBLGENE", $this_gene->stable_id);
  [...]
}
```
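Because the name lookup can return several genes, and each one runs the family (or GeneTree) query, the same member can be printed more than once when two query genes fall in the same family. A minimal sketch of de-duplicating by stable ID with a `%seen` hash follows. The records are hypothetical placeholders standing in for query results, not real Ensembl data:

```perl
use strict;
use warnings;

# Hypothetical [chr, start, end, stable_id, strand] records from two lookups that hit the same family
my @records = (
  ["11", 100, 200, "GENE_A",  1],
  ["11", 500, 900, "GENE_B", -1],
  ["11", 100, 200, "GENE_A",  1],   # repeated because the second query gene shares the family
);

# Keep only the first occurrence of each stable ID
my %seen;
my @unique = grep { !$seen{ $_->[3] }++ } @records;

foreach my $r (@unique) {
  my ($chr, $start, $end, $id, $strand) = @$r;
  print join("\t", 'chr' . $chr, $start - 1, $end, $id, ".", $strand == 1 ? "+" : "-"), "\n";
}
```

In the real scripts the same idea would be a single `next if $seen{...}++;` line inside the inner loop, keyed on the member's stable ID.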
I have edited the answer to show how to get an Ensembl stable ID from an external name or identifier.
Perfect, thanks again.