Phylogenetic tree from single copy core genes (metazoa proteomes)
6.3 years ago
Elizabeth ▴ 30

Dear fellow bioinformaticians, for the past 4 weeks I have been trying to understand the general protocol for building a phylogenetic tree from single copy core genes. However, I have not found a single technique for building such a tree, or for extracting single copy core genes from the proteome sequences of the 6 metazoan organisms that I have. I know how to build a gene tree using specific sequences from different species, but I am unsure how to detect single copy core genes and extract the hundreds of genes from the proteome sequences. Any advice will be deeply appreciated. Thank you.

single copy genes phylogenetic tree • 3.5k views

Are there typing schemes already known for your organism? I.e. single copy core genes that people have already identified?


Do you mean single copy orthologs? If so, you can use the OrthoFinder tool:

https://github.com/davidemms/OrthoFinder

The tool takes the proteins of each species (proteome files), finds the single copy proteins along with their gene IDs, aligns them, and produces a phylogenetic tree.

You can easily get the sequences of the single copy genes from the OrthoFinder output.
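As a rough illustration of what "single copy" means in practice, here is a minimal Python sketch that picks out orthogroups with exactly one gene in every species from a tab-separated gene count table. It assumes a table laid out like the `Orthogroups.GeneCount.tsv` file that recent OrthoFinder versions write (an `Orthogroup` column, one column per species, and a `Total` column); check the layout against your own output. The species names and counts below are made up.

```python
import csv
import io


def single_copy_orthogroups(gene_count_tsv):
    """Return IDs of orthogroups with exactly one gene in every species column."""
    reader = csv.DictReader(io.StringIO(gene_count_tsv), delimiter="\t")
    # Assumed layout: every column except 'Orthogroup' and 'Total' is a species.
    species = [c for c in reader.fieldnames if c not in ("Orthogroup", "Total")]
    return [
        row["Orthogroup"]
        for row in reader
        if all(int(row[s]) == 1 for s in species)
    ]


# Made-up example table (three hypothetical species).
example = (
    "Orthogroup\tsp_A\tsp_B\tsp_C\tTotal\n"
    "OG0000001\t1\t1\t1\t3\n"
    "OG0000002\t2\t1\t1\t4\n"
    "OG0000003\t1\t0\t1\t2\n"
    "OG0000004\t1\t1\t1\t3\n"
)

print(single_copy_orthogroups(example))
```

With real data you would read the file contents into a string (or adapt the function to take a file handle) and then fetch the sequences for the returned orthogroup IDs.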


Moved my response from comment to answer


Thank you everyone.

6.3 years ago
Anand Rao ▴ 640

If I understand your question and problem correctly, what might be useful and relevant is something like BUSCO - http://busco.ezlab.org/

BUSCO in turn uses the orthologs listed at OrthoDB - https://www.orthodb.org/ At that link, you can see there are 330 BUSCOs, i.e. orthologs that are "expected" in any complete metazoan genome, since these genes are conserved across that clade.

The nice thing about BUSCO is that it allows you to identify these expected orthologs from either the genome or the proteome (or even from the transcriptome).

-m MODE, --mode MODE
    Specify which BUSCO analysis mode to run. There are three valid modes:
    - geno or genome, for genome assemblies (DNA)
    - tran or transcriptome, for transcriptome assemblies (DNA)
    - prot or proteins, for annotated gene sets (protein)

Once you identify the expected orthologs in each proteome, go into the BUSCO output folder and, either manually or (better yet) using some simple parsing scripts, extract the sequences orthologous to each BUSCO sequence (one by one for each of the expected 330 orthologs in Metazoa).
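The "simple parsing scripts" could look something like the Python sketch below. It assumes the per-species BUSCO full table is tab-separated with `#` comment lines, with the BUSCO ID, the status and the matching sequence ID in the first three columns; verify that against the files your BUSCO version actually writes. The BUSCO IDs, sequence IDs and species names are hypothetical.

```python
def complete_single_copy(full_table_text):
    """Map BUSCO id -> sequence id for rows whose status is 'Complete'."""
    hits = {}
    for line in full_table_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        busco_id, status = fields[0], fields[1]
        # 'Duplicated', 'Fragmented' and 'Missing' rows are ignored on purpose.
        if status == "Complete":
            hits[busco_id] = fields[2]
    return hits


def shared_buscos(tables):
    """Return {busco_id: {species: sequence_id}} for BUSCOs Complete in all species."""
    per_species = {sp: complete_single_copy(text) for sp, text in tables.items()}
    shared = set.intersection(*(set(h) for h in per_species.values()))
    return {b: {sp: per_species[sp][b] for sp in per_species} for b in sorted(shared)}


# Hypothetical full tables for two species.
tables = {
    "sp_A": "# Busco id\tStatus\tSequence\tScore\tLength\n"
            "EOG0001\tComplete\tA_prot_17\t512.3\t410\n"
            "EOG0002\tDuplicated\tA_prot_20\t300.1\t250\n"
            "EOG0003\tComplete\tA_prot_42\t220.0\t180\n",
    "sp_B": "# Busco id\tStatus\tSequence\tScore\tLength\n"
            "EOG0001\tComplete\tB_prot_03\t498.7\t405\n"
            "EOG0002\tComplete\tB_prot_09\t310.4\t251\n"
            "EOG0003\tMissing\n",
}

result = shared_buscos(tables)
print(result)
```

The resulting mapping tells you which sequence to pull from each proteome FASTA for every shared single-copy BUSCO, giving one multi-species sequence set per ortholog.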

Then you align each of these 330 sequence sets. You can then concatenate the 330 alignments and run something like the online RAxML tool at the CIPRES portal (http://www.phylo.org/sub_sections/portal/) to obtain the final phylogenetic tree with bootstrap support...
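For the concatenation step, a minimal Python sketch follows. It assumes each per-gene alignment has already been read into a dictionary of taxon name to aligned sequence (how you load them is up to you), and it pads a taxon that is absent from a gene with gaps so that every row of the supermatrix ends up the same length. The function name and toy sequences are purely illustrative.

```python
def concatenate_alignments(alignments):
    """Concatenate per-gene alignments (list of {taxon: aligned_seq}) into one supermatrix."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("all sequences in one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            # A taxon missing from this gene gets an all-gap block of the same width.
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy alignments for two genes; taxon C is missing from the second gene.
gene1 = {"A": "MK-L", "B": "MKAL", "C": "MRAL"}
gene2 = {"A": "GG", "B": "GA"}

supermatrix = concatenate_alignments([gene1, gene2])
for taxon, seq in supermatrix.items():
    print(">" + taxon)
    print(seq)
```

The printed FASTA can then be converted to whatever input format your tree-building program expects.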

Hopefully this tool (BUSCO), together with the accompanying OrthoDB set that is most suited to your species of interest (Metazoa?), should help you start answering your question about ortholog-based phylogeny. Recognize, however, that there are several other ways to answer this question...
