Question

Phylogenetic tree from single copy core genes (metazoa proteomes)

6.3 years ago

Elizabeth ▴ 30

Dear fellow bioinformaticians, for the past 4 weeks I have been trying to understand the general protocol for building a phylogenetic tree from single-copy core genes. However, I have not arrived at a single technique for building such a tree, or for extracting single-copy core genes from the proteome sequences of the 6 metazoan organisms that I have. I know how to build a gene tree using specific sequences from different species, but I am unsure how to detect single-copy core genes and extract the hundreds of genes from the proteome sequences. Any advice will be deeply appreciated. Thank you.

single copy genes phylogenetic tree • 3.5k views

ADD COMMENT • link 6.3 years ago by Elizabeth ▴ 30

Are there typing schemes already known for your organisms, i.e. single-copy core genes that people have already identified?

ADD REPLY • link 6.3 years ago by Joe 21k

Do you mean single-copy orthologs? If so, you can use the OrthoFinder tool:

https://github.com/davidemms/OrthoFinder

The tool takes the proteins of each species (proteome files), finds the single-copy proteins along with their gene IDs, aligns them, and produces a phylogenetic tree.

You can easily get the sequences of the single-copy genes from the OrthoFinder output.
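To make the "single-copy" criterion concrete, here is a minimal Python sketch that picks out orthogroups with exactly one gene in every species. It assumes a tab-separated orthogroup table laid out like OrthoFinder's `Orthogroups.tsv` (first column is the orthogroup ID, then one column per species, with gene IDs separated by commas). The function name `single_copy_orthogroups` is made up for illustration, and the column layout should be checked against your own OrthoFinder version, which may already write a ready-made list of single-copy orthogroups.

```python
import csv
import io


def single_copy_orthogroups(tsv_text):
    """Return {orthogroup: {species: gene_id}} for orthogroups that have
    exactly one gene in every species column.

    Assumed layout (check against your own file): header row with the
    orthogroup column first, then one column per species; each cell holds
    comma-separated gene IDs and may be empty.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    species = header[1:]
    result = {}
    for row in reader:
        if not row:
            continue
        # Pad short rows so every species has a (possibly empty) cell.
        cells = row[1:] + [""] * (len(species) - (len(row) - 1))
        genes = [[g.strip() for g in cell.split(",") if g.strip()]
                 for cell in cells]
        if all(len(g) == 1 for g in genes):
            result[row[0]] = {sp: g[0] for sp, g in zip(species, genes)}
    return result
```

With a real file you would pass in `open("Orthogroups.tsv").read()`; the returned gene IDs can then be used to pull the matching sequences out of each proteome FASTA.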

ADD REPLY • link 6.3 years ago by Mehmet ▴ 820

Moved my response from comment to answer

ADD REPLY • link 6.3 years ago by Anand Rao ▴ 640

Thank you everyone.

ADD REPLY • link 6.3 years ago by Elizabeth ▴ 30

score 1 · Answer 1 · 2018-08-13

If I understand your question and problem correctly, what might be useful and relevant is something like BUSCO: http://busco.ezlab.org/

BUSCO in turn uses the orthologs listed at OrthoDB (https://www.orthodb.org/). At that link, you can see there are 330 BUSCOs, i.e. orthologs that are "expected" in any complete metazoan genome, since these genes are conserved across that clade. (The exact number depends on which version of the lineage dataset you use.)

The nice thing about BUSCO is that it allows you to identify these expected orthologs from either the genome or the proteome (or even from the transcriptome).

    -m MODE, --mode MODE
        Specify which BUSCO analysis mode to run. There are three valid modes:
        - geno or genome, for genome assemblies (DNA)
        - tran or transcriptome, for transcriptome assemblies (DNA)
        - prot or proteins, for annotated gene sets (protein)

Once you have identified the expected orthologs in each proteome, go into the BUSCO output folder and, either manually or (better yet) with some simple parsing scripts, extract the sequences orthologous to each BUSCO (one by one for each of the expected 330 orthologs in Metazoa).
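As a rough idea of what such a parsing script could look like, the Python sketch below reads the lines of each species' BUSCO full table and keeps the BUSCO IDs that are "Complete" (i.e. single copy) in every species. It assumes the tab-separated layout used in protein mode, with the BUSCO ID in column 1, the status in column 2 and the sequence ID in column 3, plus comment lines starting with `#`. Check this against your own output, since the layout can vary between BUSCO versions and modes. The function names are made up for illustration.

```python
from typing import Dict, Iterable, Set


def complete_single_copy(table_lines: Iterable[str]) -> Dict[str, str]:
    """Map BUSCO id -> sequence id for rows whose status is 'Complete'.

    Assumed columns (tab-separated): BUSCO id, status, sequence id, ...
    'Duplicated', 'Fragmented' and 'Missing' rows are skipped.
    """
    hits = {}
    for line in table_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[1] == "Complete":
            hits[fields[0]] = fields[2]
    return hits


def shared_buscos(per_species: Dict[str, Dict[str, str]]) -> Set[str]:
    """BUSCO ids that are complete and single copy in every species."""
    id_sets = [set(hits) for hits in per_species.values()]
    return set.intersection(*id_sets) if id_sets else set()
```

For each shared BUSCO ID, the stored sequence IDs tell you which protein to pull from each proteome FASTA, giving one multi-species sequence set per BUSCO.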

Then you align each of these 330 sequence sets, concatenate the 330 alignments, and run something like the online RAxML tool at the CIPRES portal (http://www.phylo.org/sub_sections/portal/) to obtain the final phylogenetic tree with bootstrap support.
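The concatenation step can be sketched in a few lines of Python. This is only an illustration, and it assumes that each per-gene alignment is in FASTA format and that the sequence headers have already been renamed to the species name, so the same name appears in every alignment. A species missing from one alignment is padded with gaps. `read_fasta` and `concatenate` are hypothetical helper names, not part of any tool mentioned here.

```python
def read_fasta(text):
    """Parse FASTA text into {name: sequence}; the name is the first
    whitespace-delimited word of the header line."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def concatenate(alignments):
    """Join per-gene alignments (a list of {species: aligned_seq} dicts)
    into one supermatrix, padding absent species with gap characters."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("empty alignment or sequences of unequal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}
```

The resulting dictionary can be written back out as a single FASTA or PHYLIP file for the tree-building step.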

Hopefully this tool (BUSCO), together with the accompanying OrthoDB set best suited to your species of interest (Metazoa?), should help you start answering your question about ortholog-based phylogeny. Recognize, however, that there are several other ways to answer this question...