Phylogenetic tree from whole genome using BUSCO
6 weeks ago

I am trying to construct a phylogenetic tree for 80 related fish species based on their whole-genome sequences. I used BUSCO to identify the single-copy orthologs in each genome and then took the set of single-copy orthologs common to all of them. I then extracted the sequences of these common single-copy orthologs from the genomes. The problem is that, for a single BUSCO ID, the sequences from these organisms are much more diverged from each other than expected. Can anyone tell me the likely cause and how the problem can be solved? If anyone knows another way to construct a whole-genome phylogenetic tree, kindly let me know.
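For reference, the "common single-copy orthologs" step can be sketched roughly as below. This is only a minimal sketch: it assumes each genome has a BUSCO `full_table.tsv` whose second tab-separated column holds the status (`Complete`, `Duplicated`, `Fragmented`, `Missing`), and the function names and the `min_fraction` parameter are my own illustrative choices, not part of BUSCO.

```python
def complete_single_copy_ids(full_table_text):
    """Return the BUSCO IDs reported as 'Complete' (single-copy) in one table."""
    ids = set()
    for line in full_table_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and BUSCO header comments
        fields = line.split("\t")
        if len(fields) > 1 and fields[1] == "Complete":
            ids.add(fields[0])
    return ids


def shared_single_copy(tables, min_fraction=1.0):
    """Return sorted BUSCO IDs that are complete single-copy in at least
    min_fraction of the genomes.

    tables: dict mapping a species name to the text of its full_table.tsv.
    min_fraction: hypothetical knob; 1.0 demands presence in every genome.
    """
    counts = {}
    for text in tables.values():
        for busco_id in complete_single_copy_ids(text):
            counts[busco_id] = counts.get(busco_id, 0) + 1
    needed = min_fraction * len(tables)
    return sorted(b for b, n in counts.items() if n >= needed)
```

With 80 genomes, requiring presence in every genome can leave very few loci, so relaxing `min_fraction` (and treating the absent taxa as missing data later) may be worth trying.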

BUSCO whole_genome phylogenetic_tree • 846 views

"However, the problem is that, for a single ID, the sequences from these organisms are much more diverged from each other than expected. Can anyone tell me the cause and how the problem can be solved?"

You can't undo/obscure the evolutionary signal in your data. It sounds like your assumptions about evolutionary distance between organisms may be wrong, or your data has a lot of spurious variation.

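mtDNA is often used for phylogenetics because it is maternally inherited and effectively non-recombining. Note, though, that in animals it generally accumulates substitutions faster than nuclear genes, not slower.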


I should proceed with protein sequences. I also have the mitochondrial sequences, so I will use them as well. Thanks for the suggestion.


Making a concatenated alignment of all your common single-copy genes might be worth considering.


That is exactly what I am trying to do. But when I extracted the nucleotide CDS sequences of the single-copy genes and merged them, gene by gene, across all organisms, I found that they are diverged from each other and contain stop codons.


Can you elaborate on the stop codon issue you mention?

If annotated genes contain internal stop codons, it is because the gene (or genome) annotation was done incorrectly.
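One quick way to pin the stop codon issue down is to list where the in-frame stops fall in each extracted CDS. The snippet below is only a rough sketch: it assumes the standard nuclear genetic code (the vertebrate mitochondrial code uses different stop codons), and the `internal_stops` helper is a made-up name for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard code only; not valid for mtDNA


def internal_stops(cds, frame=0):
    """Return 0-based nucleotide positions of in-frame stop codons,
    ignoring a stop that terminates the sequence."""
    seq = cds.upper().replace("-", "")  # drop alignment gaps if present
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    ends_on_codon = (len(seq) - frame) % 3 == 0
    if codons and ends_on_codon and codons[-1] in STOP_CODONS:
        codons = codons[:-1]  # a terminal stop is expected in a full CDS
    return [frame + 3 * i for i, codon in enumerate(codons) if codon in STOP_CODONS]
```

If a sequence has stops in frame 0 but none in frame 1 or 2, or none once it is reverse-complemented, the extraction coordinates or strand are the more likely culprit than the annotation itself.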


Is there a reason you are not using the protein sequences identified by BUSCO?


I was unaware that a time tree can be fitted with protein sequences, so I considered nucleotide sequences, but now I think I should proceed with protein sequences only. Thank you for your response.


Why do you think that being "much more diverged from each other than expected" is a problem?
Also, are you sure that you took the sequences in the proper orientation? Sequences can seem more divergent than they actually are if one of them is taken as the reverse complement of what it should be.
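The orientation check can be roughed out as follows. This is just a sketch under my own assumptions: it compares shared k-mers between a trusted reference copy of the gene and each query in both orientations, and the names `orient_like` and `k` are illustrative, not from any particular package.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return the set of k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient_like(reference, query, k=8):
    """Return query in whichever orientation shares more k-mers with reference."""
    ref_kmers = kmers(reference, k)
    flipped = reverse_complement(query)
    forward_hits = len(ref_kmers & kmers(query, k))
    reverse_hits = len(ref_kmers & kmers(flipped, k))
    return flipped if reverse_hits > forward_hits else query
```

For strongly diverged sequences a smaller `k` may be needed, and a proper aligner that can adjust direction will be more reliable than this k-mer heuristic.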

6 weeks ago
Mensur Dlakic ★ 29k

It is not clear to me that you are doing things correctly, so I will say this just in case. First you align the sequences, trim them, then concatenate. Results are likely to be unpredictable if you concatenate before doing the alignments, or if you don't trim the sequences before concatenation.

Protein sequences are likely to be more useful in concatenated tree construction than nucleotides.
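To make the align, trim, then concatenate order concrete, here is a minimal sketch of the last step only. It assumes each gene has already been aligned and trimmed separately with external tools (MAFFT and trimAl are common choices, but that is my assumption, not something stated in the question), and `concatenate_alignments` is a hypothetical helper name.

```python
def concatenate_alignments(alignments, taxa=None, missing="-"):
    """Join per-gene alignments into one supermatrix.

    alignments: list of dicts, one per gene, mapping taxon -> aligned,
                trimmed sequence (all sequences in a gene equally long).
    Returns (matrix, partitions): matrix maps taxon -> concatenated sequence,
    partitions lists (gene_number, start, end) with 1-based inclusive columns.
    """
    if taxa is None:
        taxa = sorted({taxon for aln in alignments for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for number, aln in enumerate(alignments, start=1):
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            # Unequal lengths mean this gene was not aligned before joining.
            raise ValueError(f"gene {number} is not aligned: lengths {sorted(lengths)}")
        width = lengths.pop()
        for taxon in taxa:
            # A taxon lacking this gene is padded with missing-data symbols.
            pieces[taxon].append(aln.get(taxon, missing * width))
        partitions.append((number, start, start + width - 1))
        start += width
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions
```

The partition boundaries are returned because most tree-building programs can fit a separate model per gene if they are told where each gene starts and ends in the supermatrix.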


Thanks for your answer. I will proceed with the protein sequences instead of the nucleotide sequences.

6 weeks ago

At least initially, I would create an automated tree from raw reads or raw assemblies using a user-friendly tool like ASTER: https://github.com/chaoszhang/ASTER/blob/master/tutorial/waster-site.md

I have had very few problems analyzing (intra-species) assemblies or read sets.

You would need to go into more depth - waster, minigraph-cactus, caster etc - if analyzing genomes across multiple species. I think it is a worthwhile first step for you though to gain an impression of the potential tree phylogenies.


Thanks, I will try this as well.
