Extracting orthologues from MSA
1 day ago
Christos • 0

Hi everyone,

I am new to sequence analysis and was wondering if anyone could provide some insights. I have a multiple sequence alignment of around 10,000 protein sequences that includes orthologues of 9 members of a protein family. It also includes some more distant members of the superfamily they belong to, which I would like to remove.

I am currently building a phylogenetic tree using Biopython (which is taking quite a long time) and was thinking of pruning the tree to remove the evolutionarily more distant proteins. Is there any other way you would recommend for doing this? Is there a "correct" way to set a threshold for removing these sequences?

Thank you!

MSA • 152 views

Have you tried feeding the sequences into an orthology tool like OrthoFinder? It sounds to me like this may solve a few of your issues, whilst also creating gene trees for you.


Sorry, I am quite new to this, but I looked into OrthoFinder and it seems I need to split the sequences by species. I have a large MSA (around 10,000 sequences) covering different species and different proteins within each species, so the separation is quite difficult. What I would like to do is remove the sequences that are evolutionarily most distant from 9 reference sequences, but I am not sure how.
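One rough way to approach "distance from 9 reference sequences" without a tree is to score every aligned sequence by its identity to the closest reference and drop those below a cutoff. The sketch below assumes the MSA is already loaded as a dict of equal-length aligned strings; the sequence IDs, the toy alignment, and the 0.4 cutoff are all hypothetical. Percent identity is only a crude proxy for evolutionary distance and does not establish orthology, so the cutoff would have to be chosen by inspecting the score distribution for your own data.

```python
GAPS = set("-.")

def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAPS or y in GAPS:
            continue  # skip columns with a gap in either sequence
        compared += 1
        matches += x == y
    return matches / compared if compared else 0.0

def best_reference(seq, references):
    """Return (reference_id, identity) for the reference most similar to seq."""
    return max(
        ((rid, pairwise_identity(seq, rseq)) for rid, rseq in references.items()),
        key=lambda pair: pair[1],
    )

def filter_by_reference(alignment, reference_ids, min_identity):
    """Keep sequences whose best identity to any reference is >= min_identity."""
    references = {rid: alignment[rid] for rid in reference_ids}
    kept = {}
    for sid, seq in alignment.items():
        rid, identity = best_reference(seq, references)
        if identity >= min_identity:
            kept[sid] = (rid, identity)
    return kept

# Toy alignment (hypothetical IDs and sequences, two references instead of nine).
alignment = {
    "refA": "MKT-LLVA",
    "refB": "MRSQLIVG",
    "close_to_A": "MKTALLVA",
    "distant": "WPC-HHNE",
}
kept = filter_by_reference(alignment, ["refA", "refB"], min_identity=0.4)
for sid, (rid, identity) in kept.items():
    print(f"{sid}\tclosest={rid}\tidentity={identity:.2f}")
```

Each kept sequence is also labelled with its nearest reference, which gives a first-pass assignment to one of the family members that can later be checked against a gene tree.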


You don't have to split by species; you just won't be able to use some of the inferences. If you split the FASTA into 2 equal chunks, the tool still does an all-vs-all orthology assignment, so the gene trees are still usable for your use case. Just don't look at the species-level inferences.
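The two-chunk split could be sketched roughly as follows, using only the standard library. It assumes the input is FASTA-formatted and strips alignment gap characters (`-` and `.`), on the assumption that the orthology tool expects unaligned protein sequences; the function names and the commented-out file names are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def degap(seq):
    """Remove alignment gap characters so the sequence is unaligned again."""
    return seq.replace("-", "").replace(".", "")

def split_in_two(records):
    """Degap all records and split them into two roughly equal halves."""
    records = [(header, degap(seq)) for header, seq in records]
    mid = (len(records) + 1) // 2
    return records[:mid], records[mid:]

def write_fasta(records, path):
    """Write (header, sequence) pairs to a FASTA file, one sequence per line."""
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(f">{header}\n{seq}\n")

# Usage with hypothetical file names:
# with open("alignment.fasta") as handle:
#     first, second = split_in_two(read_fasta(handle))
# write_fasta(first, "input_dir/chunk1.fa")
# write_fasta(second, "input_dir/chunk2.fa")
```

The two output files would then stand in for the per-species input files, with the caveat already noted that any species-level output is meaningless for such an arbitrary split.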

The alternative is making a phylogeny to identify which are evolutionarily distinct, which will be computationally demanding with 10k sequences.
