2.0 years ago
Junior
I have protein sequences in five FASTA files, one each from species A, B, C, D and E. I want to select 500 genes from species A that meet the following criterion: each has five-way best BLAST hits among species A, B, C, D and E.
I have found some scripts that handle the one-way case, but I am still looking for a way to handle the multi-way case.
I would appreciate any guidelines or suggestions for programs and scripts.
Thank you in advance.
This sounds quite specific, so you'll probably need to write your own scripts to parse BLAST outputs. If you can't code, then try to get help from a bioinformatician. You may also want to look into sequence clustering methods like CD-HIT (very simple) or OrthoFinder2 (more sophisticated).
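If you do go the scripting route, here is a minimal sketch of what such a parser might look like. It assumes that "five-way best hit" means every pair of species in the group is a reciprocal best hit (a strict reading that may not be what you intend), that you have run BLASTP in every direction with tabular output (`-outfmt 6`), and that best hits are ranked by bit score. The function names and the `{q}_vs_{s}.tsv` file naming are hypothetical.

```python
from itertools import combinations, permutations


def parse_blast_tab(path):
    """Yield (query, subject, bitscore) from a BLAST -outfmt 6 file."""
    with open(path) as fh:
        for line in fh:
            if not line.strip() or line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            # Default outfmt 6 columns: qseqid, sseqid, ..., bitscore (12th).
            yield fields[0], fields[1], float(fields[11])


def best_hits(rows):
    """Return {query: subject}, keeping the highest-bitscore subject per query."""
    best = {}
    for query, subject, score in rows:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def load_all(species, pattern="{q}_vs_{s}.tsv"):
    """Load best hits for every ordered species pair (file naming is assumed)."""
    return {
        (q, s): best_hits(parse_blast_tab(pattern.format(q=q, s=s)))
        for q, s in permutations(species, 2)
    }


def multiway_rbh(best, species, ref):
    """Return groups (one gene per species) in which all pairs are reciprocal best hits.

    best[(x, y)] maps each gene of species x to its best hit in species y.
    """
    others = [s for s in species if s != ref]
    groups = []
    for gene in best[(ref, others[0])]:
        group = {ref: gene}
        for s in others:
            hit = best[(ref, s)].get(gene)
            # Require the reference gene and its hit to pick each other.
            if hit is None or best[(s, ref)].get(hit) != gene:
                break
            group[s] = hit
        else:
            # Also require reciprocity between every pair of non-reference species.
            if all(
                best[(x, y)].get(group[x]) == group[y]
                and best[(y, x)].get(group[y]) == group[x]
                for x, y in combinations(others, 2)
            ):
                groups.append(group)
    return groups
```

With five species this needs 20 BLAST result files (every ordered pair). Something like `multiway_rbh(load_all(list("ABCDE")), list("ABCDE"), "A")[:500]` would then give up to 500 groups anchored on species A.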
Perhaps I should mention that I only need help with the first criterion. I will edit my post.
Do you have to use reciprocal best blast hits for this?
liorglic's suggestion to use something like OrthoFinder is likely to give you a better result than blast alone.
Hey, liorglic and Dave.
Yes, I have considered using OrthoFinder2.
However, from what I have understood, it identifies orthologs in a pairwise manner (i.e., species1_vs_species2).
So if I provide three species (1, 2 and 3), I will not be able to get orthologs identified across all three.
Or is there an option in OrthoFinder to identify orthologs across multiple species at once?
Yes, you can use OrthoFinder for multiple species/samples.
You misunderstood. OF2 will produce clusters for multiple species, not just pairwise orthologs (this is the whole idea of clustering...)
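To pull out clusters with exactly one gene in each of your five species, something along these lines might work. It is only a sketch: it assumes the tab-separated `Orthogroups.tsv` layout (an `Orthogroup` column, then one column per species with comma-separated gene IDs), and that the species columns are named after your input FASTA files. The function name is hypothetical.

```python
import csv


def single_copy_orthogroups(handle, species):
    """Return {orthogroup_id: {species: gene}} for groups with one gene per species.

    handle is any iterable of lines (e.g. an open Orthogroups.tsv file);
    species lists the column names to require (assumed to match the header).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    result = {}
    for row in reader:
        genes = {}
        for sp in species:
            members = [g.strip() for g in (row.get(sp) or "").split(",") if g.strip()]
            if len(members) != 1:
                break  # missing or multi-copy in this species, so skip the group
            genes[sp] = members[0]
        else:
            result[row["Orthogroup"]] = genes
    return result
```

Opening the file with `open("Orthogroups.tsv", newline="")` and passing the handle in should return the single-copy groups, from which you can take the first 500 and read off the species A gene in each.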