I am trying to cluster orthologs (and paralogs) at the protein level. Some of the groups I get seem to contain very disparate proteins: the members differ widely in length, and the alignments returned by MAFFT are extremely gappy. So I am considering adjusting the inflation parameter (-I) of MCL.
The manual at http://micans.org/mcl/man/mcl.html says: "A good set of starting values is 1.4, 2, 4, and 6." I understand, in theory, how changing the inflation value affects the coarseness of the clustering. But how can I determine the best value for my dataset in practice if I have no extensive a priori information about it? Any thoughts? Thank you!
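You could also set more stringent BLAST thresholds, for example a lower E-value cutoff or a minimum alignment coverage, so that weak hits between proteins of very different lengths do not end up as edges in the graph.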
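As a rough illustration of stricter thresholds, here is a minimal Python sketch that filters BLAST tabular output (the 12 standard `-outfmt 6` columns) before it goes to MCL. The function name `filter_hits`, the cutoffs (`1e-5`, `0.5`), the `lengths` dictionary, and the use of the bit score as edge weight are all assumptions made for the example, not values recommended by MCL or OrthoMCL.

```python
def filter_hits(blast_lines, lengths, max_evalue=1e-5, min_coverage=0.5):
    """Keep only hits that pass an E-value and a coverage cutoff.

    blast_lines: iterable of tab-separated lines in BLAST -outfmt 6 order
                 (qseqid sseqid pident length mismatch gapopen
                  qstart qend sstart send evalue bitscore).
    lengths:     dict mapping sequence ID to protein length (hypothetical
                 input; e.g. computed from the FASTA file).
    Returns a list of (query, subject, bitscore) tuples.
    """
    kept = []
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        if query == subject:
            continue  # self-hits carry no clustering information
        qstart, qend, sstart, send = (int(x) for x in fields[6:10])
        evalue, bitscore = float(fields[10]), float(fields[11])
        # Fraction of each protein spanned by the alignment.
        qcov = (abs(qend - qstart) + 1) / lengths[query]
        scov = (abs(send - sstart) + 1) / lengths[subject]
        # Requiring coverage on BOTH sequences removes short local hits
        # between proteins of very different lengths.
        if evalue <= max_evalue and min(qcov, scov) >= min_coverage:
            kept.append((query, subject, bitscore))
    return kept


# Toy data, invented for illustration only.
toy_lengths = {"p1": 100, "p2": 110, "p3": 400}
toy_hits = [
    "p1\tp1\t100.0\t100\t0\t0\t1\t100\t1\t100\t1e-60\t200.0",
    "p1\tp2\t80.0\t95\t10\t0\t1\t95\t5\t99\t1e-40\t150.0",
    "p1\tp3\t60.0\t50\t20\t0\t10\t59\t100\t149\t1e-10\t60.0",
    "p2\tp3\t30.0\t100\t70\t0\t1\t100\t1\t100\t0.5\t25.0",
]

for query, subject, weight in filter_hits(toy_hits, toy_lengths):
    # One "label label weight" line per edge, as read by mcl --abc.
    print(f"{query}\t{subject}\t{weight}")
```

Taking the minimum of the two coverages is one possible choice; it is deliberately strict, so a long multi-domain protein will not be linked to a short one through a single shared domain.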
OrthoMCL uses an inflation value of around 1.5, chosen to balance sensitivity and selectivity based on how well enzymes are grouped according to their EC numbers.
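Without a reference such as EC numbers, one practical option is to run MCL at several inflation values and compare simple statistics across the resulting clusterings. The sketch below is a minimal Python example under stated assumptions: the input graph is in label (`--abc`) format, the helper names (`mcl_command`, `parse_clusters`, `summarize`) and file names are hypothetical, and the "length ratio" metric is only a heuristic aimed at the length-disparity problem described in the question. MCL also ships `clm` utilities (such as `clm dist` and `clm info`) for comparing clusterings, which may be worth checking in the manual.

```python
import statistics


def mcl_command(graph_path, inflation, out_path):
    """Build the mcl command line for a label-format (--abc) graph."""
    return ["mcl", graph_path, "--abc", "-I", str(inflation), "-o", out_path]


def parse_clusters(text):
    """Parse mcl --abc output: one cluster per line, tab-separated members."""
    return [line.split("\t") for line in text.splitlines() if line.strip()]


def summarize(clusters, lengths):
    """Summarize one clustering.

    lengths: dict mapping protein ID to sequence length (hypothetical input).
    The length ratio of a cluster is shortest member / longest member, so
    values near 1 mean the members have similar lengths.
    """
    sizes = [len(c) for c in clusters]
    ratios = [
        min(lengths[p] for p in c) / max(lengths[p] for p in c)
        for c in clusters
        if len(c) > 1
    ]
    return {
        "n_clusters": len(clusters),
        "n_singletons": sum(1 for s in sizes if s == 1),
        "largest": max(sizes, default=0),
        "median_length_ratio": statistics.median(ratios) if ratios else None,
    }


# Commands for the starting values suggested in the MCL manual.
# Each could be run with subprocess.run(cmd, check=True) if mcl is installed.
for inflation in (1.4, 2, 4, 6):
    print(" ".join(mcl_command("graph.abc", inflation, f"out.I{inflation}")))

# Toy clusterings, invented for illustration only.
toy_lengths = {"a": 100, "b": 110, "c": 400, "d": 105}
coarse = "a\tb\tc\td\n"   # low inflation: one large cluster
fine = "a\tb\td\nc\n"     # higher inflation: the long protein splits off

for name, text in (("coarse", coarse), ("fine", fine)):
    print(name, summarize(parse_clusters(text), toy_lengths))
```

An inflation value at which the median length ratio improves while the number of singletons stays moderate may be a reasonable compromise, but this is a rule of thumb rather than a validated criterion.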
Hi Anand, did you find a reliable method to identify the best inflation value for your MCL clustering? I used BMGE for trimming, which improved the MSA and removed many gaps, but I am still missing many sequences from my orthogroups.