2.4 years ago
O.rka
I have species X with established gene models. I also have a de novo assembly of a different strain of species X, on which I performed gene calls. I want to figure out which genes are unique to the strain compared to the reference assembly.
I planned on using MMseqs2 to cluster both gene sets, but I need to pick a good percent identity cutoff. What cutoff should I use?
If it is a strain of the same species, shouldn't the cutoff be set as high as possible to start with?
Are you hoping that genes unique to each strain will fail to cluster with anything from the other set (so that you can avoid pairwise comparisons)?
Yes, I would want a unique gene to not cluster. Do you think 95% is too low?
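Whatever cutoff ends up being used, the "unique gene does not cluster" idea can be checked directly from the clustering output. Below is a minimal sketch, assuming the clusters are available as a two-column, tab-separated table (cluster representative, cluster member) and that strain gene IDs can be told apart from reference gene IDs. The function name `strain_only_clusters` and all gene IDs are made up for illustration, and the cutoff itself is not decided by this code.

```python
import csv
import io
from collections import defaultdict


def strain_only_clusters(cluster_tsv, strain_ids):
    """Return {representative: members} for clusters made up only of strain genes.

    cluster_tsv: file-like object with two tab-separated columns
                 (cluster representative, cluster member). Assumed layout.
    strain_ids:  set of gene IDs that belong to the de novo strain assembly.
    """
    clusters = defaultdict(set)
    for row in csv.reader(cluster_tsv, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        rep, member = row[0], row[1]
        # Count the representative as a member of its own cluster.
        clusters[rep].update((rep, member))
    # Keep only clusters in which every member is a strain gene.
    return {rep: members for rep, members in clusters.items()
            if members <= strain_ids}


# Toy input with hypothetical IDs (ref_* = reference genes, strain_* = strain genes).
example = io.StringIO(
    "ref_g1\tref_g1\n"
    "ref_g1\tstrain_g1\n"
    "strain_g2\tstrain_g2\n"
    "strain_g3\tstrain_g3\n"
    "strain_g3\tstrain_g4\n"
)
strain_ids = {"strain_g1", "strain_g2", "strain_g3", "strain_g4"}
unique = strain_only_clusters(example, strain_ids)
print(sorted(unique))
```

Running the same check at a few different identity cutoffs and comparing how many strain-only clusters remain is one way to see how sensitive the "unique" set is to the choice of 95% versus something stricter.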