What percent identity is recommended for detecting protein homology within a species?

3.0 years ago

O.rka ▴ 750

I have species X with established gene models. I also have a de novo assembly of a different strain of species X, on which I performed gene calling. I want to figure out which genes are unique to the strain compared to the reference assembly.

I planned on using MMseqs2 to cluster both gene sets together, but I need to pick a good percent identity cutoff. What cutoff should I use?

alignment genomics proteomics • 940 views

If it is a strain of the same species, shouldn't the cutoff be set as high as possible to start with?

Are you hoping that genes unique to each strain will not cluster with anything else (that is, are you trying to avoid pairwise comparisons)?

3.0 years ago by GenoMax 152k

Yes, I would want a unique gene to not cluster. Do you think 95% is too low?
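
To make "a unique gene does not cluster" concrete, here is a minimal sketch of how strain-only clusters could be picked out after clustering the combined gene set. It assumes a two-column, tab-separated cluster table (representative, then member), which is the layout MMseqs2 clustering workflows typically write, and it assumes every protein ID carries a prefix marking its origin. The `strain|` and `ref|` prefixes, the gene names, and the function name `strain_only_clusters` are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def strain_only_clusters(tsv_lines, strain_prefix):
    """Return {representative: [members]} for clusters made up only of strain genes.

    tsv_lines: iterable of "representative<TAB>member" lines (assumed layout).
    strain_prefix: ID prefix marking genes from the new strain (hypothetical convention).
    """
    clusters = defaultdict(list)
    for line in tsv_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rep, member = line.split("\t")[:2]
        clusters[rep].append(member)

    # A cluster counts as strain-specific only if no reference gene joined it.
    return {
        rep: members
        for rep, members in clusters.items()
        if all(m.startswith(strain_prefix) for m in members)
    }


# Hypothetical cluster table: one shared cluster, two strain-only clusters.
example = [
    "ref|geneA\tref|geneA",
    "ref|geneA\tstrain|g1",
    "strain|g2\tstrain|g2",
    "strain|g3\tstrain|g3",
    "strain|g3\tstrain|g4",
]

unique = strain_only_clusters(example, "strain|")
for rep, members in sorted(unique.items()):
    print(rep, members)
```

With a real run, the lines would come from the cluster table file instead of the `example` list. Genes reported this way are only candidates: raising or lowering the identity cutoff changes which strain genes end up alone, so it may be worth comparing the result at a few cutoffs.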

3.0 years ago by O.rka ▴ 750
