Hi, I'm trying to retrieve about 50 sequences from the non-redundant (nr) BLAST database. I want to test a program for protein mutation prediction: the program tries to estimate whether a mutation is deleterious or neutral.
My example query is the well-known LacI repressor. BLAST finds lots of sequences, but they are too similar: the first 50 hits are almost identical, so the prediction program gets no heterogeneity for its prediction model.
How can I find homologous sequences that are not identical (I want orthologs)? For example, sequences from other species that differ a little from the query LacI protein.
I tried classic blastp. As another approach, I first ran blastp to retrieve 2000 sequences, aligned them, and passed the alignment to psiblast (-in_msa parameter) so that it builds a PSSM from it. Is there another automatic way, or some parameter settings in the BLAST+ package, to find more distant sequences?
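A minimal sketch of how the two-step search described above (blastp for 2000 hits, align them, restart psiblast from the alignment with -in_msa) could be scripted so it runs unattended. Assumptions: the BLAST+ tools are on the PATH, MAFFT (mafft --auto) stands in for whatever aligner you use, and the helper names build_commands, unique_subject_ids and run_pipeline are hypothetical. psiblast takes its master sequence from the alignment (see -msa_master_idx), so check that your query ends up where psiblast expects it and that your BLAST+ version accepts the alignment format you write.

```python
import subprocess
from pathlib import Path
from typing import Dict, Iterable, List

# Tabular columns requested from BLAST (standard -outfmt 6 specifiers).
TAB_FIELDS = "saccver pident bitscore evalue"


def build_commands(query: str, db: str, workdir: str,
                   n_hits: int = 2000, evalue: float = 1e-5) -> Dict[str, List[str]]:
    """Command lines for blastp -> fetch hits -> align -> psiblast (-in_msa)."""
    wd = Path(workdir)
    return {
        # Step 1: collect up to n_hits subjects with plain blastp.
        "blastp": ["blastp", "-query", query, "-db", db,
                   "-evalue", str(evalue), "-max_target_seqs", str(n_hits),
                   "-outfmt", f"6 {TAB_FIELDS}", "-out", str(wd / "hits.tsv")],
        # Step 2: pull the full-length hit sequences out of the database.
        "blastdbcmd": ["blastdbcmd", "-db", db,
                       "-entry_batch", str(wd / "hit_ids.txt"),
                       "-out", str(wd / "hits.fasta")],
        # Step 3 (assumed aligner): MAFFT writes the alignment to stdout.
        "align": ["mafft", "--auto", str(wd / "hits.fasta")],
        # Step 4: restart PSI-BLAST from the alignment instead of a single query.
        "psiblast": ["psiblast", "-in_msa", str(wd / "hits.aln.fasta"), "-db", db,
                     "-evalue", str(evalue), "-num_iterations", "3",
                     "-outfmt", f"6 {TAB_FIELDS}", "-out", str(wd / "psi_hits.tsv")],
    }


def unique_subject_ids(tsv_lines: Iterable[str]) -> List[str]:
    """First column of -outfmt 6 output, de-duplicated (a subject can have several HSPs)."""
    seen, ids = set(), []
    for line in tsv_lines:
        if not line.strip():
            continue
        acc = line.split("\t", 1)[0]
        if acc not in seen:
            seen.add(acc)
            ids.append(acc)
    return ids


def run_pipeline(query: str, db: str, workdir: str) -> Path:
    """Run all steps; check=True makes any failing tool raise CalledProcessError."""
    wd = Path(workdir)
    wd.mkdir(parents=True, exist_ok=True)
    cmds = build_commands(query, db, workdir)
    subprocess.run(cmds["blastp"], check=True)
    ids = unique_subject_ids((wd / "hits.tsv").read_text().splitlines())
    (wd / "hit_ids.txt").write_text("\n".join(ids) + "\n")
    subprocess.run(cmds["blastdbcmd"], check=True)
    with open(wd / "hits.aln.fasta", "w") as out:
        subprocess.run(cmds["align"], stdout=out, check=True)
    subprocess.run(cmds["psiblast"], check=True)
    return wd / "psi_hits.tsv"
```

Calling run_pipeline("lacI.fasta", "nr", "work") would leave the PSI-BLAST hits in work/psi_hits.tsv. Keeping command construction separate from execution makes it easy to log or test the exact commands inside the bigger tool.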
EDIT: Constraint: the search process has to be automatic, because it is one component of a bigger tool.
I would guess you need to define some constraints, e.g. (1) bitscore thresholds, (2) a species subset (or a taxonomic distance) and (3) conserved domain(s), and then check which BLAST hits satisfy them.
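Constraints (1) and (2) could be applied automatically to tabular BLAST output. The sketch below assumes the hits were written with -outfmt "6 saccver pident bitscore qcovs staxids" (taxonomy IDs are only filled in if the database carries them). It keeps one hit per taxid as a rough stand-in for "one per species" (a taxid may be a strain rather than a species) and adds a percent-identity band so the near-identical top hits are skipped. All thresholds and the names parse_hits and select_diverse are hypothetical; the conserved-domain check (3) is not covered here.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Hit:
    acc: str         # subject accession (saccver)
    pident: float    # percent identity of the HSP
    bitscore: float
    qcovs: float     # query coverage per subject, in percent
    taxid: str       # first entry of staxids


def parse_hits(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated rows: saccver pident bitscore qcovs staxids."""
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue
        acc, pident, bits, qcovs, staxids = row[:5]
        # staxids may list several taxids separated by ';'; keep the first.
        hits.append(Hit(acc, float(pident), float(bits), float(qcovs),
                        staxids.split(";")[0]))
    return hits


def select_diverse(hits: List[Hit], n: int = 50, min_bits: float = 50.0,
                   min_qcov: float = 70.0, min_pident: float = 25.0,
                   max_pident: float = 90.0) -> List[Hit]:
    """Best-scoring hits inside the identity band, at most one per taxid."""
    chosen: List[Hit] = []
    seen_tax = set()
    for h in sorted(hits, key=lambda h: h.bitscore, reverse=True):
        if h.bitscore < min_bits or h.qcovs < min_qcov:
            continue  # constraint (1): too weak or too partial
        if not (min_pident <= h.pident <= max_pident):
            continue  # near-identical (or too distant) to the query
        if h.taxid in seen_tax:
            continue  # constraint (2): already have this taxon
        chosen.append(h)
        seen_tax.add(h.taxid)
        if len(chosen) == n:
            break
    return chosen
```

Lowering max_pident pushes the selection toward more distant homologs, while min_bits and min_qcov guard against unrelated or fragmentary hits. Note that if staxids is missing (e.g. reported as N/A), every hit shares the same key and only one would be kept.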