I have a multi-domain protein dataset in which many of the reciprocal BLAST hits (RBHs) show only local homology that does not extend to global homology at the full-length protein level. To address this problem, I want to:
a. Choose the correct parameters for the BLAST run itself, such as coverage % of subject and/or query. Any advice on how to set them without being arbitrary, and any references that discuss BLAST parameter optimization?
b. Use a script or standalone command-line tool to process the RBH output file, to increase the sensitivity and specificity of RBH detection so that it is not based on local homology alone.
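To make (b) concrete, here is a minimal sketch of the kind of post-processing meant: drop hits whose aligned region covers too little of either the query or the subject, then call reciprocal best hits on what remains. It assumes BLAST+ tabular output produced with `-outfmt "6 qseqid sseqid pident length qstart qend sstart send evalue bitscore qlen slen"`. The function names, the toy hit lines, and the 0.7 cutoff are illustrative assumptions, not recommended values. Coverage is computed per HSP, so a pair whose homology is split across several HSPs would be under-counted by this sketch.

```python
import csv
import io

# Assumed column order (matches the -outfmt string in the lead-in).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "qstart", "qend",
           "sstart", "send", "evalue", "bitscore", "qlen", "slen"]


def hsp_coverage(row):
    """Return (query coverage, subject coverage) of one HSP as fractions."""
    q_cov = (abs(int(row["qend"]) - int(row["qstart"])) + 1) / int(row["qlen"])
    s_cov = (abs(int(row["send"]) - int(row["sstart"])) + 1) / int(row["slen"])
    return q_cov, s_cov


def filter_by_coverage(handle, min_cov=0.7):
    """Keep only hits covering at least min_cov of BOTH query and subject.

    min_cov=0.7 is an arbitrary illustrative cutoff (hypothetical).
    """
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    kept = []
    for row in reader:
        q_cov, s_cov = hsp_coverage(row)
        if q_cov >= min_cov and s_cov >= min_cov:
            kept.append(row)
    return kept


def best_hits(rows):
    """Map each query to its highest-bitscore subject."""
    best = {}
    for row in rows:
        q = row["qseqid"]
        if q not in best or float(row["bitscore"]) > float(best[q]["bitscore"]):
            best[q] = row
    return {q: r["sseqid"] for q, r in best.items()}


def reciprocal_best_hits(forward_rows, reverse_rows):
    """Return sorted (a, b) pairs where a's best hit is b and b's is a."""
    fwd = best_hits(forward_rows)
    rev = best_hits(reverse_rows)
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)


def _line(*fields):
    # Helper for building toy tab-separated hit lines (made-up data).
    return "\t".join(str(f) for f in fields)


# Toy data: A1/B1 align over nearly their full lengths, A2/B2 share one domain.
FORWARD = "\n".join([
    _line("A1", "B1", 62.0, 300, 1, 300, 1, 290, "1e-90", 400, 300, 300),
    _line("A2", "B2", 55.0, 101, 10, 110, 5, 105, "1e-20", 120, 500, 450),
])
REVERSE = "\n".join([
    _line("B1", "A1", 62.0, 300, 1, 290, 1, 300, "1e-90", 400, 300, 300),
    _line("B2", "A2", 55.0, 101, 5, 105, 10, 110, "1e-20", 120, 450, 500),
])

if __name__ == "__main__":
    fwd = filter_by_coverage(io.StringIO(FORWARD))
    rev = filter_by_coverage(io.StringIO(REVERSE))
    print(reciprocal_best_hits(fwd, rev))
```

With real data, the two `io.StringIO` handles would be replaced by open file handles for the forward and reverse BLAST result files.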
I found a lot of questions and answers indirectly related to this, but none referring to post-processing tools that filter out RBH results based only on local homology.
Other solutions that take a different approach from the one outlined above are also welcome. Thanks!
BLAST is designed for local alignment, so if that's not what you're interested in, you should probably use another algorithm. For example, if you're interested in global alignments, you could use the Needleman-Wunsch algorithm. Also, the usual way to study multi-domain proteins is with multiple sequence alignments, e.g. using Clustal Omega.
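To show what the global alignment suggestion amounts to, here is a minimal sketch of Needleman-Wunsch in plain Python. It uses a simple match/mismatch score and a linear gap penalty, both of which are simplifying assumptions. For real proteins you would normally use a substitution matrix such as BLOSUM62 with affine gap penalties, via an existing implementation rather than this toy one.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    The scoring values are illustrative defaults, not tuned parameters.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score aligning a[:i] with b[:j], end to end.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("MKTV", "MKV"))
```

Because every residue of both sequences must appear in the alignment, a pair that shares only one domain is penalised for all the unaligned flanking sequence, which is the behaviour the question is after.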
Thank you, Jean-Karim, for reminding me of the Needleman-Wunsch option. I'd spoken to Bill Pearson about trying to replace BLAST with the FASTA software package (the ggsearch tool), an idea he supported, but I'd forgotten about it. I will try this option.
In my experience, there are no off-the-shelf alignment tools that can accurately align multi-domain proteins except, to some extent, Pfam DIALIGN, because it is a "protein domain-aware" aligner. This is especially useful when protein domain sequences are quite divergent (I work with such domains, which are conserved at the structural level but much less so at the sequence level).
Thanks again!