I have 600,000+ unique protein sequences that I want to BLAST against themselves. For example, I want to run sequence 1 against sequence 1, sequence 2 against sequence 2, ..., sequence 3453 against sequence 3453, etc.
I have BLAST installed locally, and a BLAST database (blastdb) is already built from the protein sequences. The problem is that each sequence gets searched against the whole database, even though I only keep the hit of each sequence against itself. Needless to say, I need a much more efficient method! Any ideas?
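One way to avoid searching the whole database is BLAST+'s `-subject` option. It aligns a query directly against a subject FASTA file with no database, so each sequence can be compared only with itself. Below is a minimal sketch, assuming BLAST+ `blastp` is on `PATH` and the sequences are in one FASTA text. The helper names (`parse_fasta`, `self_blast_command`, `self_blast_all`) are hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs from FASTA-formatted text."""
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Use the first word of the header as the ID (assumption).
            seq_id, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def self_blast_command(fasta_path: str) -> List[str]:
    """Build a blastp call that aligns one FASTA file against itself (no database)."""
    return [
        "blastp",
        "-query", fasta_path,
        "-subject", fasta_path,
        "-outfmt", "6 qseqid sseqid pident length bitscore",
        "-max_hsps", "1",  # keep only the best HSP of each self-alignment
    ]


def self_blast_all(fasta_text: str) -> List[str]:
    """Run one query-vs-itself blastp per record and collect tabular output lines."""
    if shutil.which("blastp") is None:
        raise RuntimeError("blastp (BLAST+) not found on PATH")
    results: List[str] = []
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "one.fa"
        for seq_id, seq in parse_fasta(fasta_text):
            path.write_text(f">{seq_id}\n{seq}\n")
            out = subprocess.run(self_blast_command(str(path)),
                                 capture_output=True, text=True, check=True)
            results.extend(out.stdout.splitlines())
    return results
```

This skips the database search, but it still starts one `blastp` process per sequence. At 600,000 sequences, that process start-up overhead may dominate the run time.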
I'm curious what kind of results you're expecting, since the best hit in every case will obviously be 100% identity covering the entire query length.
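If the self-hit always covers the full query at 100% identity, its score can be computed directly without running BLAST. A gapless self-alignment's raw score is the sum of the substitution matrix's diagonal entries over the residues. Below is a sketch that makes several assumptions. It uses BLOSUM62 diagonal scores and handles only the 20 standard residues. It takes the commonly reported Karlin-Altschul parameters for BLOSUM62 with gap costs 11/1 (lambda = 0.267, K = 0.041) for the bit-score conversion. The function names are hypothetical.

```python
import math

# Diagonal (self-substitution) scores of BLOSUM62, 20 standard residues only.
BLOSUM62_DIAG = {
    "A": 4, "R": 5, "N": 6, "D": 6, "C": 9, "Q": 5, "E": 5, "G": 6,
    "H": 8, "I": 4, "L": 4, "K": 5, "M": 5, "F": 6, "P": 7, "S": 4,
    "T": 5, "W": 11, "Y": 7, "V": 4,
}

# Assumed gapped Karlin-Altschul parameters for BLOSUM62, gap open 11 / extend 1.
LAMBDA = 0.267
K = 0.041


def self_raw_score(seq: str) -> int:
    """Raw score of aligning seq to itself end to end with no gaps.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    return sum(BLOSUM62_DIAG[aa] for aa in seq.upper())


def self_bit_score(seq: str) -> float:
    """Convert the raw self-score S to bits: (lambda * S - ln K) / ln 2."""
    s = self_raw_score(seq)
    return (LAMBDA * s - math.log(K)) / math.log(2)


print(self_raw_score("MKV"))  # 5 + 5 + 4 -> 14
```

The computation is linear in total sequence length, so it covers 600,000 sequences quickly. The values may not match what `blastp` reports, though. By default BLAST+ applies composition-based score adjustment and may mask low-complexity regions, and both can change the final score.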