I made a subset of the nr database using blastdb_aliastool. I split nr 70%:30%, creating two smaller databases (say, nr-70 and nr-30). I then ran blastp against each of them with query q and stored the two results separately (say, result-70 and result-30). I used the "-max_target_seqs 50" and "-max_hsps 20" options and gave no explicit e-value cutoff.
I also ran blastp against the whole nr using the same query q; call that result result-100. When I compare result-70 and result-30 against result-100, I see a strange phenomenon: while result-70 and result-30 have 29 and 26 hits, result-100 has only 28 hits. Since nr-70 and nr-30 are non-overlapping and together constitute nr, I would expect the search against the full nr to return 50 hits, since 29 + 26 = 55 > 50. Some of the hits found in the searches against the smaller databases don't show up in the result from the larger database.
Any idea why this is happening?
I don't know if it is the cause of your results, but "-max_target_seqs" has a known, somewhat unexpected behaviour; see "What BLAST's max-target-sequences doesn't do".
I wonder if database size also interacts with these BLAST heuristics.
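One way database size could matter, sketched below: a BLAST E-value grows roughly in proportion to the size of the search space, and with no explicit cutoff blastp applies its default of "-evalue 10". A weak hit that passes the cutoff against nr-30 could therefore exceed it against the full nr. The sketch assumes the subset holds about 30% of nr's residues and that E-values scale linearly with that fraction; the hit names and E-values are made up for illustration and are not from the results above.

```python
# Minimal sketch: how an E-value from a subset search might look against
# the full database. Assumes E-value scales linearly with database size,
# which is only an approximation (BLAST uses effective lengths).

def rescale_evalue(evalue, subset_fraction):
    """Approximate full-database E-value for a hit found in a subset."""
    return evalue / subset_fraction

def surviving_hits(hits, subset_fraction, cutoff=10.0):
    """Keep hits whose rescaled E-value still passes the cutoff."""
    kept = []
    for name, evalue in hits:
        rescaled = rescale_evalue(evalue, subset_fraction)
        if rescaled <= cutoff:
            kept.append((name, rescaled))
    return kept

# Hypothetical hits from the nr-30 search (names and values are invented).
hits_30 = [("hitA", 1e-20), ("hitB", 2.0), ("hitC", 4.5), ("hitD", 9.0)]

kept = surviving_hits(hits_30, subset_fraction=0.30)
print([name for name, _ in kept])  # ['hitA', 'hitB']
```

If this is what is happening, the missing hits should be the ones with E-values close to 10 in result-70 and result-30. Passing the full nr size via "-dbsize" in the subset searches (or setting the same explicit "-evalue" everywhere and comparing only strong hits) might make the results comparable, though I have not checked this on your data.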