In pBlast, the Expect value (E-value) threshold limits which scores and alignments are reported for matches against the database sequences.
The default value is 10 (NCBI Document: link text), which means that 10 matches are expected to be found by chance for a given query, based on the stochastic model of Karlin and Altschul (Paper: link text).
Does anyone know why the default value was set to 10? What is the logic behind this value?
Do you know any papers discussing this issue?
I don't believe that there's any logic to it - it's just an arbitrary choice. Anything higher than 1 is unlikely to be a related sequence. Perhaps users are uncomfortable when BLAST returns no matches, so they chose a value likely to return something, even if insignificant, under most scenarios :)
The default E()-value (expect) for proteins for BLAST (and FASTA) reflects the goal of providing the investigator a chance to see the "transition" between related and unrelated sequences as you look down the list. While it is true that unrelated sequences begin to appear around E() < 1.0 (in 1% of searches, they should appear at E() < 0.01), for diverse protein families, there will be many related sequences with E()-values in this range as well. Indeed, for very large and diverse protein families, there will be many more homologs with E() between 1 and 10 than unrelated sequences. By E() ~ 10, however, many more of the scores will be unrelated.
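To make the "1% of searches" remark concrete, here is a minimal sketch of the standard conversion from an E()-value to the probability of seeing at least one chance hit that good, assuming the number of chance hits is Poisson-distributed (the usual assumption behind these statistics). The function name `chance_hit_probability` is made up for illustration and is not part of BLAST or FASTA.

```python
import math


def chance_hit_probability(e_value: float) -> float:
    """Probability of at least one chance hit, given the expected number
    of chance hits (the E()-value), under a Poisson assumption:
    P = 1 - exp(-E)."""
    # -expm1(-E) equals 1 - exp(-E) but is more accurate for small E.
    return -math.expm1(-e_value)


if __name__ == "__main__":
    # Thresholds mentioned in this thread.
    for e in (0.01, 1.0, 10.0):
        print(f"E() = {e:g}: P(at least one chance hit) = {chance_hit_probability(e):.4f}")
```

For small values the two numbers are nearly identical (E() = 0.01 gives P of about 0.01), at E() = 1 the probability is about 0.63, and at E() = 10 a chance hit is all but certain. That is consistent with the default being a display cutoff for viewing the transition rather than a significance threshold.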
Note that E() ~ 10 makes sense for protein:protein scores, but it makes less sense for translated-DNA:protein searches (BLASTX, FASTX) or DNA:DNA scores (BLASTN, FASTA). In the FASTA programs, the default values are 5.0 for FASTX/FASTY and 2.0 for FASTA/DNA. (BLAST always uses 10.) This reflects the less robust accuracy of those expect values. Because of out-of-frame translations (which can produce low E()-values against low complexity regions) and local DNA composition bias, more scores at E() < 5.0 or E() < 2.0 (DNA:DNA) are likely to be unrelated.
My understanding is that this is partly arbitrary, as Neil suggests, and partly historical. It has been that way for a long time, at least 20 years. "It's been that way for so long, likely no one really knows why."
Added in edit 2 Mar 2012: Given the much better response supplied by Bill Pearson, I should simply delete my response. There is a historical aspect to why a value of 10 is applied, but those who developed the similarity search tools we are still using do know why. As to the arbitrary part of my answer - that is wrong.