I understand that BLAST+ masks repetitive sequences. However, I cannot find out whether the repetitive sequences are masked in the database, in the query sequence, or in both. [A reference would be most appreciated.]
Secondly, would BLAST mask a sequence like the following:
QFSAGKRQFSAGKRQFSAGKRQFSAGKRQFSAGKRQWLGGEEEYDPEENLNMETRQFSAGKRQFSAGKRQFSAGKRDWEEELTPEELMDMFQAPETRQFSAGKRQFSAGKRQFSAGKRQWVGGEEEYDPEEMLNMATRQFSAGKRQFSAGKRQFSAGKRQWVGGEEAFLPEMDTRQFSAGKRQFSAGKRQFSAGKRQFSAGKR
where QFSAGKR is repeated 18 times...
BTW, the above sequence is thyrotropin-releasing hormone, where the functional bioactive neuropeptide can be as small as three amino acids long and is repeated multiple times.
Also, it would be nice if someone could clarify what "masking" means, i.e. does BLAST totally ignore this sequence (i.e. replace it with N/X), or does it simply score such sequences less favourably (i.e. give them a worse e-value)?
In fact, you should be able to find this information in the standalone BLAST manual. Masking can be soft (the masked region is converted to lowercase, so the sequence is preserved) or hard (nucleotides are replaced with Ns, or amino acids with Xs). The algorithm is presented in an accessible way in "BLAST" by Korf, Yandell and Bedell, although it is quite an old book.
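To make the soft/hard distinction concrete, here is a minimal Python sketch. It is not how BLAST itself decides what to mask: `motif_regions` is a hypothetical toy stand-in that just marks runs of a given motif, whereas BLAST relies on dedicated low-complexity filters. The helper names and the short test sequence are made up for illustration; only the two output conventions (lowercase vs. X) come from the description above.

```python
import re


def motif_regions(seq, motif):
    """Toy region finder (an assumption for illustration, NOT SEG or DUST).

    Returns half-open (start, end) spans covering each run of one or more
    back-to-back copies of `motif`.
    """
    pattern = f"(?:{re.escape(motif)})+"
    return [m.span() for m in re.finditer(pattern, seq)]


def apply_mask(seq, regions, hard=False, mask_char="X"):
    """Mask the given regions of `seq`.

    Soft masking lowercases the residues, so the original sequence can be
    recovered with .upper(). Hard masking overwrites them with `mask_char`
    (use "N" for nucleotides), so the original residues are lost.
    """
    chars = list(seq)
    for start, end in regions:
        for i in range(start, end):
            chars[i] = mask_char if hard else chars[i].lower()
    return "".join(chars)


# Short made-up protein fragment containing two copies of the motif.
seq = "MTRQFSAGKRQFSAGKRDWEEE"
regions = motif_regions(seq, "QFSAGKR")

soft = apply_mask(seq, regions)             # repeats become lowercase
hard = apply_mask(seq, regions, hard=True)  # repeats become X

print(regions)
print(soft)
print(hard)
```

The point of the sketch is that soft masking is reversible and hard masking is not, so a hard-masked region can never contribute to an alignment. How a search tool treats the lowercase regions is up to the tool, so check the manual for the options of the BLAST+ program you are running.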