BLAST masks repetitive sequences - but where: database or query sequence or both?
9.1 years ago
IsmailM ▴ 110

I understand that BLAST+ masks repetitive sequences. However, I cannot find out whether they are masked in the database, in the query sequence, or in both. (A reference would be most appreciated.)

Secondly, would BLAST mask a sequence like the following:

QFSAGKRQFSAGKRQFSAGKRQFSAGKRQFSAGKRQWLGGEEEYDPEENLNMETRQFSAGKRQFSAGKRQFSAGKRDWEEELTPEELMDMFQAPETRQFSAGKRQFSAGKRQFSAGKRQWVGGEEEYDPEEMLNMATRQFSAGKRQFSAGKRQFSAGKRQWVGGEEAFLPEMDTRQFSAGKRQFSAGKRQFSAGKRQFSAGKR

where QFSAGKR is repeated 18 times...

BTW, the above sequence is thyrotropin-releasing hormone, where the functional bioactive neuropeptide can be as small as three amino acids long and is repeated multiple times.

Also, it would be nice if someone could clarify what "masking" means: does BLAST totally ignore the masked sequence (i.e. replace it with N/X), or does it simply penalize such sequences (e.g. with a worse e-value)?

blast • 4.9k views

In fact, you should be able to find this information in the standalone BLAST manual. Masking can be soft (lowercase, so the sequence is preserved) or hard (nucleotides replaced with Ns, or amino acids with Xs). The algorithm is presented in a friendly way in "BLAST" by Korf, Yandell and Bedell, although it is quite an old book.
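To make the soft/hard distinction concrete, here is a minimal Python sketch. It is only an illustration of what the two kinds of masking do to the residues, not BLAST's own code; the function name `apply_mask` and the masked interval are hypothetical, chosen by hand rather than produced by any real filter.

```python
def apply_mask(seq, intervals, mode="soft", mask_char="X"):
    """Mask 0-based, end-exclusive intervals of a sequence.

    mode="soft": masked residues become lowercase (sequence preserved).
    mode="hard": masked residues are replaced by mask_char (X for protein, N for DNA).
    """
    if mode not in ("soft", "hard"):
        raise ValueError("mode must be 'soft' or 'hard'")
    out = list(seq.upper())
    for start, end in intervals:
        for i in range(start, min(end, len(out))):
            out[i] = out[i].lower() if mode == "soft" else mask_char
    return "".join(out)


# Hypothetical example: mask the first two QFSAGKR copies of a short fragment.
peptide = "QFSAGKRQFSAGKRQWLGGEEEYDPE"
region = [(0, 14)]  # assumed interval, picked by hand for illustration

soft = apply_mask(peptide, region, mode="soft")
hard = apply_mask(peptide, region, mode="hard")
print(soft)
print(hard)
```

With soft masking the original residues can still be recovered from the lowercase letters; with hard masking they are gone from the sequence.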

9.1 years ago
h.mon 35k

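BLAST can mask both the query and the database:

-dust for masking a nucleotide query (for a protein query such as yours, the equivalent option is -seg)

and -db_soft_mask for masking the database (this relies on masking information stored in the database; -db_hard_mask is the hard-masking counterpart)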

It is very easy to check whether BLAST would mask thyrotropin: just go to the BLAST web page and play with it. At the bottom of the page, clicking on "+Algorithm parameters" expands several options, including the masking ones.
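If you want a rough feel for it before trying the web form, low-complexity filters for proteins look at residue composition in short windows. The sketch below computes plain Shannon entropy over sliding windows of the sequence from the question. It is a simplified stand-in, not the SEG algorithm, and the window size of 12 is an assumption; the helper names are hypothetical. Whether BLAST actually masks the sequence is still best checked by running it.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def window_entropies(seq, size=12):
    """Entropy of every sliding window of the given size (assumed: 12)."""
    return [shannon_entropy(seq[i:i + size]) for i in range(len(seq) - size + 1)]


# Sequence copied from the question.
trh = (
    "QFSAGKRQFSAGKRQFSAGKRQFSAGKRQFSAGKRQWLGGEEEYDPEENLNMETRQFSAGKRQFSAGKR"
    "QFSAGKRDWEEELTPEELMDMFQAPETRQFSAGKRQFSAGKRQFSAGKRQWVGGEEEYDPEEMLNMATR"
    "QFSAGKRQFSAGKRQFSAGKRQWVGGEEAFLPEMDTRQFSAGKRQFSAGKRQFSAGKRQFSAGKR"
)

values = window_entropies(trh)
print(f"min {min(values):.2f}, max {max(values):.2f} bits over {len(values)} windows")
```

Low values mean a window is dominated by a few residue types. A periodic repeat built from seven different residues is not necessarily low in composition, which is why repeats and low-complexity regions are worth keeping apart when reading the masking options.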
