Short Sequence BLAST Search with Gaps
9.5 years ago
andytsaica • 0

I am running a BLASTN search on a 15-nucleotide sequence with the following parameters:

expect threshold = 1000, word size = 7, match/mismatch = 1, -3, no filters or masks

I am getting only 100% identity hits. However, I am also interested in hits with mismatches or gaps in the middle of the sequence, which are not showing up in the results even when I raise the expect value or make the match/mismatch penalties less severe.

What parameters should I use to allow results with gaps to show up?

Thanks!

blast alignment • 6.7k views

You can also try setting gap penalties.


I have tried adjusting the gap penalties, but I still get only 100% matches. I suspect that 15 nucleotides may be too short to surface gapped alignments within a reasonable number of hits, since there are so many exact hits. With a sequence of ~35 nucleotides, several gapped hits start showing up, but this is not the case with ~15-nucleotide sequences.


How did you write the command to get only 100% identity hits with the parameters you listed?

Thanks


I didn't use any special command; the results just ended up being only 100% identity matches with the settings that I posted above.

9.5 years ago

After applying gap penalties, you will get even fewer gaps than before. Take a look at this image. [image not shown]

The problem arises because 15 nucleotides is too short. You don't mention which database you are searching, but if it is a whole public database, 1000 hits are not enough to reach the gapped alignments, since a 15-base sequence is short and occurs everywhere.
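To put a rough number on "present everywhere": there are 4^15 (about 1.07 billion) possible 15-mers, so a database much larger than that is expected to contain many chance exact copies of any one of them. The sketch below assumes random DNA with equal base frequencies, and the database sizes are made-up round numbers rather than the real size of any NCBI database, so read the results as orders of magnitude only.

```python
def expected_exact_hits(query_len: int, db_bases: int, both_strands: bool = True) -> float:
    """Expected number of chance exact matches of one query in random DNA.

    Assumes equal base frequencies (0.25 each) and independent positions.
    Real sequence is not random, so this is only an order-of-magnitude guide.
    """
    positions = max(db_bases - query_len + 1, 0)  # possible start positions
    strands = 2 if both_strands else 1            # blastn searches both strands
    return strands * positions / 4 ** query_len


# Hypothetical database sizes in bases, for illustration only.
for db_bases in (10**9, 10**11, 10**12):
    for query_len in (15, 35):
        hits = expected_exact_hits(query_len, db_bases)
        print(f"db={db_bases:.0e} bases, query={query_len} nt: ~{hits:.3g} chance exact hits")
```

Under these assumptions a 15-mer picks up far more chance exact matches than a 35-mer in the same database, which would fit the observation that imperfect hits only become visible with the longer query.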


I am blasting against the entire nt database. I have tried making gap open penalty 0, and gap extension penalty the lowest it can be, but that still doesn't help.


You can also try BLAT with -out=blast9; the output will look very similar to blastn results.

9.5 years ago
h.mon 35k

Instead of BLAST you could try a short-read mapper. But the question is, do you really need to BLAST 15 bp sequences against the whole nt database? What are you trying to achieve?

9.5 years ago

Sequence complexity is low with only 15 bases. The chance of getting more than 1000 hits with 100% identity and an ungapped alignment when searching a whole database is very high.

Request more than 1000 hits and go to the end of the list. On the NCBI BLASTN page you can select 20000.
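The reason imperfect hits end up at the end of the list is that they score lower than exact ones. Here is a small sketch of the raw-score arithmetic, using the match/mismatch values from the question (1, -3). The gap costs (5 to open, 2 to extend) are assumed values for illustration, and the function assumes a gap of length k costs gapopen + k * gapextend.

```python
def raw_score(matches, mismatches=0, gaps=(), reward=1, penalty=-3,
              gapopen=5, gapextend=2):
    """Raw alignment score for a given number of matches, mismatches and gaps.

    `gaps` is a sequence of gap lengths; each gap of length k is assumed
    to cost gapopen + k * gapextend (gap costs here are illustrative).
    """
    gap_cost = sum(gapopen + k * gapextend for k in gaps)
    return matches * reward + mismatches * penalty - gap_cost


print(raw_score(15))                  # exact 15/15 match -> 15
print(raw_score(14, mismatches=1))    # one mismatch -> 11
print(raw_score(15, gaps=(1,)))       # all 15 bases matched around a 1-base gap -> 8
print(raw_score(8))                   # an ungapped 8-base flank on its own -> 8
```

With these assumed gap costs, extending through a one-base gap scores no better than reporting the longer exact flank alone, so a gapped alignment of a 15-mer has little room to outrank the many exact hits.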

9.5 years ago
thackl ★ 3.0k

I seem to recall that the gapped blastn algorithm requires two non-overlapping exact word matches to trigger an HSP (maybe described in the second Altschul BLAST paper). In your case this probably makes it impossible to find hits with gaps/mismatches.
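To see how little room a 15-mer leaves for exact seed words, here is a rough sketch that counts the exact words of a given size that survive when some query positions are mismatched. It only does the arithmetic; it is not how BLAST is implemented, it models mismatches only, and whether blastn really needs two non-overlapping words is the recollection above, not something this code checks. The function names are made up for this illustration.

```python
def exact_runs(query_len, mismatch_positions):
    """Lengths of the exactly matching stretches between mismatches (0-based positions)."""
    runs, start = [], 0
    for pos in sorted(mismatch_positions):
        runs.append(pos - start)
        start = pos + 1
    runs.append(query_len - start)
    return runs


def seed_words(query_len, mismatch_positions, word_size):
    """Return (total exact words, max non-overlapping exact words) of length word_size."""
    runs = exact_runs(query_len, mismatch_positions)
    total = sum(max(r - word_size + 1, 0) for r in runs)
    non_overlapping = sum(r // word_size for r in runs)
    return total, non_overlapping


print(seed_words(15, [7], 7))      # mismatch dead centre -> (2, 2)
print(seed_words(15, [5], 7))      # mismatch slightly off centre -> (3, 1)
print(seed_words(15, [4, 10], 7))  # two mismatches -> (0, 0)
print(seed_words(35, [17], 7))     # 35-mer, central mismatch -> (22, 4)
```

With word size 7, a 15-mer keeps two non-overlapping exact words only when the single mismatch sits exactly in the middle, while a 35-mer has plenty to spare. A smaller word size relaxes this considerably.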

9.5 years ago
Michael 55k

You should set -task blastn-short as a parameter in BLAST+; alternatively, try a smaller word size such as 4 (blastn does not accept values below 4).
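For anyone running BLAST+ locally, a minimal sketch of such a call might look like the following. The file names (query.fa, hits.tsv), the locally installed nt database, and the max_target_seqs value are assumptions for illustration; the expect threshold and the disabled filter follow the settings in the question. The script only builds and prints the command, so nothing is executed unless you uncomment the last line.

```python
import shlex
import subprocess  # only needed if you uncomment the run line below


def blastn_short_command(query_fasta, db="nt", word_size=7, evalue=1000,
                         max_target_seqs=5000, out_path="hits.tsv"):
    """Build a blastn command line tuned for short queries (illustrative defaults)."""
    return [
        "blastn",
        "-task", "blastn-short",          # preset intended for short queries
        "-query", query_fasta,
        "-db", db,
        "-word_size", str(word_size),     # try smaller values to allow more seeds
        "-evalue", str(evalue),
        "-dust", "no",                    # no low-complexity filtering, as in the question
        "-max_target_seqs", str(max_target_seqs),
        "-outfmt", "6",                   # tabular output, easy to filter by identity
        "-out", out_path,
    ]


cmd = blastn_short_command("query.fa")
print(shlex.join(cmd))
# subprocess.run(cmd, check=True)  # requires BLAST+ and a local copy of the database
```

Tabular output makes it easy to sort or filter the hits by percent identity afterwards, so the imperfect matches can be pulled out even when exact matches dominate.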
