I ran a local BLAST of a series of short DNA queries (all in one FASTA file) against portions of NGS reads from the same set: 10 reads, then 100, 1000 and finally 10k reads. Each portion included all the smaller ones. But the number of hits in the BLAST report seems to reach a plateau rather than increase more or less proportionally. I also noticed that the reads with hits for a given query don't accumulate as the number of reads increases: reads with hits found when blasting against 1000 reads are generally different from those found when blasting against 10k reads, instead of the former always being included in the latter. The max_hsps option doesn't have any effect, and max_alignments doesn't change much either. In contrast, BLAT seems to behave the 'correct' way. Is BLAST adjusting some parameters to increase speed as the size of the DNA database increases? If I understand it correctly, there's a comp_based_stats option for protein alignment (blastp) to control just that, but not for blastn, which I'm using. How can I make blastn report all the alignments that fit the criteria with no adjustments for DNA db size? Must be something simple I'm missing here...
Were you using the -task blastn-short option?
No, the parameters had blastn default values.
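For reference, here is a minimal sketch of what a non-default run could look like, written as a small Python helper that only assembles the command line. It assumes BLAST+ is installed and that the reads were already formatted as a nucleotide database; the file and database names (primers.fasta, reads_10k, hits_10k.tsv) are hypothetical. The reasoning behind it is a possible explanation, not something confirmed in this thread: the default blastn task is megablast, which uses a long word size (28) and is not tuned for queries of about 20 bases; -max_target_seqs defaults to 500, which caps how many reads can be reported per query; and the E-value of a given alignment grows with database size, so a hit that passes the threshold against 1000 reads may fail it against 10k.

```python
import shlex


def blastn_short_command(query_fasta, reads_db, out_path,
                         evalue=10.0, max_targets=1_000_000):
    """Assemble a blastn command line for short queries (about 20 bases).

    All file names passed in are the caller's own; nothing is executed here.
    """
    return [
        "blastn",
        "-task", "blastn-short",               # short-query settings instead of the default megablast
        "-query", query_fasta,
        "-db", reads_db,
        "-evalue", str(evalue),                # explicit threshold; E-values scale with database size
        "-max_target_seqs", str(max_targets),  # raise the default cap of 500 subject sequences
        "-outfmt", "6",                        # tabular output, one line per alignment
        "-out", out_path,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = blastn_short_command("primers.fasta", "reads_10k", "hits_10k.tsv")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Keeping the E-value threshold and the target cap as explicit arguments makes it easier to rerun the same search on the 100, 1000 and 10k subsets and see which of the two is responsible for the plateau.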
For 100, 1000, 10k and appr. 230k (entire set) reads I had the following results/number of hits:
Every read should yield at least 1 hit, most of them more than one. It's the same set of chimeric reads as in my previous questions. At first I thought there were errors in my Perl parser that counts alignment hits, but the sheer report file size and number of lines confirm that the figures are correct. I used very relaxed parameters for BLAT because the queries were short (around 20 bases).
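The check described above (do hits from the smaller subset reappear in the larger one?) can be sketched independently of the Perl parser. The following is only an illustration in Python, assuming the reports were written in tabular format (-outfmt 6), where the first two columns are the query ID and the subject (read) ID. The function names and the toy lines in the demo are made up and are not data from this thread.

```python
from collections import Counter


def read_hits(lines):
    """Return (query_id, subject_id) pairs from tabular BLAST lines."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comment lines
            continue
        fields = line.split("\t")
        pairs.append((fields[0], fields[1]))
    return pairs


def hits_per_query(pairs):
    """Count alignments reported for each query."""
    return Counter(query for query, _ in pairs)


def missing_from_larger(small_pairs, large_pairs):
    """Query/read pairs found in the smaller run but absent from the larger one."""
    return sorted(set(small_pairs) - set(large_pairs))


if __name__ == "__main__":
    # Toy lines standing in for two report files (not real results).
    small = read_hits(["q1\tread1\t100.0", "q1\tread2\t95.0"])
    large = read_hits(["q1\tread2\t95.0", "q1\tread7\t100.0"])
    print(hits_per_query(small))
    print(missing_from_larger(small, large))
```

If the larger run were a strict superset, missing_from_larger would return an empty list for every pair of nested subsets; a non-empty result points at a reporting limit or threshold rather than at the parser.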
Let me ask this. Are you looking to see how much redundancy there is in these sequences? There are other options for that than blast.
No, it started out as an attempt to trim primers in our amplicon library reads (and split them if reads are chimeric after adapter ligation, which most reads seem to be). (Finding all possible alignments of two sequences)
Trimming primers may be best done using a scan/trim program.
bbduk.sh from BBMap can rapidly find reads that match arbitrary sequences. Guide here. Could be used to identify chimeric reads quickly.
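To make the scan/trim idea concrete, here is a rough Python sketch that finds a primer inside a read while tolerating a few mismatches and then splits the read at each occurrence. This is not how bbduk.sh works internally (it is k-mer based); it is a naive sliding-window comparison meant only to illustrate the approach. It looks at the forward strand only, ignores indels, and all names and sequences in it are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_primer(read, primer, max_mismatches=2):
    """Start positions where the primer matches the read within max_mismatches."""
    n = len(primer)
    return [
        start
        for start in range(len(read) - n + 1)
        if hamming(read[start:start + n], primer) <= max_mismatches
    ]


def split_at_primer(read, primer, max_mismatches=2):
    """Cut the read at each primer occurrence and drop the primer itself.

    Matches that overlap an already removed primer are skipped.
    Returns the non-empty pieces in read order.
    """
    pieces = []
    prev = 0
    n = len(primer)
    for start in find_primer(read, primer, max_mismatches):
        if start < prev:          # overlaps the previous match
            continue
        if start > prev:
            pieces.append(read[prev:start])
        prev = start + n
    if prev < len(read):
        pieces.append(read[prev:])
    return pieces


if __name__ == "__main__":
    # Toy chimeric read: two fragments joined by a made-up 4-base "primer".
    read = "ACGACGAC" + "TTTT" + "GGGCCC"
    print(split_at_primer(read, "TTTT", max_mismatches=0))
```

For a real amplicon library the reverse complement of each primer would also need to be scanned, and a dedicated tool will be far faster on 230k reads.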