7.8 years ago
cool_abbecker
Say I want to force BLAST to align the first few (~30) highly variable base pairs of my query sequence to all potential hits instead of cutting them off. The rest of the sequence has >95% identity, while the start has only ~35% identity among database entries. Which of the blastn command-line options do I have to change? (Yes, I know the L is for local.)
BLAST is an algorithm. It can be implemented in different programming languages, offered as a service on websites, or done by hand if you are bored in heaven. So if you have a question about one specific implementation, it helps a lot to say which program/website/whatever you are talking about. Equally, if one program gives you problems, there are alternatives! There is not one "BLAST".
I realised just after asking that the question was very imprecise, so I added some information: I am using the BLAST+ blastn command line.
How many sequences are there and how long are they? Could you perhaps do a multiple sequence alignment instead, if the reference is known?
I have 276,000 sequences with highly complex indel patterns. All are the same gene, length = 220 ± 20 bp. The first 30 base pairs of each sequence are more variable than the rest and are very important for identifying similar species, but BLAST cuts them off because they are separated from the rest of the sequence by random junk. I want to compare BLAST to a global MSA, so I am doing both.
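To illustrate why a local aligner drops a divergent start while a global one keeps it, here is a minimal sketch in plain Python. It is textbook dynamic programming (Smith-Waterman vs. Needleman-Wunsch), not BLAST's seed-and-extend heuristic. The scoring values, the function name `align` and the toy sequences are all assumptions made up for the demonstration, not BLAST+ defaults or real data.

```python
def align(query, subject, local, match=1, mismatch=-2, gap=-2):
    """Linear-gap DP alignment (illustrative scoring, not BLAST defaults).

    local=True  -> Smith-Waterman: scores floored at 0, best cell wins.
    local=False -> Needleman-Wunsch: both sequences aligned end to end.
    Returns (score, query_start, query_end, aligned_query, aligned_subject),
    where query[query_start:query_end] is the aligned part of the query.
    """
    n, m = len(query), len(subject)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global alignment pays for leading gaps.
        for i in range(1, n + 1):
            H[i][0] = i * gap
        for j in range(1, m + 1):
            H[0][j] = j * gap
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if query[i - 1] == subject[j - 1] else mismatch
            score = max(H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                score = max(score, 0)
            H[i][j] = score
            if local and score > best:
                best, best_pos = score, (i, j)

    # Traceback: from the best cell (local) or the bottom-right corner (global).
    i, j = best_pos if local else (n, m)
    score, q_end = H[i][j], i
    aq, asub = [], []
    while (i > 0 or j > 0) and not (local and H[i][j] == 0):
        s = match if i > 0 and j > 0 and query[i - 1] == subject[j - 1] else mismatch
        if i > 0 and j > 0 and H[i][j] == H[i - 1][j - 1] + s:
            aq.append(query[i - 1]); asub.append(subject[j - 1]); i -= 1; j -= 1
        elif i > 0 and H[i][j] == H[i - 1][j] + gap:
            aq.append(query[i - 1]); asub.append("-"); i -= 1
        else:
            aq.append("-"); asub.append(subject[j - 1]); j -= 1
    return score, i, q_end, "".join(reversed(aq)), "".join(reversed(asub))


# Toy data (made up): a 30-base start that differs completely,
# followed by an identical 40-base stretch.
conserved = "CGTTGCCGTCTTGGCTCCGGTCCGTGTCCTGGCTTCGGTC"
query = "A" * 30 + conserved
subject = "T" * 30 + conserved

local_result = align(query, subject, local=True)
global_result = align(query, subject, local=False)

print("local  covers query positions", local_result[1], "to", local_result[2])
print("global covers query positions", global_result[1], "to", global_result[2])
```

With these toy sequences the local alignment starts only where the identical stretch begins, while the global alignment spans the whole query, mismatching start included. That is the behaviour you are seeing from blastn, and it is why an end-to-end (global) aligner is the more natural tool when the variable start has to stay in the alignment.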
You may want to take a look at clumpify.sh from the BBMap suite. It sounds to me like these sequences would form clumps that you could work with easily. See the second post in this thread for inspiration: "Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files. And remove duplicates." Note: Clumpify will work with fasta files.