Blastn mismatch sensitivity for HSPs
0
0
Entering edit mode
4.1 years ago
Bwremjoe ▴ 30

Heya,

Is it possible to configure blastn in such a way that it will split even single nucleotide mismatches into two HSPs? For testing, I made the following two fasta files:

>SEQ1_FASTA
CTGCGGATGACTTAACACGCTAGGGACGTGGAGTCGATTCCATCGATGGTTATAAATCAAAGATTCGGAATGCTGTCTGGAGGGTGAATCTAACGGTGCGTATCTCGATTGCTCAGTCGCTTTTCGTACTGCGCGAAAGTTCGTACCGCTCATTCACTTGGTTCCGAAGCCTGTCCTGATATATGAATCCAAACTAGAGCGGGGCTCTTGACATTTGGAGTTGTAAATATCTAATACTCCAATCGGCTTTTACGTGCACCACCGCGGGCGGCTGACGAGGGTCTCACACCGAGAAACAAGACAGTTCCGGGCTGGAAGTAGCGCCGGCTAAGGAAGACGCCTGGTACAGCAGGACTATGAAACCGGTACAAAGGCAACATCCTCACTTGGGTGAACCGAAACGCGGTATCAAGGTTACTTTTTGGATACCTGAAACAAATCCCATGGTAGTCCTTAGACTTGGGAGTCTATCACCCCTAGGGCCCATATCTGGAAATAGACGCCAAGTTCAATCCGTATTCCGACGTACGATGGAACAGTGTGGGTGAGACGTGCTTCATTTATACCCTACGCAGGCTGGACCGAGGTCCGCAAGGCGCGGCGGTGCACAAGCAATTGACAACTAACCACCGTGTATTCATTATGGTACCAGGGACTTTAAGCCGAGTCAATGGAGCTCGCAATACAGAGTTTACCGCATCTTGCCGTAACTGACAAACTGTGATCCACCACAAGTCAAGCCATTGCCTCTTAGACACGCCGTTAGAGTAATTATGTAAACTTTGCGCGGCTTGACTACGACTCGTTCAGTCACGTCCGAGGGCACAATCCTATTCCCATTTGTATGTTCAGCTATCTTCTACCCATCCCCGGAAGTTAAGTAGGTCGTGAGATGCCGAGGAGGCTCTCGTTCATCCCGTGGGACATCAAGCTTCCCCTTGATAAAGCACCCCGCTCGGGTATGGCAGAGAGGACGCCTTCTGAATTGTGCTATCCCTCG
>SEQ2_FASTA
CTGCGGATGACTTAACACGCTAGGGACGTGGAGTCGATTCCATCGATGGTTATAAATCAAAGATTCGGAATGCTGTCTGGAGGGTGAATCTAACGGTGCGTATCTCGATTGCTCAGTCGCTTTTCGTACTGCGCGAAAGTTCGTACCGCTCATTCACTTGGTTCCGAAGCCTGTCCTGATATATGAATCCAAACTAGAGCGGGGCTCTTGACATTTGGAGTTGTAAATATCTAATACTCCAATCGGCTTTTACGTGCACCACCGCGGGCGGCTGACGAGGGTCTCACACCGAGAAACAAGACAGTTCCGGGCTGGAAGTAGCGCCGGCTAAGGAAGACGCCTGGTACAGCAGGACTATGAAACCGGTACAAAGGCAACATCCTCACTTGGGTGAACCGAAACGCGGTATCAAGGTTACTTTTTGGATACCTGAAACAAATCCCATGGTAGTCCTTAGACTTGGGAGTCTATCACCCCTAGGGCCCATATCTGGAAATAGGCGCCAAGTTCAATCCGTATTCCGACGTACGATGGAACAGTGTGGGTGAGACGTGCTTCATTTATACCCTACGCAGGCTGGACCGAGGTCCGCAAGGCGCGGCGGTGCACAAGCAATTGACAACTAACCACCGTGTATTCATTATGGTACCAGGGACTTTAAGCCGAGTCAATGGAGCTCGCAATACAGAGTTTACCGCATCTTGCCGTAACTGACAAACTGTGATCCACCACAAGTCAAGCCATTGCCTCTTAGACACGCCGTTAGAGTAATTATGTAAACTTTGCGCGGCTTGACTACGACTCGTTCAGTCACGTCCGAGGGCACAATCCTATTCCCATTTGTATGTTCAGCTATCTTCTACCCATCCCCGGAAGTTAAGTAGGTCGTGAGATGCCGAGGAGGCTCTCGTTCATCCCGTGGGACATCAAGCTTCCCCTTGATAAAGCACCCCGCTCGGGTATGGCAGAGAGGACGCCTTCTGAATTGTGCTATCCCTCG

The only difference between the two being a single SNP at position 500. I want to use the output of blastn downstream, so I'm hoping to do this with blast. I've been playing with the rewards / penalty for mismatches, but it seems I can't make it that sensitive. Anyone else knows how to do this? Possibly this is not feasible with blast, so if anyone knows some other tool to get HSPs based on small variants, that would be cool too :)

I notided there's an 'ungapped' option, that does precisely this for gaps (even if they are 1 nt). But I did not find the equivalant for a mismatch. Here's the result of ungapped:

blastn -query seq1.fn -subject seq2.fn -outfmt 6 -ungapped
SEQ1_FASTA      SEQ2_FASTA      100.000 500     0       0       501     1000    500     999     0.0     962
SEQ1_FASTA      SEQ2_FASTA      100.000 499     0       0       1       499     1       499     0.0     960

So, I want this, but then for single NT mismatches.

blastn hsp • 1.7k views
ADD COMMENT
0
Entering edit mode

Those are two entries, not two files. Do you have them as one entry per file?

ADD REPLY
0
Entering edit mode

Yes, these are actually two files. I'm running:

blastn -query seq1.fn -subject seq2.fn -outfmt 6
ADD REPLY
0
Entering edit mode

Hi i know that i'm not answering what you asked but a few months ago i did read about snp detection algorithms and i learn that Blastn is a very error prone tool to detect small mutations like snps and smal in/dels. I strongly recommend Dnadiff from the Mummer Tools to do snap calls in large sequences. Dnadiff have a lot of output files that are really parsable.

ADD REPLY
0
Entering edit mode

Thanks for the tip. I'd still prefer to get this directly out of blast though. If it can detect a gap (option ungapped), why shouldn't it be able to detect a SNP?

ADD REPLY
0
Entering edit mode

I don't know how to explain exactly why, because there is some difficult programming that I don't fully understand. But the reason I read about why blast is not the best tool for calling the SNP is because of its scoring functions and alignment methods. They are optimized to provide the best possible local aligniment for a long chain of nucleotides. Therefore, it is possible to find large large mutations accurately, but small single mutations are not well recorded by the algorithm, leading to many false positives. Good results are possible, but you need to optimize the blasting parameters. Even so, I think (this is a personal opinion) that the blast is not the best tool for this sort of analysis.

ADD REPLY
1
Entering edit mode

I agree it's not the best tool. The problem is that I'm trying to show this to people, but in order to do so, I must compare it to other methods in a fair way.

Either way, thanks for the tip. I now at least got my SNPS using DNADIFF.

ADD REPLY

Login before adding your answer.

Traffic: 1087 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6