How to force blastn to align full subject
1
0
Entering edit mode
6.9 years ago

My aim is to dispatch query sequences from fasta file using blastn according to 2 subject sequences.

Here, an example of a query sequence :

>M02945:227:000000000-BHPRH:1:2114:12657:28629 1:N:0:1

GAGAGCTCTATTACTGCACTGGAGGGTAGC

Here, my 2 subjects sequences, they are both 30bp long :

>ESR

GAGAGCTCTATTTCTTCGGTGTTACCAGCT

>Cercle

GAGAGCTCTATTACTGCACTGGAGGGAAGC

By look, the sequence M02945:227:000000000-BHPRH:1:2114:12657:28629 1:N:0:1 will go to Cercle.fa file (29 first bases match with 1 mismatch position 27).

And using blastn :

blastn -query test_sequence_for_blast.fa -subject ref.fa -culling_limit 1 -task blastn-short -outfmt 6

M02945:227:000000000-BHPRH:1:2114:12657:28629 ESR 100.00 8 0 0 2 9 9 2 0.008 16.4

M02945:227:000000000-BHPRH:1:2114:12657:28629 Cercle 100.00 26 0 0 1 26 1 26 1e-13 52.0

My problem is, I would like to know how many mismatch do blastn found in the full length subject sequence. So, first try I decrease the mismatch penalty to -1 (because here I just have the 1->26 information).

blastn -query test_sequence_for_blast.fa -subject ref.fa -culling_limit 1 -penalty -1 -task blastn-short -outfmt 6

M02945:227:000000000-BHPRH:1:2114:12657:28629 ESR 100.00 8 0 0 2 9 9 2 0.027 14.3

M02945:227:000000000-BHPRH:1:2114:12657:28629 Cercle 96.67 30 1 0 1 30 1 30 8e-12 46.0

Nice, I got the 1->30 information, ok that's great.

I inserted a second mismatch position 29

>M02945:227:000000000-BHPRH:1:2114:12657:28629 1:N:0:1

GAGAGCTCTATTACTGCACTGGAGGGTACC

blastn -query test_sequence_for_blast.fa -subject ref.fa -culling_limit 1 -penalty -1 -task blastn-short -outfmt 6

M02945:227:000000000-BHPRH:1:2114:12657:28629 ESR 100.00 8 0 0 2 9 9 2 0.008 16.4

M02945:227:000000000-BHPRH:1:2114:12657:28629 Cercle 100.00 26 0 0 1 26 1 26 1e-13 52.0

And again I lost the info from 26->30... Because at the end I want to totally send to trash sequences with more than 2 mismatches

How to force blastn to align 1->30 straigth and yield the number of mismatch ?

I also tried to modify gap open penalty and gap extend penalty but that didn't do the trick...

Thanks for helping !

blastn • 2.8k views
ADD COMMENT
4
Entering edit mode
6.9 years ago

BLAST, by definition, does only LOCAL alignments. Maybe you are looking for a global alignment tool https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Pairwise_alignment

ADD COMMENT
0
Entering edit mode

Thanks for this help, actually my query sequences are 100-150 bp long, I just trimmed the example sequence here for a better understanding. I think i'll substring the 30 first bases of each sequences and use bowtie2 --end-to-end on them.

ADD REPLY
0
Entering edit mode

Why substring the first 30 bases? Bowtie2 or bwa mem should be supporting longer reads

ADD REPLY
0
Entering edit mode

I'm just interested in the first 30 bases

ADD REPLY
0
Entering edit mode

Nevertheless, don’t use a local aligner if you want a full sequence alignment.

You could also look at just calculating Hamming or Levenshtein distances if your sequences are pretty similar and don’t need aligning first

ADD REPLY

Login before adding your answer.

Traffic: 2497 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6