BLAST: how to print mismatch position?
1
1
Entering edit mode
4.3 years ago
restless.v2 ▴ 30

Hi, in brief i read BLAST manual and i did not find how to print mismatch positions. Does exist a cmd to do that? P.S. I mean exactly mismatch position (like A instead of T) not where query/subject alignment start and stop. for example: there is 1 mismatch at query position 21 --> reference base A / query base T

Thanks David

alignment snp • 4.3k views
ADD COMMENT
0
Entering edit mode

Why do you need that information? I don't think that the BLAST algorithm really focuses on mismatch positions while looking at HSPs - in fact, I feel like identifying those positions and reporting them would make the algorithm significantly slower.

ADD REPLY
0
Entering edit mode

Hi RamRS, in BLAST output with come along with simple graphic of the alignment is represented the mismatch. I am looking a command to put this info in tabular file.

Otherwise you could suggest me another program? Thanks a lot

ADD REPLY
0
Entering edit mode

You're going to need to manually parse BLAST's simple graphic output (which is a pretty crazy thing to do), so I don't think BLAST is ideal. The closest I can think of too your requirement is a CIGAR string for an alignment, so I'd look for tools that do that. Read: https://jef.works/blog/2017/03/28/CIGAR-strings-for-dummies/

ADD REPLY
2
Entering edit mode

Dear RamRS,

your hint was very precious and pushed forward in my endevour into let BLAST works the way I want. I find in BLAST formatting options for output that can be selected a SAM format, then a CIGAR container! Wow!

It works!

This is my cmd line LOOK at -outfmt "17". It stands for SAM (from -help)

blastn -db NoV_varianti_GII4.fasta -query 2017_join.fasta -outfmt "17" -out 2017_result.txt -perc_identity 98 -subject_besthit -max_target_seqs 1 -num_threads 4
ADD REPLY
1
Entering edit mode

Excellent job and great effort! Congratulations on figuring out a solution.

ADD REPLY
1
Entering edit mode

While CIGAR string tells you what kind of change (insertion, match, mismatch, deletion) is at a position, it does not tell you the actual base change. Isn't that what you wanted?

ADD REPLY
0
Entering edit mode

Hi genomax, you are right, but in my pipeline what I am looking for is at first the position of the mismatch. This because in my alignment I have a virus variant reference library (228bp region) with 40 hot spot positions. Some sequence in library are so similar that differ for only 1 base. To understand if I can validate also an alignment with 1 or 2 or 3 (new variant or PCR bias?) mismatch I need to know at first where the mismatch happens. I need to do that because my analysis is on environmental sample (signal very low) and every read I can take home is good in particular for very low variants. In a second time I will make a reasoning about mismatch type and possible analysis like new virus variant detection. Thank you for your feedback. David

ADD REPLY
0
Entering edit mode

If you just need to know the first position of the change then you will certainly get that from CIGAR.

ADD REPLY
0
Entering edit mode

You may be able to do this using biopython's blast parser. Take a look at this page. All the way to the end is a section on HSP's. While this particular example is for blast (old version), the current blast+ parser in python has something analogous.

hsp.bits = 77.0
hsp.expect = 0.16
hsp.frame = ()
hsp.gaps = (5, 76)
hsp.identities = (22, 76)
hsp.match = '+G  K+P       AF V +  E HK    + S+  +E SK+   RWK+++  EK  F   A+  K  +  +   Y'
hsp.num_alignments = 1
hsp.positives = (35, 76)
hsp.query = 'KGDPKKPRGKMSSYAFFVQTCREEHKKKHPDASVNFSEFSKKCSERWKTMSAKEKGKFEDMAKADKARYEREMKTY'
hsp.query_start = 2
hsp.sbjct = 'EGHVKRPMN-----AFMVWSRGERHKLAQQNPSMQNTEISKQLGCRWKSLTEAEKRPFFQEAQRLKILHREKYPNY'
hsp.sbjct_start = 2
hsp.score = 34.399999999999999
hsp.strand = (None, None)
ADD REPLY
0
Entering edit mode

Hi genomax, thank your for answer. Now I know that if I can't print a result as i want from a program output I can parse what I can get at best. Thank you.

ADD REPLY
1
Entering edit mode
4.3 years ago

I wrote blastn2snp : https://lindenb.github.io/jvarkit/BlastNToSnp.html

$  java -jar dist/blastn2snp.jar < blast.xml | column -t

#query              hit                                                                                              hit-index  hsp-index  query-POS  hit-POS    STRAND  REF(hit)  ALT(query)  blast.align_length  blast.hit.var  blast.query.var  blast.mid.var
No definition line  Homo sapiens chromosome 6, alternate assembly CHM1_1.1                                           1          9          21         74567818   -       T         A           18                  A              T                .
No definition line  Homo sapiens chromosome 6, alternate assembly HuRef                                              2          9          21         71600901   -       T         A           18                  A              T                .
No definition line  Homo sapiens chromosome 6, GRCh37.p13 Primary Assembly                                           3          9          21         74401398   -       T         A           18                  A              T                .
No definition line  Homo sapiens chromosome 5, alternate assembly CHM1_1.1                                           4          1          7          107821121  -       A         G           28                  T              C                .
No definition line  Homo sapiens chromosome 5, alternate assembly CHM1_1.1                                           4          9          16         14262358   +       G         C           18                  G              C                .
ADD COMMENT
0
Entering edit mode

Pierre Lindenbaum : This is not identifying the first position (coordinate) in the reference where there is a change. This is what OP wants.

ADD REPLY

Login before adding your answer.

Traffic: 1612 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6