4.3 years ago
restless.v2
Hi, in brief: I read the BLAST manual and did not find how to print mismatch positions. Is there a command to do that? P.S. I mean the exact mismatch position (like A instead of T), not where the query/subject alignment starts and stops. For example: there is 1 mismatch at query position 21 --> reference base A / query base T
Thanks, David
Why do you need that information? I don't think that the BLAST algorithm really focuses on mismatch positions while looking at HSPs - in fact, I feel like identifying those positions and reporting them would make the algorithm significantly slower.
Hi RamRS, the BLAST output comes with a simple graphic of the alignment in which the mismatches are shown. I am looking for a command to put this information into a tabular file.
Otherwise, could you suggest another program? Thanks a lot
You're going to need to manually parse BLAST's simple graphic output (which is a pretty crazy thing to do), so I don't think BLAST is ideal. The closest thing I can think of to your requirement is a CIGAR string for an alignment, so I'd look for tools that produce one. Read: https://jef.works/blog/2017/03/28/CIGAR-strings-for-dummies/
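To make the CIGAR idea concrete, here is a minimal sketch of walking a CIGAR string and collecting the query positions of mismatches. It assumes the aligner writes the extended operations `=` (match) and `X` (mismatch); many tools write only `M`, which covers both, and in that case the CIGAR alone cannot locate mismatches. The function name `mismatch_positions` is made up for illustration.

```python
import re

# One CIGAR element: a run length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def mismatch_positions(cigar, query_start=1):
    """Return 1-based query positions covered by 'X' (mismatch) operations.

    Assumption: the CIGAR uses '=' and 'X'. With plain 'M' the result is
    empty, because 'M' does not say whether a base matches or not.
    """
    positions = []
    qpos = query_start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "X":
            positions.extend(range(qpos, qpos + length))
        # M, I, S, = and X consume query bases; D, N, H and P do not.
        if op in "MIS=X":
            qpos += length
    return positions

print(mismatch_positions("20=1X7="))  # [21]
```

Note that this only gives the position, not the bases involved; for the base change you still need the sequences themselves.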
Dear RamRS,
your hint was very valuable and pushed me forward in my endeavour to make BLAST work the way I want. I found that the BLAST output formatting options include SAM format, and therefore a CIGAR string! Wow!
It works!
This is my cmd line, LOOK at -outfmt "17". It stands for SAM (from -help).
Excellent job and great effort! Congratulations on figuring out a solution.
While a CIGAR string tells you what kind of change (insertion, match, mismatch, deletion) is at a position, it does not tell you the actual base change. Isn't that what you wanted?
Hi genomax, you are right, but in my pipeline what I am looking for first is the position of the mismatch. This is because my alignment uses a virus variant reference library (228 bp region) with 40 hot-spot positions. Some sequences in the library are so similar that they differ by only 1 base. To understand whether I can also validate an alignment with 1, 2, or 3 mismatches (new variant or PCR bias?), I first need to know where the mismatch occurs. I need to do that because my analysis is on environmental samples (very low signal), and every read I can keep is valuable, in particular for very low-abundance variants. At a later stage I will reason about the mismatch type and possible analyses such as new virus variant detection. Thank you for your feedback. David
If you just need to know the first position of the change then you will certainly get that from CIGAR.
You may be able to do this using Biopython's BLAST parser. Take a look at this page. All the way at the end is a section on HSPs. While this particular example is for blast (old version), the current blast+ parser in Python has something analogous.
Hi genomax, thank you for your answer. Now I know that if I can't print a result the way I want from a program's output, I can parse the best output I can get. Thank you.
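In that parse-what-you-can-get spirit, here is a rough sketch of getting both the position and the base change from the aligned sequences of one HSP. It assumes you have the gapped query and subject strings plus their start coordinates, for example from a tabular run with `-outfmt "6 qseqid sseqid qstart sstart qseq sseq"`, and that the hit is on the plus strand. The function name `list_mismatches` is hypothetical.

```python
def list_mismatches(qseq, sseq, qstart=1, sstart=1):
    """List mismatches between two aligned (gapped) sequences.

    Returns tuples of (query_pos, subject_pos, subject_base, query_base),
    with 1-based positions. Gap columns ('-') are skipped, not reported.
    Assumption: plus-strand hit, so both coordinates increase.
    """
    if len(qseq) != len(sseq):
        raise ValueError("aligned sequences must have the same length")
    mismatches = []
    qpos, spos = qstart, sstart
    for q, s in zip(qseq.upper(), sseq.upper()):
        if q != "-" and s != "-" and q != s:
            mismatches.append((qpos, spos, s, q))
        # Advance each coordinate only when that sequence has a real base.
        if q != "-":
            qpos += 1
        if s != "-":
            spos += 1
    return mismatches

query = "ACGTACGTACGTACGTACGTTCGT"
subject = "ACGTACGTACGTACGTACGTACGT"
for qpos, spos, ref_base, query_base in list_mismatches(query, subject):
    print(f"mismatch at query position {qpos}: "
          f"reference base {ref_base} / query base {query_base}")
```

For the toy sequences above this prints one line, for query position 21 with reference base A and query base T. Minus-strand hits (where sstart is larger than send) would need the subject coordinate to count down instead, which this sketch does not handle.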