21 months ago
Mark
I have two long FASTA files (~100 kb each). I want to find the differences between them and output a clear summary: what kinds of mutations there are (point mutations, indels, etc.) and where they occur.
I’m thinking of using a tool like Needle to perform a global alignment. Is there a tool I could run afterwards to perform the mutation analysis automatically, or one that takes both FASTA files from the start and outputs a mutation summary?
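To make the desired output concrete, here is a rough sketch of the kind of summary I mean. It assumes the two sequences have already been globally aligned (for example with Needle) and read into two gapped strings of equal length, with `-` marking gaps. The function name `summarize_differences` and the tuple layout are made up for illustration and are not taken from any existing tool.

```python
def summarize_differences(ref_aln, qry_aln):
    """Classify differences between two gapped, equal-length aligned sequences.

    Returns a list of (kind, ref_pos, ref_seq, qry_seq) tuples. ref_pos is
    1-based in ungapped reference coordinates; for an insertion it is the
    reference base immediately before the inserted bases (0 if at the start).
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")

    events = []
    ref_pos = 0  # number of reference bases consumed so far
    i, n = 0, len(ref_aln)
    while i < n:
        r, q = ref_aln[i], qry_aln[i]
        if r == "-" and q == "-":
            # Column with gaps in both sequences carries no information.
            i += 1
        elif r == "-":
            # Insertion in the query: extend over the run of reference gaps.
            j = i
            while j < n and ref_aln[j] == "-":
                j += 1
            inserted = qry_aln[i:j].replace("-", "")
            events.append(("insertion", ref_pos, "", inserted))
            i = j
        elif q == "-":
            # Deletion in the query: extend over the run of query gaps.
            j = i
            while j < n and qry_aln[j] == "-" and ref_aln[j] != "-":
                j += 1
            deleted = ref_aln[i:j]
            events.append(("deletion", ref_pos + 1, deleted, ""))
            ref_pos += len(deleted)
            i = j
        else:
            ref_pos += 1
            if r.upper() != q.upper():
                events.append(("substitution", ref_pos, r, q))
            i += 1
    return events


if __name__ == "__main__":
    # Toy aligned pair, not real data.
    for event in summarize_differences("ACGT-ACGTAC", "ACCTTAC--AC"):
        print(event)
```

Consecutive gap columns are merged into a single indel event, so a 2-base deletion is reported once rather than as two separate 1-base deletions.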
Are these sequences closely related? I ask because if you try to visualize point mutations and short indels over an alignment of ~100 kb, you probably won't be able to extract much information by eye. If there are blocks of interest within the alignment (~1 kb), you could inspect point mutations and gaps in ClustalX or SeaView. For the entire 100 kb region, however, I am not aware of a tool that does this. When I did this a while ago, I used Gviz, an R package for visualizing genomic information as tracks. I divided the sequence into windows, summed the point mutations in each window, and assigned each window an average. That let me identify windows with high and low rates of point mutations.
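The windowing step can be sketched roughly as follows. This is not the Gviz/R code I used, just a minimal Python illustration of the idea. It assumes the two sequences are already aligned into gapped strings of equal length, that windows are defined over alignment columns, and that columns containing a gap are skipped when computing the rate. The function name `windowed_mismatch_rates` and the default window size are illustrative choices.

```python
def windowed_mismatch_rates(ref_aln, qry_aln, window=1000):
    """Count point mismatches per fixed-size window of alignment columns.

    Returns a list of (start, end, mismatches, rate) tuples, where start and
    end are 1-based inclusive column indices and rate is the number of
    mismatches divided by the number of gap-free columns in the window.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    if window <= 0:
        raise ValueError("window must be a positive integer")

    results = []
    for start in range(0, len(ref_aln), window):
        end = min(start + window, len(ref_aln))
        compared = 0
        mismatches = 0
        for r, q in zip(ref_aln[start:end], qry_aln[start:end]):
            if r == "-" or q == "-":
                continue  # indel columns are not point mutations
            compared += 1
            if r.upper() != q.upper():
                mismatches += 1
        rate = mismatches / compared if compared else 0.0
        results.append((start + 1, end, mismatches, rate))
    return results


if __name__ == "__main__":
    # Toy aligned pair with a window of 5 columns, not real data.
    for row in windowed_mismatch_rates("AAAAACCCCC", "AAATACC-GG", window=5):
        print(row)
```

The per-window rates can then be plotted as a single track along the sequence, which makes mutation-rich and mutation-poor regions easy to spot even at 100 kb.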