Entering edit mode
8.4 years ago
Yongjie Zhang
▴
110
Dear All, I have a DNA alignment of 34 kb and want to get the number of SNPs every 10 bp across the alignment.Do you know how to get what I want? Any comments are welcome. Thanks. Yongjie
Could you provide more info on how your alignment looks like? is it a text file? Did you produce it using Blast? Do you want to just count how many single bases differ between the query and the reference sequence? Or do you also want to contrast if such single bases are SNPs in the reference sequence?
Yes, my alignment is a text file. I aligned using MAFFT and then saved the output in fasta format. I just want to count how many SNP sites there are in every 10 bp across the alignment length. The purpose to do this is to finally find which region is most variable and has the potential to be developed as markers.
Does anyone know or would like to write a script (e.g., a perl script) to help realize my task? The script should be able to read a DNA alignment of X bp in length in fasta format and then summarize the numbers of SNP sites in every Y bp (Y is extremely lower than X) across the alignment. A site is considered as SNP if there are more than two different nucleotide bases (>= 2); a site is not considered as SNP if there is only one kind of base (A, T, C, or G). At such sites, some sequences may have insertion/deletion/gap, but we will only look at remaining sequences with definite bases to determine if it's a SNP site. I can use a perl script, but can not write a new script. Hope to have your great help. Thanks. Yongjie