Hi all,
I have several Mtb strains in the form of scaffolds.
Is there a way to align the scaffolds of multiple Mtb strains to a reference genome to get a variant file with all the SNPs?
I know how to align reads to a reference genome and get a variant file using samtools, but I have no idea how to align scaffolds or contigs to a reference genome to get all the variants relative to that reference.
I think you can treat scaffolds as "big reads".
Is there such an option in bowtie or bwa?
Yes, as far as I can tell you just have to align them in the "classic" way. You may have to adjust the insertion (gap) penalty a bit if you have large insertions due to the variability of your samples.
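To make the insertion-penalty advice concrete: BWA-MEM scores gaps with an affine penalty set by -O (gap open, default 6) and -E (gap extension, default 1), so a gap of length k costs O + k*E. The sketch below only evaluates that formula; the "relaxed" values are arbitrary examples, not recommendations for Mtb data.

```python
def gap_cost(length: int, open_penalty: int = 6, extend_penalty: int = 1) -> int:
    """Affine penalty for one gap of `length` bases: O + length * E.

    Defaults mirror BWA-MEM's -O 6 and -E 1.
    """
    if length <= 0:
        return 0
    return open_penalty + length * extend_penalty


if __name__ == "__main__":
    for k in (1, 10, 50):
        default = gap_cost(k)                                    # -O 6 -E 1
        lower_open = gap_cost(k, open_penalty=3)                 # example only
        lower_both = gap_cost(k, open_penalty=3, extend_penalty=0)  # example only
        print(f"gap of {k:>2} bp: default={default}, "
              f"lower -O={lower_open}, lower -O/-E={lower_both}")
```

With the defaults, a 50 bp insertion costs 56, and lowering only the open penalty changes that by a constant. For long insertions the k*E term dominates, so the extension penalty is usually the one that matters.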
But reads are in a different format from scaffolds: a scaffold is just a long string of sequence, whereas read files look like this
You just have to put your scaffolds in FASTA format, don't you think?
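Here is a minimal sketch of that conversion. It assumes each contig or scaffold is already available in Python as a (name, sequence) pair; the record names are hypothetical. BWA-MEM accepts FASTA as well as FASTQ input, so a file written this way can go where the reads file would normally go.

```python
import io
from typing import Iterable, TextIO, Tuple


def write_fasta(records: Iterable[Tuple[str, str]], handle: TextIO,
                width: int = 60) -> None:
    """Write (name, sequence) pairs as FASTA, wrapping sequence lines at `width`."""
    for name, seq in records:
        # Drop any embedded whitespace/newlines and normalise to upper case.
        seq = "".join(seq.split()).upper()
        handle.write(f">{name}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    # Hypothetical contigs from one strain.
    contigs = [
        ("strainA_contig1", "acgt" * 20),
        ("strainA_contig2", "GGCC\nTTAA"),
    ]
    buf = io.StringIO()
    write_fasta(contigs, buf)
    print(buf.getvalue(), end="")
```

Writing one FASTA file per strain keeps the strains separate when they are aligned and genotyped.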
Thanks a lot, it worked for me :) I aligned them with BWA-MEM and called variants with freebayes.
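For anyone repeating this, here is a sketch of those steps as a small Python wrapper around the command-line tools. Assumptions: bwa, samtools and freebayes are on the PATH; the file names (H37Rv.fa, strainA_contigs.fa) are hypothetical; each strain gets its own read group through bwa mem -R so its sample name ends up in the BAM; and freebayes is run with -p 1 because Mtb is haploid while freebayes defaults to diploid. build_steps only assembles the command lines, and run_steps executes them.

```python
import shlex
import subprocess
from typing import List, Optional, Tuple

# (argv, path to capture stdout into, or None)
Step = Tuple[List[str], Optional[str]]


def build_steps(ref: str, contigs: str, sample: str) -> List[Step]:
    """Assemble index -> align -> sort -> index -> call commands for one strain."""
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf"
    # bwa expands the literal "\t" escapes in the read-group string itself.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}"
    return [
        (["bwa", "index", ref], None),
        (["samtools", "faidx", ref], None),          # .fai index used by freebayes
        (["bwa", "mem", "-R", read_group, ref, contigs], sam),
        (["samtools", "sort", "-o", bam, sam], None),
        (["samtools", "index", bam], None),
        (["freebayes", "-f", ref, "-p", "1", bam], vcf),
    ]


def run_steps(steps: List[Step]) -> None:
    """Run each command in order, stopping on the first failure."""
    for argv, out_path in steps:
        if out_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(out_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)


if __name__ == "__main__":
    for argv, out_path in build_steps("H37Rv.fa", "strainA_contigs.fa", "strainA"):
        suffix = f" > {out_path}" if out_path else ""
        print(shlex.join(argv) + suffix)
```

Printing the commands first is a cheap way to check them before calling run_steps. Since each strain has its own SM tag, the sorted BAMs of several strains can also be passed to a single freebayes run to get one multi-sample VCF.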
@Titus, however, after the alignment I found that most of the SNPs are in the overlapping regions of the contigs. Also, these are contigs, not scaffolds. Do I need to scaffold them before mapping, or can I keep using contigs for the analysis?
I'm not sure I understand correctly. What do you mean by "in the overlapping regions of the contigs"? You need an overlap to compare them :)
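One way to check whether the SNPs really cluster where contigs overlap is to list the reference intervals covered by more than one contig alignment and compare them with the SNP positions. The sketch below does this from SAM text. It skips unmapped (flag 0x4) and secondary (flag 0x100) alignments, and it computes each alignment's reference end from the CIGAR operations that consume the reference (M, D, N, =, X). It is illustrative only; the contig and chromosome names in the example are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")

Aln = Tuple[str, int, int, str]  # (reference name, start, end, contig name), 1-based inclusive


def ref_span(pos: int, cigar: str) -> Tuple[int, int]:
    """Reference start/end of an alignment from its POS and CIGAR."""
    length = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    return pos, pos + length - 1


def parse_alignments(sam_lines: Iterable[str]) -> List[Aln]:
    alns = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        f = line.rstrip("\n").split("\t")
        qname, flag, rname, pos, cigar = f[0], int(f[1]), f[2], int(f[3]), f[5]
        if flag & 0x4 or flag & 0x100 or cigar == "*":
            continue  # unmapped or secondary alignment
        start, end = ref_span(pos, cigar)
        alns.append((rname, start, end, qname))
    return alns


def overlapping_pairs(alns: List[Aln]) -> List[Tuple[str, str, str, int, int]]:
    """Pairs of contigs whose alignments overlap on the reference, with the overlap."""
    pairs = []
    ordered = sorted(alns)  # by reference name, then start
    for i, (rname, start, end, qname) in enumerate(ordered):
        for rname2, start2, end2, qname2 in ordered[i + 1:]:
            if rname2 != rname or start2 > end:
                break  # later alignments start even further right
            pairs.append((qname, qname2, rname, start2, min(end, end2)))
    return pairs


if __name__ == "__main__":
    sam = [
        "@SQ\tSN:chr\tLN:1000",
        "c1\t0\tchr\t100\t60\t50M\t*\t0\t0\t*\t*",
        "c2\t0\tchr\t140\t60\t20M2I5D10M\t*\t0\t0\t*\t*",
        "c3\t0\tchr\t300\t60\t20M\t*\t0\t0\t*\t*",
    ]
    for pair in overlapping_pairs(parse_alignments(sam)):
        print(pair)
```

If most SNPs fall inside these intervals, it is worth checking whether they come from disagreements between overlapping contigs of the same strain rather than from real differences from the reference.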