I want to compare two bacterial genomes using SNP signature. The method uses one bacterial genome as reference and aligns the reads from other bacterial to the reference genome. After variant calling, I can see some regions from reference genome do not have SNPs with the reads, i.e., the histogram of SNP counts is zero, for example, on position 367390.
To validate the zero SNPs region, I run the coverage analysis using bedtools (genomecov), it looks that this region is not covered by reads, but I am not sure if this analysis is correct. The puzzle to me is that I have run multiple sample reads from different runs, the results are consistent, and the two genomes have similar sequences from sequence alignment. The positions are expected to have reads covered.
(The bam file was from bwa alignment of reference genome and reads using default options)
$ bedtools genomecov -ibam CR36.bam -d > output.txt
Sample entries of output.txt
NODE_1 367376 15
NODE_1 367377 13
NODE_1 367378 12
NODE_1 367379 2
NODE_1 367380 2
NODE_1 367381 2
NODE_1 367382 2
NODE_1 367383 0
NODE_1 367384 0
NODE_1 367385 0
NODE_1 367386 0
NODE_1 367387 0
NODE_1 367388 0
My question is: When we find some regions do not have SNP in VCF, how do we validate if these regions have reads covered during alignment?
Could you please shed some lights on my puzzle? Sorry for the long question. Thank you!
Perhaps the bases in this region are all N due to scaffolding? Alternatively, they may differ from the reference enough so that bwa can't map reads there. You did not specify whether you were using bwa-aln or bwa-mem, or the version, or command, or read length, or whether the reads were paired, or the platform, or average depth, or the library preparation, so it's difficult to speculate about why you have zero coverage in some locations; it's affected by all of those things.
Thank you!
The alignment used BWA with following commend:
The zero SNPs on some regions are possibly due to homologous recombination between the reference genome and the genome by the raw reads. The recombination is what I expected in my experiment and I wish to confirm this recombination by SNP calling. Because different sequencing runs have similar results of SNP calling, and two genomes are similar from alignment test of complete genomes, the possibility of non-coverage of the reads on this region is low. That is my puzzle. Probably the commends in BWA alignment or bedtools coverage analysis of alignment are incorrect?
You can always view the aligned files in IGV to ensure reads if you only have a few specific regions to check. But yes, it appears those regions don't have reads. If you still have sample remaining, you can try Sanger sequencing the region, though again, this is really only reasonable for a handful of regions.
@Brian Bushnell's suggestion that the two genomes might be too different to align properly is also a very real possibility.