Hi all,
I'm using NGS data to observe how a community of congenic bacteria evolves over time. Basically, I start by putting several different alleles of one gene into the same medium and then track how that population changes over time by sequencing the DNA of that gene.
As far as I can tell, all the SNP calling pipelines appear to be more for genome wide discovery than something like what I am doing.
Currently, I am using VarScan and LoFreq to determine the allele frequencies in this community. The agreement between the two pipelines is decent, but as of now VarScan doesn't seem to be able to detect alleles that are present in less than 20% of the population (despite what was reported in this paper, and all the other programs are reported to be even worse). I would like to be able to use at least two different programs to verify the presence of these alleles.
Does anyone know of a better low frequency SNP calling pipeline or perhaps a program that allows you to call SNPs based on known alleles (this would actually be the ideal, but I haven't seen anything like this anywhere that I've read)?
Thanks!
P.S. I'm relatively new to NGS work so please bear with me if I've made some egregious oversight here.
What parameters did you use for VarScan?
I used the most basic parameters as I have never used VarScan before:
That's the format I'm using
Weird, it should report variants with a frequency above 0.01. On the other hand, it requires at least 2 variant reads and at least 8 reads covering the position, so if the coverage is low you might lose it. Another option to check is --strand-filter, which might filter out variants with few supporting reads even further. Take a look at the parameters: http://varscan.sourceforge.net/using-varscan.html
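To make those three thresholds concrete, here is a minimal Python sketch of the check as described above. It is only an illustration, not VarScan's actual code: the function name is hypothetical, the parameter names are chosen to resemble the options on the linked page, treating the frequency cutoff as inclusive is an assumption, and it ignores any other filters VarScan applies (base quality, strand, p-value).

```python
def passes_basic_thresholds(coverage, variant_reads,
                            min_coverage=8, min_reads2=2, min_var_freq=0.01):
    """Return True if a site clears the three thresholds mentioned above.

    Hypothetical helper for illustration only; not part of VarScan.
    """
    if coverage < min_coverage:
        # Too few reads cover the position at all.
        return False
    if variant_reads < min_reads2:
        # Too few reads support the variant allele.
        return False
    # Assumption: the frequency cutoff is inclusive (>=).
    return variant_reads / coverage >= min_var_freq


# 15 variant reads out of 1000 is 1.5%, which clears all three thresholds.
print(passes_basic_thresholds(1000, 15))
# Only 6 reads cover the site, so it fails on coverage alone.
print(passes_basic_thresholds(6, 3))
# 5 variant reads out of 1000 is 0.5%, below the frequency cutoff.
print(passes_basic_thresholds(1000, 5))
```

With deep amplicon data, the coverage and read-count checks should pass easily, so if low-frequency alleles still go missing, the frequency cutoff or one of the other filters is the more likely culprit.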
It should be fairly high coverage, considering I am only sequencing a 505 bp and a 564 bp region and each multiplex is giving me about 1.7M reads, but I will look into the parameters more.
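As a rough back-of-the-envelope check of that coverage estimate, the sketch below computes the expected depth and the expected number of variant reads for a rare allele. It rests on several simplifying assumptions that are not stated in the thread: reads are split evenly between the two amplicons, every read maps and covers the position of interest, and there is no sequencing error. The function name is hypothetical.

```python
def expected_variant_reads(total_reads, n_amplicons, allele_freq):
    """Estimate per-amplicon depth and the reads expected to carry an allele.

    Assumes reads are split evenly across amplicons and that every read
    covers the position of interest (both are simplifications).
    """
    depth = total_reads / n_amplicons
    return depth, depth * allele_freq


# About 1.7M reads per multiplex, two amplicons, an allele at 1% frequency.
depth, variant_reads = expected_variant_reads(1_700_000, 2, 0.01)
print(f"approximate depth per amplicon: {depth:.0f}")
print(f"expected reads carrying a 1% allele: {variant_reads:.0f}")
```

Under these assumptions the depth is on the order of 850,000 reads per amplicon, and a 1% allele would be carried by roughly 8,500 reads, far above a minimum of 2 variant reads out of 8.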