Hello,
A lot of articles discuss the quality of NGS data and which aligner/variant caller to use when mapping reads to a reference genome.
However, in the non-model world mostly draft genomes are available for analysis, and I was wondering if anyone knew of a paper that focuses on how the quality of the reference genome affects SNP calling. In particular, I am looking for something showing that aligning to a 454 assembly produces more false positives than aligning to an Illumina assembly.
The reason seems fairly obvious: a 454 assembly will contain many artificial indels, especially in homopolymer runs, which Illumina handles better. When you then map high-quality reads (e.g. HiSeq data you gathered later on) to those regions, you detect apparent variation there and end up calling false SNPs.
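To make the mechanism concrete, here is a minimal toy sketch in Python. The sequences and the helper `ungapped_mismatches` are made up for illustration. It assumes the draft assembly has collapsed a 5-A homopolymer to 4 A's (a 1-bp deletion) and lays error-free reads on it without gaps. Real aligners do gapped alignment, so this mainly mimics what happens near read ends, where under typical affine gap scoring a single mismatch can cost less than opening a gap.

```python
# Toy sequences (hypothetical): the true genome has a 5-A homopolymer,
# while the draft 454 assembly collapsed it to 4 A's (1-bp deletion).
true_genome = "GATTACAAAAAGCTGCAT"
draft_ref = "GATTACAAAAGCTGCAT"


def ungapped_mismatches(read: str, ref: str, start: int) -> list[int]:
    """Return reference positions where an ungapped placement of `read`
    at `start` disagrees with `ref`; each one would look like a SNP."""
    return [
        start + i
        for i, base in enumerate(read)
        if start + i < len(ref) and ref[start + i] != base
    ]


# An error-free read spanning the homopolymer: every base after the
# collapsed run is shifted, giving a cluster of mismatches.
print(ungapped_mismatches(true_genome, draft_ref, 0))  # [10, 11, 12, 13, 14, 15, 16]

# A read ending at the homopolymer: a single mismatch, i.e. a lone "SNP".
print(ungapped_mismatches("TTACAAAAA", draft_ref, 2))  # [10]
```

The second call shows the case I worry about most: the read is correct, the reference is wrong, and the result is indistinguishable from a real SNP.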
Any thoughts, references?
Thanks,
Adrian