Hi everyone!
I have a very basic question, but I can't seem to find the answer anywhere. I'm doing an SNP analysis using snippy followed by snippy-core, and I would like to report the number of SNPs that were used to build the final SNP tree. In other papers doing this kind of analysis, the authors normally report 300-2000 SNPs, but when I check my output (the core alignment) I have 149,410 columns, which I interpret to be SNPs. This seems a little excessive in comparison, although, to be fair, I'm analysing 342 isolates. Am I looking in the right place, or should I be looking elsewhere?
Thanks a lot!
What have you tried so far to separate low-quality SNPs from high-quality ones?
That's a good question, and probably the answer at the same time: I haven't filtered any SNPs, because the VCF file produced shows PASS for every called SNP. Should I filter anyway? If so, can you suggest a tool or parameter I should look at?
Thanks!
Well, after reading a little into the snippy log (the documentation is not very clear), it looks like there is a filtering step.
Checking the individual per-isolate folders generated by snippy, it seems that filtering is actually going on:
Two .vcf files are generated, one called snps.raw.vcf and another called snps.vcf. The first has many more positions called, a lot of them with quality scores of 0 or nE-11, while the latter keeps only those SNPs from the raw file with quality scores of 230 and above.
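For anyone wanting to repeat this comparison, here is a minimal sketch in Python that summarises the QUAL column of a VCF. The function name `qual_summary` and the toy records are made up for illustration; the only thing it relies on is that QUAL is the sixth tab-separated column of a VCF data line. It is not part of snippy.

```python
def qual_summary(vcf_lines):
    """Return (record_count, min_qual, max_qual) for the data lines of a VCF.

    Header lines (starting with '#') and blank lines are skipped.
    Records whose QUAL is missing ('.') are counted but not used for min/max.
    """
    n_records = 0
    quals = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        n_records += 1
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
        if qual != ".":
            quals.append(float(qual))
    if not quals:
        return n_records, None, None
    return n_records, min(quals), max(quals)


# Toy records (invented values, not real snippy output).
toy_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t0\t.\t.\n",
    "chr1\t250\t.\tC\tT\t230.5\t.\t.\n",
]
print(qual_summary(toy_vcf))
```

To run it on real files, open each VCF and pass the file handle, e.g. `with open(path) as fh: print(qual_summary(fh))`, once for the raw file and once for the filtered one (the paths depend on your own output folders).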
I'm still confused about the high number of final core SNPs. I've tried contacting Snippy's author, but without luck, and the documentation and tutorials on the internet are not very helpful either :(
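One way to sanity-check the column count is to count directly how many alignment columns actually vary between isolates. Below is a rough sketch in Python, assuming the alignment is in FASTA format; the helper names `read_fasta` and `count_columns` are invented here, and "variable" is defined as more than one distinct A/C/G/T base in a column (gaps, N and other symbols ignored), which may not match snippy-core's own definition of a core SNP.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict of {name: sequence} (upper-cased)."""
    seqs = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line.upper())
    return {n: "".join(parts) for n, parts in seqs.items()}


def count_columns(seqs):
    """Return (alignment_length, variable_columns) for aligned sequences."""
    rows = list(seqs.values())
    if not rows:
        return 0, 0
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("sequences differ in length; not an alignment")
    variable = 0
    for column in zip(*rows):
        # Only unambiguous bases count; '-', 'N' etc. are ignored.
        bases = {b for b in column if b in "ACGT"}
        if len(bases) > 1:
            variable += 1
    return length, variable


# Toy alignment (invented), three isolates, four columns.
toy_aln = [">a", "ACGT", ">b", "ACGA", ">c", "AC-T"]
print(count_columns(read_fasta(toy_aln)))
```

On a real alignment, pass an open file handle to `read_fasta`. If the total length and the number of variable columns come out the same, every column is a variable site; if the length is close to the reference genome size, the file is probably a whole-genome alignment rather than a SNP-only one.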