Hi
I have used the Harvest Tools package to align the core genomes of ~50 bacteria, with the intention of inferring their phylogeny based on SNPs. What criteria and software would be best for filtering out low-quality SNPs?
Thanks
If you are aligning to core genomes, apply standard variant filters suited to your dataset (quality, min/max depth, QD, etc.). Standard tools include vcflib, vcftools and GATK; there are many others.
If you are aligning just the core genomes, presumably all of the included sites are already confident. You don't necessarily need to filter SNPs; you can identify them by looking for variable positions in the alignment. But if you have all of this information, why not build the tree from the full alignment? That lets you model among-site rate heterogeneity more appropriately.
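To illustrate what "looking for variable positions" means, here is a minimal Python sketch. It assumes the core alignment has already been read into a list of equal-length strings; the function name `variable_sites` and the choice to ignore gaps and Ns are mine, not something Harvest does for you.

```python
def variable_sites(seqs, ignore="-Nn"):
    """Return 0-based alignment columns that contain at least two distinct bases.

    Gap and N characters are skipped (an assumption made for this sketch).
    """
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    sites = []
    for i in range(length):
        # Collect the distinct, unambiguous bases seen in this column.
        bases = {s[i].upper() for s in seqs if s[i] not in ignore}
        if len(bases) > 1:
            sites.append(i)
    return sites


# Toy alignment, made up for illustration.
alignment = ["ACGTAC", "ACGTTC", "ACCTAC", "AC-TAC"]
print(variable_sites(alignment))  # [2, 4]
```

Columns that are not returned are invariant, and those are exactly the sites you keep when you build the tree from the full alignment rather than from SNPs only.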
If I am not wrong, Parsnp, one of the Harvest components, lets you do SNP filtering, right? Did you not try that? In any case, if you have a VCF file as output from Harvest after the post-processing step, you can always filter it for low-quality variants with vcftools or vcflib.
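If you just want to see what such a filter does, here is a rough Python sketch that works on plain VCF text lines. The thresholds (`min_qual=30`, `min_depth=10`) are placeholders, not recommendations, and it assumes depth is stored as `DP` in the INFO column, which may not be true of the VCF Harvest writes. Records with a missing QUAL or no DP are kept, since there is nothing to judge them on; that is also an assumption.

```python
def passes(line, min_qual=30.0, min_depth=10):
    """Return True if a VCF data line meets the (illustrative) thresholds."""
    fields = line.rstrip("\n").split("\t")
    qual = fields[5]  # column 6 of a VCF record is QUAL
    if qual != "." and float(qual) < min_qual:
        return False
    # Column 8 is INFO: semicolon-separated key=value pairs and bare flags.
    info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
    depth = info.get("DP")
    # Missing DP is not treated as a failure in this sketch.
    return depth is None or int(depth) >= min_depth


def filter_vcf(lines, **thresholds):
    """Yield header lines unchanged and only the data lines that pass."""
    for line in lines:
        if line.startswith("#") or passes(line, **thresholds):
            yield line


# Made-up records, for illustration only.
records = [
    "##fileformat=VCFv4.2",
    "chr1\t10\t.\tA\tG\t50\tPASS\tDP=20",
    "chr1\t20\t.\tC\tT\t10\tPASS\tDP=20",
    "chr1\t30\t.\tG\tA\t60\tPASS\tDP=3",
]
for kept in filter_vcf(records):
    print(kept)
```

With these toy records only the header and the first variant are printed. For real data, vcftools or vcflib will be faster and handle more of the VCF format than this does.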