Hello,
I have a large SNP dataset and I'm trying to remove the SNPs with a minor allele frequency (MAF) < 5% and the ones that deviate from Hardy-Weinberg equilibrium (HWE). I'm using R but I don't know which package does that. Any help, please?
Hi MMS,
You could try vcfR (https://cran.r-project.org/web/packages/vcfR/index.html), although, without meaning to sound vain, I think my graphs are better :) https://f1000research.com/articles/5-2644/v3 with code available at https://zenodo.org/record/159272#.WKCKsBAnp7E. To visualise the QC in R, you can use GATK's VariantsToTable tool to turn the VCF into a readable table: https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_variantutils_VariantsToTable.php
GATK also has a Hardy-Weinberg calculator, but I'm not sure it can filter variants directly: https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_annotator_HardyWeinberg.php
If you don't use GATK, programs like VCFtools and BCFtools could probably help; otherwise you have to write your own Perl/Bash/Python/whatever scripts. PLINK 1.9 is good too: https://www.cog-genomics.org/plink2. You first have to convert your VCF into PLINK genotype format; after that it's easy to filter by MAF and HWE.
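To make the two filters concrete, here is a minimal Python sketch (an illustration only, not what PLINK, VCFtools or vcfR do internally) that computes the MAF and a 1-degree-of-freedom chi-square Hardy-Weinberg p-value from per-SNP genotype counts (homozygous AA, heterozygous AB, homozygous BB). The function names, the toy SNP counts and the HWE cutoff of 1e-6 are hypothetical assumptions; only the 5% MAF threshold comes from the question.

```python
import math

def maf(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 0.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    return min(p, 1 - p)

def hwe_chisq_p(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0  # no genotypes: nothing to test
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: test undefined, treat as not deviating
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(stat / 2))

def keep_snp(counts, maf_min=0.05, hwe_alpha=1e-6):
    """Keep a SNP only if MAF >= maf_min and the HWE p-value >= hwe_alpha."""
    return maf(*counts) >= maf_min and hwe_chisq_p(*counts) >= hwe_alpha

# Hypothetical toy data: genotype counts (AA, AB, BB) per SNP
snps = {
    "rs_common_hwe": (25, 50, 25),  # MAF 0.5, exactly in HWE
    "rs_rare": (97, 3, 0),          # MAF 0.015, fails the 5% filter
    "rs_het_excess": (0, 100, 0),   # all heterozygous, fails HWE
}
kept = [name for name, counts in snps.items() if keep_snp(counts)]
print(kept)  # ['rs_common_hwe']
```

Design note: PLINK 1.9's `--hwe` uses an exact test rather than this chi-square approximation, which is less reliable for rare variants, so p-values can differ at low MAF. In PLINK 1.9 both filters can be applied while reading the VCF, e.g. `plink --vcf input.vcf --maf 0.05 --hwe 1e-6 --make-bed --out filtered` (the file names and the HWE cutoff are placeholders).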
What does the input data look like? If it's GWAS data, try GWASTools: https://bioconductor.org/packages/release/bioc/manuals/GWASTools/man/GWASTools.pdf