Hi there,
Should SNPs that have this sort of name, 'exm_...', be removed from genetic data at the QC stage?
Not necessarily. They used this ID because it was part of their ExomeSNP array, probably because there was no rsID at the time. For example, this one:
https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ss.cgi?subsnp_id=ss1958317049
Should SNPs with allele codes such as 0 and A be removed, given that one allele is missing?
Yes, they are probably not SNPs, so it should be safe to remove them.
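If you decide to drop them, these markers can be listed straight from the .bim file. Here is a minimal Python sketch, assuming the standard six-column PLINK .bim layout (chromosome, variant ID, cM position, bp position, allele 1, allele 2), where 0 is the missing-allele code. The function name and the example rows are made up for illustration.

```python
def ids_with_missing_allele(bim_lines):
    """Return the IDs of variants whose allele 1 or allele 2 is coded '0'."""
    flagged = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        variant_id, a1, a2 = fields[1], fields[4], fields[5]
        if a1 == "0" or a2 == "0":
            flagged.append(variant_id)
    return flagged


# Made-up .bim rows, for illustration only.
example_bim = [
    "1 rs123 0 10583 0 A",
    "1 exm_456 0 20000 G A",
    "2 kgp789 0 30000 C 0",
]
flagged_ids = ids_with_missing_allele(example_bim)
print(flagged_ids)
```

The resulting IDs could be written to a text file, one per line, and passed to PLINK with --exclude.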
Should SNPs that have a kgp or JHU prefix be removed? What are they?
Here is how you can convert them to rsIDs:
https://github.com/nhettige/Updating-kgp-IDs-to-rs-IDs-for-SNPs-on-Illumina-HumanOmni2.5M-array
There should be other methods as well, such as annotating with dbSNP. I wouldn't worry too much about the IDs as long as the quality of the data looks OK.
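Whichever source you use for the mapping, the renaming step itself is just a lookup. Here is a rough Python sketch, assuming you already have a table of old ID to rsID pairs. The IDs below are invented placeholders, not real variants, and the function name is hypothetical.

```python
def update_ids(variant_ids, id_map):
    """Replace every ID found in id_map and leave all other IDs unchanged."""
    return [id_map.get(variant_id, variant_id) for variant_id in variant_ids]


# Invented placeholder pair, for illustration only.
kgp_to_rs = {"kgp0000001": "rs0000001"}

original_ids = ["kgp0000001", "rs42", "exm_0000002"]
renamed_ids = update_ids(original_ids, kgp_to_rs)
print(renamed_ids)
```

IDs without a match are kept as they are, so nothing is lost if the mapping is incomplete. PLINK's --update-name option should do the same job on the dataset itself, given a two-column file of new and old IDs.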
Should chromosome 26 be removed?
Yes, you can remove it. Here is another post about this:
QC of genetic data
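For context, assuming the data use PLINK's numeric chromosome coding, 23 is X, 24 is Y, 25 is the pseudo-autosomal region (XY), and 26 is mitochondrial DNA. Here is a small Python sketch of dropping one chromosome code from .bim-style rows. The function name and the rows are made up for illustration.

```python
# PLINK's numeric codes for the non-autosomal chromosomes.
PLINK_CHROM_LABELS = {"23": "X", "24": "Y", "25": "XY", "26": "MT"}


def drop_chromosome(bim_lines, chrom="26"):
    """Keep only the rows whose first column is not the given chromosome code."""
    kept = []
    for line in bim_lines:
        fields = line.split()
        if fields and fields[0] != chrom:
            kept.append(line)
    return kept


# Made-up rows, for illustration only.
rows = [
    "1 rs1 0 1000 A G",
    "23 rs2 0 2000 C T",
    "26 rs3 0 3000 G A",
]
kept_rows = drop_chromosome(rows)
print(kept_rows)
```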
What about SNPs that have a name such as 1 seq 0 12002028. A. G?
They are probably SNPs that still don't have an rsID, as in the first example.
Hi Raony,
Thank you for your response. I am doing the sex check part of the QC but I only have 19 variants with MAF>0.05 and the sex check is messed up. Do you have any experience with this/advice on how to proceed? I would really appreciate it. PS: My sample is small (850 people)
Check how many variants you have without filtering for MAF; maybe use a MAF of 0.01? You could try --impute-sex ycount or y-only. Do you have variants in the non-PAR region of chrX? What is the heterozygosity for these 19 variants? What is the heterozygosity across all variants on chromosome X? Normally chrY has very few variants on SNP arrays, but that's usually enough to determine the sex. You could also try other tools such as peddy [1] or somalier [2].
[1] https://github.com/brentp/peddy [2] https://github.com/brentp/somalier
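To make the heterozygosity check concrete: males carry a single copy of the non-PAR part of chrX, so their heterozygosity there should be close to zero, while females normally show some heterozygous calls. Here is a minimal Python sketch, assuming genotypes are available as allele pairs with "0" marking a missing call. The sample names and genotypes are invented for illustration.

```python
def heterozygosity(genotypes):
    """Return the fraction of non-missing genotype calls that are heterozygous."""
    # Drop any call in which either allele is the missing code "0".
    called = [pair for pair in genotypes if "0" not in pair]
    if not called:
        return None  # no usable calls for this sample
    het_count = sum(1 for a1, a2 in called if a1 != a2)
    return het_count / len(called)


# Invented chrX genotypes, for illustration only.
samples = {
    "sample_A": [("A", "A"), ("G", "G"), ("C", "C"), ("0", "0")],
    "sample_B": [("A", "G"), ("G", "G"), ("C", "T"), ("T", "T")],
}
het_by_sample = {name: heterozygosity(calls) for name, calls in samples.items()}
print(het_by_sample)
```

With only a handful of X variants the estimate will be noisy, so it is worth comparing it against the value computed without the MAF filter.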