I have a large dataset whose autosomes I was able to successfully phase and impute using TOPMed. I have tried doing the same with the X chromosome but keep running into issues.
Before trying to impute with TOPMed, I did per-individual QC and per-marker QC, then ran checkVCF and corrected any issues it identified (fixing allele flips, removing duplicate sites, etc.). I performed this on the complete dataset, including the X chromosome.
However, when I try to impute the X chromosome, it doesn't get past the TOPMed Quality Control step. I did find an old post about someone having issues with their X chromosome when using the Michigan Imputation Server.
They specifically got an error that there were heterozygous variants in their males. Thus, it was suggested that they correct for these heterozygous haploid errors using the following PLINK command:
plink --bfile Input_data --set-hh-missing --recode vcf --out Result_filename
I did this with my files, and I can now impute with TOPMed.
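To see what the --set-hh-missing command above actually changed, one option would be to count heterozygous chrX calls per sample in the VCF before and after running it. Below is a rough Python sketch of that check. It assumes an uncompressed VCF with GT as the first FORMAT field and chrX labelled X, chrX or 23; the function name count_x_hets is just made up for illustration and is not part of PLINK or any library.

```python
from collections import Counter

# Assumption: the X chromosome appears under one of these labels.
X_NAMES = {"X", "chrX", "23"}

def count_x_hets(vcf_lines):
    """Return a Counter mapping sample ID -> number of heterozygous chrX calls."""
    samples = []
    hets = Counter()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs start at column 10
            continue
        if fields[0] not in X_NAMES:
            continue
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[0]  # assumes GT is the first FORMAT field
            alleles = gt.replace("|", "/").split("/")
            # Count only diploid, non-missing calls with two different alleles.
            if len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]:
                hets[sample] += 1
    return hets

# Tiny made-up example: F1 and M1 are hypothetical sample IDs.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tF1\tM1",
    "X\t100\trs1\tA\tG\t.\t.\t.\tGT\t0/1\t0/1",
    "X\t200\trs2\tC\tT\t.\t.\t.\tGT\t0/1\t./.",
    "1\t300\trs3\tG\tA\t.\t.\t.\tGT\t0/1\t0/1",
]
print(count_x_hets(example))
```

Running this on both the original VCF and the one written by PLINK, and comparing the counts for individuals known to be female, should show whether any of their heterozygous calls were set to missing.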
However, I am concerned because when I use PLINK's --sample-counts, I see that all my individuals are assigned an "ambiguous" sex even though I know I have both male and female individuals. So does this mean I am losing heterozygous variant information on my female individuals?
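One thing that may be worth checking here is the sex column (the fifth column) of the .fam file, where PLINK uses 1 for male, 2 for female and 0 for unknown. A minimal Python sketch for tallying those codes is below; the function name tally_fam_sex and the example IDs are hypothetical.

```python
from collections import Counter

# PLINK .fam sex codes: 1 = male, 2 = female, anything else = unknown.
SEX_LABELS = {"1": "male", "2": "female"}

def tally_fam_sex(fam_lines):
    """Count samples per sex label from the lines of a PLINK .fam file."""
    counts = Counter()
    for line in fam_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        counts[SEX_LABELS.get(fields[4], "unknown")] += 1
    return counts

# Made-up example rows: FID, IID, father, mother, sex, phenotype.
example_fam = [
    "FAM1 IND1 0 0 1 -9",
    "FAM2 IND2 0 0 2 -9",
    "FAM3 IND3 0 0 0 -9",
]
print(tally_fam_sex(example_fam))
```

For a real dataset this could be run as `with open("Input_data.fam") as fh: print(tally_fam_sex(fh))`. If every sample comes back as unknown, the sex information was probably never loaded into the fileset, and it could be supplied with PLINK's --update-sex before running --set-hh-missing.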
It will also deal with ambiguity in sex :)
Hi nLeone, You have highlighted a lot of points here regarding the X chromosome. At some point could you write a detailed blog post regarding the same? With examples and how one would do QC on sex if their main intention is analysis on autosomes only. Your experience would really help the community and especially beginners. Thanks, I did get a few pointers from here myself.