Hi all,
I noticed that a large number of subjects are failing the sex check in Plink for my GWAS dataset. X, Y, XY and MT are encoded according to the Plink convention (23-26), so I don't think this is due to the pseudoautosomal region. I cross-checked the sex of a few of the problematic subjects against our records and that seems fine too.
One thought I had was that this dataset includes other variants in addition to SNPs, such as indels. I have not dealt with indels in the past, so this is unfamiliar territory for me. Is it possible that the indels are causing a problem here?
If not, are there any other sanity checks that people would recommend?
Thanks very much!
Thanks for the advice, I will plot the data and check that now!
How can I find the algorithm or statistical method behind --check-sex?
It's based on the same heterozygosity/inbreeding coefficient as --het.
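To make the statistic concrete, here is a minimal Python sketch of the usual method-of-moments inbreeding coefficient, F = (O - E) / (N - E), where O is the observed number of homozygous genotypes, E is the number expected from the allele frequencies, and N is the number of non-missing genotypes. This is a simplified illustration, not Plink's actual code: the function name and the 0/1/2 genotype coding are assumptions made for the example, and it leaves out any small-sample correction Plink may apply.

```python
def inbreeding_f(genotypes, alt_freqs):
    """Method-of-moments F for one sample (illustrative sketch only).

    genotypes: alt-allele counts per variant (0, 1 or 2), None if missing.
    alt_freqs: alt-allele frequency per variant, in the same order.
    """
    observed_hom = 0    # O: homozygous calls actually seen
    expected_hom = 0.0  # E: homozygous calls expected under Hardy-Weinberg
    n = 0               # N: non-missing genotypes
    for g, p in zip(genotypes, alt_freqs):
        if g is None:
            continue
        n += 1
        if g != 1:
            observed_hom += 1
        expected_hom += 1.0 - 2.0 * p * (1.0 - p)
    if n == 0 or n == expected_hom:
        return float("nan")  # F is undefined without informative variants
    return (observed_hom - expected_hom) / (n - expected_hom)


# A male has no heterozygous calls on chrX outside the pseudoautosomal regions.
print(inbreeding_f([0, 2, 0, 2], [0.5, 0.5, 0.5, 0.5]))
# A female with about as many het calls as expected lands near 0.
print(inbreeding_f([1, 0, 1, 2], [0.5, 0.5, 0.5, 0.5]))
```

With these toy inputs the all-homozygous sample gets F = 1 and the sample with two het calls out of four gets F = 0, which is the male/female separation the sex check relies on.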
I see, so it is the X chromosome inbreeding coefficient. What is the expected average het rate for SNPs on chrX? 20%?
This depends on the set of variants in your dataset. E.g. if you have lots of rare variants (MAF < 5%), the average female het rate will be far less than 20%.
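As a rough illustration of how strongly the allele frequencies matter, the sketch below averages the Hardy-Weinberg expected heterozygosity 2p(1-p) over a set of variants. The frequency lists are made-up examples rather than real data, and the function name is hypothetical.

```python
def mean_expected_het(mafs):
    """Average expected heterozygosity 2*p*(1-p) across variants, assuming HWE."""
    if not mafs:
        raise ValueError("need at least one allele frequency")
    return sum(2.0 * p * (1.0 - p) for p in mafs) / len(mafs)


common = [0.3, 0.4, 0.5]   # hypothetical common variants
rare = [0.01, 0.02, 0.05]  # hypothetical rare variants

print(mean_expected_het(common))  # roughly 0.47
print(mean_expected_het(rare))    # roughly 0.05
```

So a panel dominated by rare variants can easily give an average female het rate of only a few percent.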
The F-statistic helps correct for this, but its distribution will still depend on dataset-specific details. However, you can count on male F statistics being clumped close to 1 (assuming you've excluded the pseudoautosomal regions) and female F statistics being scattered closer to 0, and fortunately that's usually enough to get the job done.
Nice. I checked this in my dataset: I have 12702 SNPs on chrX from 8648 European/African/Asian samples. The het rate is < 0.1% for 99.9% of males, and only a few men have a het rate of 0.8% (I suspect these may be errors in the pedigree records). For females the average het rate is > 8%. But when I apply the F-score, many values fall between 0.2 and 0.8. I will change the thresholds for better classification in my own study.
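For reference, the thresholding step can be sketched as below. The 0.2 and 0.8 defaults mirror the cutoffs mentioned in this thread, and the 1 = male, 2 = female, 0 = unknown coding follows the usual Plink sex coding. The function name is hypothetical. As far as I know, Plink 1.9 also accepts the two cutoffs directly as arguments to --check-sex, but please confirm that against the documentation for your version.

```python
import math


def call_sex(f, female_max=0.2, male_min=0.8):
    """Call sex from an X-chromosome F estimate (illustrative sketch).

    Returns 1 (male), 2 (female) or 0 (ambiguous or undefined).
    """
    if math.isnan(f):
        return 0
    if f < female_max:
        return 2
    if f > male_min:
        return 1
    return 0


# Hypothetical F values: clear male, clear female, and one in the gap.
for f in (0.97, 0.03, 0.45):
    print(f, call_sex(f), call_sex(f, female_max=0.5, male_min=0.9))
```

Before moving the cutoffs it is worth plotting a histogram of F, so that the new thresholds sit in the actual gap between the two clusters instead of at an arbitrary value.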