gender assignment sex-check in Plink
8.5 years ago
avari ▴ 110

Hi all,

I noticed that a large number of subjects are failing the sex-check in Plink for my GWAS dataset. X, Y, XY and MT are encoded according to the Plink convention (23-26), so I don't think this is due to the pseudoautosomal region. I cross-checked the sex of a few of the problematic subjects against our records, and that seems fine too.

One thought I had was that this dataset includes other variants in addition to SNPs, such as indels. I have not dealt with indels in the past, so this is unfamiliar territory to me. Is it possible that the indels are causing a problem here?

If not, are there any other sanity checks that people would recommend?

Thanks very much!

Plink allosomes SNPs • 9.0k views
8.5 years ago

The default 0.2/0.8 F-statistic thresholds applied by plink --check-sex are usually too stringent. You should look at the distribution of F values in your dataset; as long as there's an obvious tight clump on the right side and a more spread-out clump centered near zero on the left side, your data is fine. If so, you can eyeball the plot and choose your own --check-sex thresholds.
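As a rough illustration of "look at the distribution of F values", here is a minimal Python sketch. It assumes the whitespace-delimited `.sexcheck` layout that plink writes (columns FID, IID, PEDSEX, SNPSEX, STATUS, F); the sample rows, the function names, and the "widest gap" heuristic are made up for illustration and are not part of plink. The gap is only a starting point for eyeballing, not a replacement for looking at the plot.

```python
import io

def read_sexcheck(handle):
    """Parse whitespace-delimited .sexcheck text into (IID, PEDSEX, F) tuples."""
    header = handle.readline().split()
    iid_col, sex_col, f_col = (header.index(c) for c in ("IID", "PEDSEX", "F"))
    rows = []
    for line in handle:
        parts = line.split()
        if not parts or parts[f_col].lower() == "nan":
            continue  # skip blank lines and samples with no F estimate
        rows.append((parts[iid_col], int(parts[sex_col]), float(parts[f_col])))
    return rows

def text_histogram(values, lo=-0.5, hi=1.0, bins=15):
    """Return (left_edge, count) pairs; out-of-range values go in the end bins."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    return [(lo + i * width, n) for i, n in enumerate(counts)]

def widest_gap(values):
    """Return the two neighbouring F values with the largest gap between them."""
    ordered = sorted(values)
    gaps = [(b - a, a, b) for a, b in zip(ordered, ordered[1:])]
    _, low, high = max(gaps)
    return low, high

# Made-up example data in .sexcheck layout (not from any real dataset).
SAMPLE = """FID IID PEDSEX SNPSEX STATUS F
F1 S1 2 2 OK 0.05
F2 S2 2 0 PROBLEM 0.31
F3 S3 2 2 OK -0.10
F4 S4 1 1 OK 0.99
F5 S5 1 1 OK 0.97
F6 S6 1 0 PROBLEM 0.78
"""

rows = read_sexcheck(io.StringIO(SAMPLE))
f_values = [f for _, _, f in rows]
for left_edge, n in text_histogram(f_values):
    print(f"{left_edge:6.2f} {'#' * n}")
print("widest gap between:", widest_gap(f_values))
```

If the two clumps are clearly separated, thresholds picked inside the gap can be passed as the two optional numeric arguments of `--check-sex` (female maximum F, then male minimum F), for example `plink --bfile mydata --check-sex 0.4 0.7`, where `mydata` and the two numbers are placeholders.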


Thanks for the advice, I will plot the data and have a check that now!


How can I find the algorithm or statistical method behind --check-sex?


It's based on the same heterozygosity/inbreeding coefficient as --het.
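For readers who want to see the shape of that statistic, here is a simplified Python sketch of the usual method-of-moments inbreeding estimate, F = (O - E) / (N - E), where O is the observed number of homozygous genotypes, E the number expected from allele frequencies, and N the number of non-missing genotypes. This is an assumption-laden illustration: the function name and input format are hypothetical, and plink's actual implementation may differ in details such as small-sample corrections.

```python
def inbreeding_f(genotypes, alt_freqs):
    """Estimate F for one sample.

    genotypes: alt-allele counts (0, 1, 2) per variant, or None if missing.
    alt_freqs: alt-allele frequency per variant, in the same order.
    """
    observed_hom = 0
    expected_hom = 0.0
    n_nonmissing = 0
    for g, p in zip(genotypes, alt_freqs):
        if g is None:
            continue  # missing genotypes do not contribute
        n_nonmissing += 1
        if g != 1:
            observed_hom += 1
        # Under Hardy-Weinberg, P(homozygous) = 1 - 2p(1 - p)
        expected_hom += 1.0 - 2.0 * p * (1.0 - p)
    if n_nonmissing == expected_hom:
        return float("nan")  # no informative variants
    return (observed_hom - expected_hom) / (n_nonmissing - expected_hom)

# A sample with no heterozygous calls (what a male looks like on chrX).
print(inbreeding_f([0, 2, 2, 0], [0.3, 0.4, 0.5, 0.2]))
# A sample heterozygous at exactly the expected rate.
print(inbreeding_f([0, 1, 1, 2], [0.5, 0.5, 0.5, 0.5]))
```

A sample with no heterozygous genotypes gets F = 1 regardless of the allele frequencies, which is why males clump tightly at the right-hand end.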


I see, so it is the X chromosome inbreeding coefficient. What is the expected average het rate for SNPs on chrX? 20%?


This depends on the set of variants in your dataset. E.g. if you have lots of rare (MAF < 5%) variants, the average female het rate will be far less than 20%.

The F-statistic helps correct for this, but its distribution will still depend on dataset-specific details. However, you can count on male F statistics being clumped close to 1 (assuming you've excluded the pseudoautosomal regions) and female F statistics being scattered closer to 0, and fortunately that's usually enough to get the job done.
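To make the dependence on allele frequency concrete, here is a small Python sketch, assuming Hardy-Weinberg proportions so that the expected heterozygosity at a biallelic variant with minor allele frequency p is 2p(1 - p). The function names and the example frequencies are illustrative only.

```python
def expected_het_rate(maf):
    """Expected heterozygous fraction at one variant under Hardy-Weinberg."""
    return 2.0 * maf * (1.0 - maf)

def mean_expected_het(mafs):
    """Average expected het rate over a set of variants."""
    return sum(expected_het_rate(m) for m in mafs) / len(mafs)

for maf in (0.5, 0.2, 0.05, 0.01):
    print(f"MAF {maf:.2f} -> expected het rate {expected_het_rate(maf):.4f}")

# A hypothetical panel dominated by rare variants.
print(mean_expected_het([0.01, 0.01, 0.02, 0.05, 0.3]))
```

Under this assumption a variant at MAF 5% is heterozygous in under 10% of females, so a panel dominated by rare variants pulls the average well below 20%.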


Nice. I checked this in my dataset: I have 12702 SNPs on chrX from 8648 European/African/Asian samples. For 99.9% of males the het rate is <0.1%, and only a few men have a het rate of 0.8% (I suspect these may be mistakes in the pedigree records). For females the average het rate is >8%. But when I apply the F-score, many values fall between 0.2 and 0.8. I will change the thresholds for better classification in my own study.

7.5 years ago
Ray ▴ 30

Hi, could you please explain how you solved the issue? I have the same problem: a large number of subjects are failing the sex-check in Plink for my GWAS dataset. I also get these two warnings:

Warning: 280281 het. haploid genotypes present
Warning: Nonmissing nonmale Y chromosome genotype(s) present; many commands treat these as missing.
