Hi.
I am checking sex consistency in the 1000 Genomes Phase 3 TSI samples. Sex inconsistencies were detected in 3 samples: NA20506, NA20530, and NA20533. Is this inconsistency already known? Should I remove these samples from chrX SNP analysis?
To check sex, I followed the process below.
I got a vcf file of chrX from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/. The file name was ALL.chrX.phase3_shapeit2_mvncall_integrated_v1b.20130502.genotypes.vcf.gz.
First, the file was converted to PLINK binary (bed) format using PLINK 1.9. Second, the SNPs in the file were split into chrX SNPs and chrXY SNPs using plink --split-x. Third, EUR samples were extracted from the file using plink --keep. Next, SNPs with MAF < 0.01, HWE P < 1e-6, or call rate (CR) < 0.98 were excluded. Finally, I checked sex on the resulting file using plink --check-sex.
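For readers who want to reproduce these steps, here is a rough sketch of how the commands might be assembled. It only builds and prints the PLINK 1.9 command lines and does not run them. Everything except the VCF file name is an assumption: the output prefixes, the `eur_samples.txt` and `sex_info.txt` files, the `b37` build code, reading "CR < 0.98" as `--geno 0.02`, and the `--indep-pairphase 20000 2000 0.5` pruning step, which is the setting suggested in the PLINK documentation for --check-sex and is not part of the steps listed above. A VCF carries no sex codes, so they are assumed to be loaded with `--update-sex`.

```python
import shlex

# Only the VCF name comes from the post; every other file name is hypothetical.
VCF = "ALL.chrX.phase3_shapeit2_mvncall_integrated_v1b.20130502.genotypes.vcf.gz"
EUR_LIST = "eur_samples.txt"   # hypothetical FID/IID list for --keep
SEX_FILE = "sex_info.txt"      # hypothetical FID/IID/sex file for --update-sex


def build_commands():
    """Return the PLINK 1.9 command lines for each step, as argument lists."""
    return [
        # 1. Convert the VCF to PLINK binary format.
        ["plink", "--vcf", VCF, "--make-bed", "--out", "chrX"],
        # 2. Move pseudo-autosomal SNPs to the XY code (build b37 assumed).
        ["plink", "--bfile", "chrX", "--split-x", "b37", "no-fail",
         "--make-bed", "--out", "chrX_split"],
        # 3. Keep EUR samples, load reported sex, apply the SNP filters.
        ["plink", "--bfile", "chrX_split", "--keep", EUR_LIST,
         "--update-sex", SEX_FILE,
         "--maf", "0.01", "--hwe", "1e-6", "--geno", "0.02",
         "--make-bed", "--out", "chrX_eur_qc"],
        # 4. LD pruning (assumed step, not listed in the post).
        ["plink", "--bfile", "chrX_eur_qc",
         "--indep-pairphase", "20000", "2000", "0.5", "--out", "chrX_eur_qc"],
        # 5. Sex check on the pruned SNP set.
        ["plink", "--bfile", "chrX_eur_qc", "--extract", "chrX_eur_qc.prune.in",
         "--check-sex", "--out", "chrX_eur_sex"],
    ]


for cmd in build_commands():
    print(shlex.join(cmd))
```

Writing the steps out this way makes it easy to see at a glance whether a pruning step is present before --check-sex.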
Thank you for your comment.
I got the following result from --check-sex:
NA20506 NA20506 2 1 PROBLEM 0.894
NA20530 NA20530 2 1 PROBLEM 0.863
NA20533 NA20533 2 0 PROBLEM 0.4936
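To make the columns explicit, here is a small sketch that parses the three rows above, assuming they follow the usual .sexcheck column order (FID, IID, PEDSEX, SNPSEX, STATUS, F) with sex codes 1 = male, 2 = female, 0 = unknown. The class and function names are only illustrative.

```python
from typing import NamedTuple


class SexCheckRow(NamedTuple):
    fid: str
    iid: str
    pedsex: int   # sex recorded in the input data
    snpsex: int   # sex imputed from chrX genotypes (0 = no call)
    status: str   # "OK" or "PROBLEM"
    f: float      # chrX inbreeding coefficient estimate


SEX_LABEL = {0: "unknown", 1: "male", 2: "female"}


def parse_sexcheck_line(line):
    """Parse one whitespace-delimited data row of a .sexcheck report."""
    fid, iid, pedsex, snpsex, status, f = line.split()
    return SexCheckRow(fid, iid, int(pedsex), int(snpsex), status, float(f))


REPORTED = """
NA20506 NA20506 2 1 PROBLEM 0.894
NA20530 NA20530 2 1 PROBLEM 0.863
NA20533 NA20533 2 0 PROBLEM 0.4936
"""

rows = [parse_sexcheck_line(line) for line in REPORTED.strip().splitlines()]
for r in rows:
    print(f"{r.iid}: recorded {SEX_LABEL[r.pedsex]}, "
          f"imputed {SEX_LABEL[r.snpsex]}, F = {r.f}")
```

Read this way, all three samples are recorded as female; two are called male from chrX, and NA20533 receives no call.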
I think this result indicates a clear sex inconsistency regardless of the threshold value. Could you give me any advice?
See https://www.cog-genomics.org/plink/1.9/basic_stats#check_sex . The note on LD pruning is especially likely to be relevant here.
As long as you end up with a gap between the highest female and the lowest male, you probably have NO sex errors.
Thank you for your comment.
Of course, I already did LD pruning.
There seems to be a language barrier; you clearly do not fully understand my 3-sentence answers or the official documentation, otherwise you would have at least said something about the lowest male F-statistic in your dataset, or the lack of LD pruning in your actual list of steps.
You should try to find a more experienced analyst who understands your first language to talk to.
Thank you for your advice. I'll look for an expert who can understand my language.
In my --check-sex result, the lowest F among male samples is 1.00. The largest F values among female samples are 0.08218, 0.4936, 0.863, and 0.894.
I would appreciate it if you could tell me whether you notice anything about this and advise me.
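The comparison suggested earlier in the thread (highest female F versus lowest male F) can be sketched with the values reported above. This is only an illustration: the female list holds just the four largest values quoted in this post, not the full distribution, and the function name is hypothetical.

```python
def f_gap(female_f, male_f):
    """Return (highest female F, lowest male F, gap between them)."""
    highest_female = max(female_f)
    lowest_male = min(male_f)
    return highest_female, lowest_male, lowest_male - highest_female


# Values quoted in this post; the female list is only the top four reported.
female_f = [0.08218, 0.4936, 0.863, 0.894]
male_f = [1.00]

highest_female, lowest_male, gap = f_gap(female_f, male_f)
print(f"highest female F: {highest_female}")
print(f"lowest male F:    {lowest_male}")
print(f"gap:              {gap:.3f}")
```

A positive gap means no recorded female has an F estimate above the lowest recorded male. Whether that gap is wide enough to trust is a judgment call that depends on the full F distribution in the .sexcheck file.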