Hi all,
I have tried to use plink2 to check the sex of the samples in several of my datasets. It always reports more problem samples than expected. For some datasets I also have RNA-seq data, so I checked again using the expression of some genes located only on chromosome X or Y. That turned up only a few problem samples. Therefore, I doubt the reliability of the PLINK sex check, or maybe I have some wrong steps/parameters in my code. The weirdest thing is that plink2 cannot scan variants on chromosome Y, even though a number of variants (7375) do exist. Below is an example log file. In addition, there are 45911 variants on chromosome X, but the log reports that only 40431 were scanned. Any advice or comments will be appreciated.
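A minimal Python sketch of the comparison described above, assuming PLINK 1.9's whitespace-delimited .sexcheck report with the header FID IID PEDSEX SNPSEX STATUS F, and a hypothetical `rna_sex` dictionary (IID mapped to 1 = male, 2 = female) holding your RNA-seq-based calls. It separates PLINK-flagged samples that the independent call also contradicts from those flagged only by PLINK. The function names and the toy data are illustrative, not part of PLINK.

```python
import io


def read_sexcheck(handle):
    """Parse a PLINK 1.9 .sexcheck report into a list of dicts.

    Assumes a whitespace-delimited header line (FID IID PEDSEX SNPSEX STATUS F);
    any extra columns are kept as strings.
    """
    rows = [line.split() for line in handle if line.strip()]
    header, body = rows[0], rows[1:]
    records = []
    for fields in body:
        rec = dict(zip(header, fields))
        rec["PEDSEX"] = int(rec["PEDSEX"])
        rec["SNPSEX"] = int(rec["SNPSEX"])
        rec["F"] = float(rec["F"])  # "nan" parses to float NaN
        records.append(rec)
    return records


def split_problems(records, rna_sex):
    """Split PLINK PROBLEM samples by whether an independent call agrees.

    rna_sex is a hypothetical dict mapping IID -> 1 (male) or 2 (female).
    Returns (confirmed, plink_only): confirmed samples have an independent
    call that contradicts PEDSEX; all other PROBLEM samples (including those
    with no independent call) go to plink_only.
    """
    confirmed, plink_only = [], []
    for rec in records:
        if rec["STATUS"] != "PROBLEM":
            continue
        other = rna_sex.get(rec["IID"])
        if other is not None and other != rec["PEDSEX"]:
            confirmed.append(rec["IID"])
        else:
            plink_only.append(rec["IID"])
    return confirmed, plink_only


# Toy example (made-up values, not from the dataset in the question).
demo = io.StringIO(
    "FID IID PEDSEX SNPSEX STATUS F\n"
    "f1 s1 1 1 OK 0.98\n"
    "f2 s2 2 1 PROBLEM 0.91\n"
    "f3 s3 1 0 PROBLEM 0.50\n"
)
confirmed, plink_only = split_problems(read_sexcheck(demo), {"s2": 1, "s3": 1})
print("confirmed:", confirmed, "plink_only:", plink_only)
```

Samples that land in `plink_only` are the ones worth a closer look (F values, call rates) before concluding that PLINK is wrong about them.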
PLINK v1.90b4 64-bit (20 Mar 2017)
www.cog-genomics.org/plink/1.9/
(C) 2005-2017 Shaun Purcell, Christopher Chang
GNU General Public License v3
Logging to WGS-GATK-bcftools-Plink-update3.qc.log.
Options in effect:
  --bfile WGS-GATK-bcftools-Plink-update3
  --check-sex 0.35 0.65
  --noweb
  --out WGS-GATK-bcftools-Plink-update3.qc
Note: --noweb has no effect since no web check is implemented yet.
16384 MB RAM detected; reserving 8192 MB for main workspace.
2120644 variants loaded from .bim file.
166 people (93 males, 73 females) loaded from .fam.
158 phenotype values loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 166 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Warning: 904336 het. haploid genotypes present (see WGS-GATK-bcftools-Plink-update3.qc.hh ); many commands treat these as missing.
Warning: Nonmissing nonmale Y chromosome genotype(s) present; many commands treat these as missing.
Total genotyping rate is 0.928209.
2120644 variants and 166 people pass filters and QC.
Among remaining phenotypes, 142 are cases and 16 are controls. (8 phenotypes are missing.)
--check-sex: 40431 Xchr and 0 Ychr variant(s) scanned, 86 problems detected.
Report written to WGS-GATK-bcftools-Plink-update3.qc.sexcheck .
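To reconcile the variant counts quoted above, one hedged first step is to count variants per chromosome code directly in the .bim file. PLINK's numeric codes for the human chromosome set are 23 = X, 24 = Y, 25 = XY (pseudo-autosomal region) and 26 = MT, and PLINK treats XY as a separate, diploid chromosome rather than as X. A split between codes 23 and 25 is therefore one possible (unconfirmed) explanation for part of the 45911 vs. 40431 gap. The `chrom_counts` helper below is a hypothetical sketch that only counts what is in the file; it does not reproduce any filtering --check-sex may apply.

```python
from collections import Counter

# Numeric codes PLINK uses for the non-autosomal human chromosomes.
CHROM_NAMES = {"23": "X", "24": "Y", "25": "XY", "26": "MT"}


def chrom_counts(bim_lines):
    """Count variants per chromosome from .bim lines.

    Column 1 of a .bim line is the chromosome code; numeric sex-chromosome
    codes are mapped to their names so that "23" and "X" are counted together.
    """
    counts = Counter()
    for line in bim_lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        code = fields[0]
        counts[CHROM_NAMES.get(code, code)] += 1
    return counts
```

Usage: `chrom_counts(open("WGS-GATK-bcftools-Plink-update3.bim"))`, then compare `counts["X"]`, `counts["XY"]` and `counts["Y"]` with the numbers in the log.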
I don't think you're using plink2; the log says PLINK v1.90b4.

Also, --check-sex is just one indicator, which you should use in conjunction with other checks (such as --genome) and a pedigree file to deduce the actual problem samples. PLINK does flag a number of samples, but aren't we better off with a few extra samples to scrutinize?

I'd recommend picking a set of high-quality variants on the X and Y chromosomes to run the sex check on. Run the sex check using the X chromosome as well as the Y chromosome, and combine the results for better insight. Samples with an actual sex swap will raise flags in both checks, and together with anomalies in the F score and chrX homozygosity values, that will be strong enough to warrant deeper digging.
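A hedged Python sketch of the triage suggested above. `x_call` mirrors the rule PLINK documents for --check-sex (F below the female maximum gives a female call, F above the male minimum gives a male call, anything in between is ambiguous), using the 0.35/0.65 thresholds from the log. `y_call` is a hypothetical Y-chromosome rule based on the fraction of chrY variants with a nonmissing genotype; its thresholds are placeholders, not PLINK defaults, and should be tuned on your own data. Sex codes follow PLINK's convention: 1 = male, 2 = female, 0 = unknown.

```python
import math


def x_call(f, female_max=0.35, male_min=0.65):
    """chrX F-coefficient call following PLINK's --check-sex rule.

    Returns 2 (female) if F < female_max, 1 (male) if F > male_min,
    otherwise 0 (no call). NaN F also gives 0.
    """
    if math.isnan(f):
        return 0
    if f < female_max:
        return 2
    if f > male_min:
        return 1
    return 0


def y_call(y_called_frac, female_max=0.1, male_min=0.5):
    """Hypothetical chrY call from the fraction of chrY variants with a
    nonmissing genotype. Thresholds are placeholders, not PLINK defaults."""
    if y_called_frac < female_max:
        return 2
    if y_called_frac > male_min:
        return 1
    return 0


def triage(ped_sex, f, y_called_frac):
    """Combine the X- and Y-based calls against the pedigree sex (1/2)."""
    if ped_sex not in (1, 2):
        return "no pedigree sex"
    xs, ys = x_call(f), y_call(y_called_frac)
    if xs == ys == 3 - ped_sex:
        return "likely swap"  # both chromosomes contradict the pedigree
    if xs == ys == ped_sex:
        return "ok"  # both chromosomes agree with the pedigree
    return "inspect"  # ambiguous or discordant evidence


print(triage(2, 0.95, 0.90))
```

Only samples where both chromosomes point away from the pedigree sex are labelled `likely swap`; anything ambiguous or discordant goes to `inspect`, so borderline samples are kept for manual review instead of being dropped silently.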
Tagging chrchang523 for more insights