Question about the reliability of --check-sex in PLINK2
1
1
Entering edit mode
6.2 years ago
boymin2020 ▴ 80

Hi all,

I have tried to use plink2 to check the gender situation of my several datasets. It always shows more problem samples than expected. For some datasets, I also have RNAseq data. Then I checked again with some genes just on chromosome X or Y. It turned out a few problem samples. Therefore, I doubt the reliability of plink sex check. Or maybe, I have some wrong steps/parameters in my codes. The weirdest thing is plink2 can not scan variants on CHROM Y while a number of variants (7375) do exist. The below is an example log file. In addition, there are 45911 variants on CHROM X, but in the log file, it scans 40431. Any advice or comments will be appreciated.

PLINK v1.90b4 64-bit (20 Mar 2017) www.cog-genomics.org/plink/1.9/ (C) 2005-2017 Shaun Purcell, Christopher Chang GNU General Public License v3 Logging to WGS-GATK-bcftools-Plink-update3.qc.log. Options in effect: --bfile WGS-GATK-bcftools-Plink-update3 --check-sex 0.35 0.65 --noweb --out WGS-GATK-bcftools-Plink-update3.qc

Note: --noweb has no effect since no web check is implemented yet. 16384 MB RAM detected; reserving 8192 MB for main workspace. 2120644 variants loaded from .bim file. 166 people (93 males, 73 females) loaded from .fam. 158 phenotype values loaded from .fam. Using 1 thread (no multithreaded calculations invoked). Before main variant filters, 166 founders and 0 nonfounders present. Calculating allele frequencies... done. Warning: 904336 het. haploid genotypes present (see WGS-GATK-bcftools-Plink-update3.qc.hh ); many commands treat these as missing. Warning: Nonmissing nonmale Y chromosome genotype(s) present; many commands treat these as missing. Total genotyping rate is 0.928209. 2120644 variants and 166 people pass filters and QC. Among remaining phenotypes, 142 are cases and 16 are controls. (8 phenotypes are missing.) --check-sex: 40431 Xchr and 0 Ychr variant(s) scanned, 86 problems detected. Report written to WGS-GATK-bcftools-Plink-update3.qc.sexcheck .

PLINK2 sex-check • 9.9k views
ADD COMMENT
0
Entering edit mode

I don't think you're using plink2 - log says plink v1.90b4.

Also, --check-sex is just one indicator that you should use in conjunction with other checks (such as --genome) and a PEDIGREE file to deduce actual problem samples. Plink does flag a number of samples, but aren't we better off with a few extra samples to scrutinize?

I'd recommend picking a set of high quality variants on the X and Y chromosomes to use sex-check on. Run sex check using X as well as Y chromosomes and combine the results for better insight. Samples that have an actual sex swap with raise flags with both sex checks, plus the anomalies in the F score and chrX homozygosity values will be strong enough to warrant deeper digging.

Tagging chrchang523 for more insights

ADD REPLY
3
Entering edit mode
6.2 years ago

(This response is also posted in plink2-users.)

  • --check-sex ignores chrY for backward compatibility with plink 1.07. See http://www.cog-genomics.org/plink/1.9/basic_stats#check_sex for documentation on the 'ycount' modifier for taking # of nonmissing chrY genotypes into account.
  • If you didn't use --split-x (or plink 2.0's --split-par) before running --check-sex, you should; legitimate male heterozygous genotypes in the pseudoautosomal regions confuse the --check-sex algorithm.
  • Appropriate F-statistic thresholds are pretty data-dependent. Don't worry too much about "X problems detected" during your first --check-sex run; instead, load the output file in R/Python/etc. and make a F-statistic against sex-label plot. If almost all the male F-statistics are in a clump on the right and the female F-statistics are spread out, you're fine. If they're randomly mixed with each other, there's almost certainly some sample-labeling mixup.
ADD COMMENT
0
Entering edit mode

I recall using the --extract option to use only a specific set of SNPs, and excluding the PAR while creating that set of SNPs. Does that work?

ADD REPLY
1
Entering edit mode

Yes, that works; just need to make sure PAR variants are not included.

ADD REPLY
0
Entering edit mode

Thanks for your reply. Another two questions.

  1. How the select the specific set of SNPs?
  2. What is the meaning of PAR?
ADD REPLY
0
Entering edit mode
  1. Use an allele frequency threshold (>0.01, say) and ensure they have rs numbers - this is just one way.
  2. PAR = pseudo-autosomal regions.
ADD REPLY
0
Entering edit mode
  1. Use --freq to get the MAF report, then extract variants (>0.01) with rs outside of PAR?
ADD REPLY
0
Entering edit mode

That's pretty much it, yes.

ADD REPLY
0
Entering edit mode

Thank you very much.

ADD REPLY

Login before adding your answer.

Traffic: 2553 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6