2.9 years ago
Jalil Sharif
I have been trying to develop a genetic risk score (GRS).
All chromosome files from the UK Biobank were merged into a single merged.bed file. The following filters were applied:
--maf 0.01 \
--hwe 1e-6 \
--geno 0.1 \
I ran:
PLINK v1.90p 64-bit (8 Nov 2021)
Options in effect:
--bed chr_merged.bed
--bim chr_merged.bim
--clump park_updated.score
--clump-field P
--clump-kb 250
--clump-p1 1
--clump-r2 0.1
--clump-snp-field SNP
--fam chr1.fam
--out chr.qc
--threads 64
1031886 MB RAM detected; reserving 515943 MB for main workspace.
4113097 variants loaded from .bim file.
487409 people (223038 males, 264368 females, 3 ambiguous) loaded from .fam.
Ambiguous sex IDs written to chr.qc.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 487409 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.99122.
4113097 variants and 487409 people pass filters and QC.
Note: No phenotypes present.
I got the following warnings:
Warning: 'rs356203' is missing from the main dataset, and is a top variant.
Warning: 'rs356219' is missing from the main dataset, and is a top variant.
Warning: 'rs356215' is missing from the main dataset, and is a top variant.
2357669 more top variant IDs missing; see log file.
I posted on the PLINK forum and was told that my SNPs are not in sync. I do not understand what I have done wrong here.
Many thanks. That is a bit strange, though, because the original .bgen files that I filtered were imputed.
Even with imputed data you will still get mismatches, because you are unlikely to have full coverage of every single SNP. That said, for imputed data the number of missing IDs is rather high, so I would definitely check your .bim file and see how many of the score-file SNPs it actually covers.
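One way to do that coverage check is to compare the variant IDs in the .bim file with those in the score file. The following is only a minimal sketch, not a definitive procedure: it assumes the score file is whitespace-delimited with a header line containing a `SNP` column (as implied by `--clump-snp-field SNP`), and the function names `read_bim_ids`, `read_score_ids` and `coverage_report` are made up for illustration.

```python
def read_bim_ids(bim_path):
    """Return the set of variant IDs (2nd column) from a PLINK .bim file."""
    ids = set()
    with open(bim_path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 2:
                ids.add(fields[1])
    return ids


def read_score_ids(score_path, snp_field="SNP"):
    """Return the set of variant IDs from a whitespace-delimited file
    with a header line (assumed layout; adjust if yours differs)."""
    ids = set()
    with open(score_path) as fh:
        header = fh.readline().split()
        col = header.index(snp_field)  # raises ValueError if the column is absent
        for line in fh:
            fields = line.split()
            if len(fields) > col:
                ids.add(fields[col])
    return ids


def coverage_report(bim_ids, score_ids):
    """Summarise how many score-file IDs are present in the .bim file."""
    shared = bim_ids & score_ids
    missing = score_ids - bim_ids
    fraction = len(shared) / len(score_ids) if score_ids else 0.0
    return {
        "n_bim": len(bim_ids),
        "n_score": len(score_ids),
        "n_shared": len(shared),
        "n_missing": len(missing),
        "fraction_covered": fraction,
        "example_missing": sorted(missing)[:5],
    }


if __name__ == "__main__":
    # File names taken from the log in the question.
    bim = read_bim_ids("chr_merged.bim")
    score = read_score_ids("park_updated.score")
    print(coverage_report(bim, score))
    # Print a few IDs from the .bim to eyeball the naming scheme.
    print(sorted(bim)[:5])
```

If the shared fraction is very low, it is worth comparing a few IDs from each file by eye: one possible cause (an assumption, not something the log confirms) is that the two files name variants differently, in which case the IDs would need to be harmonised before clumping.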