I have imputed genotype data for cases, stored per chromosome as PLINK binary filesets (.bed/.bim/.fam), and the same for controls (imputed with the same panel). I would like to merge the two.
At the moment I check for allele flipping by loading the .bim files in Python, one chromosome at a time. When I flip the alleles of my controls, every row matches my cases .bim file. When I flip the opposite way, I get zero matches. The flip direction that produces matches is the same for all chromosomes, so I use that orientation when merging the two datasets.
I then rebuild the ID column in the .bim files and merge on it. This seems to work without error, but it is new to me. Is this a comprehensive enough check for allele flipping, or is there a more common or best-practice approach?
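For illustration, a row-by-row .bim comparison of this kind could look roughly like the sketch below. It is not the exact script used here: the function names are hypothetical, and it assumes both .bim files list the same variants in the same order (the merge log reports identical marker counts, but the order is an assumption).

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def read_bim(path):
    """Return a list of (chrom, pos, a1, a2) tuples from a PLINK .bim file."""
    rows = []
    with open(path) as handle:
        for line in handle:
            if not line.strip():
                continue
            # .bim columns: chromosome, variant ID, cM, position, A1, A2
            chrom, _variant_id, _cm, pos, a1, a2 = line.split()
            rows.append((chrom, int(pos), a1.upper(), a2.upper()))
    return rows


def revcomp(allele):
    """Reverse-complement an allele; non-ACGT symbols (e.g. '0') are kept."""
    return "".join(COMPLEMENT.get(base, base) for base in reversed(allele))


def classify(case_row, control_row):
    """Describe how the control alleles relate to the case alleles."""
    if case_row[:2] != control_row[:2]:
        return "different_position"
    case_alleles = case_row[2:]
    a1, a2 = control_row[2:]
    if (a1, a2) == case_alleles:
        return "same"
    if (a2, a1) == case_alleles:
        return "swapped"
    f1, f2 = revcomp(a1), revcomp(a2)
    if (f1, f2) == case_alleles:
        return "strand_flipped"
    if (f2, f1) == case_alleles:
        return "strand_flipped_and_swapped"
    return "mismatch"


def summarise(case_bim, control_bim):
    """Count each category across two .bim files compared row by row."""
    cases = read_bim(case_bim)
    controls = read_bim(control_bim)
    if len(cases) != len(controls):
        raise ValueError("the .bim files have different numbers of variants")
    return Counter(classify(c, k) for c, k in zip(cases, controls))
```

Calling `summarise("cases.bim", "controls.bim")` (placeholder file names) returns a count per category, so a mix of categories on one chromosome shows up immediately. One limitation: for palindromic A/T and C/G SNPs, a swap of A1/A2 and a strand flip give the same allele pair, so allele codes alone cannot tell them apart. The sketch reports those as "swapped".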
After merging in PLINK, I also tried checking with --flip-scan, but I get this error: "Error: --flip-scan requires at least one case and one control, and only considers founders."
My code is this:
# Merge the case and control files for chromosome 2
plink --bfile "${case_file}" --bmerge "${control_file}.bed" "${control_file}.bim" "${control_file}.fam" --make-bed --out "${output_file}"
# Perform flip-scan to identify alleles that need to be flipped
plink --bfile "${output_file}" --flip-scan --out "${output_dir}/flip_scan_chr${chr}"
And my log file is this:
PLINK v1.90b6.27 64-bit (10 Dec 2022) www.cog-genomics.org/plink/1.9/
(C) 2005-2022 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to /merged_chr2_cases_controls.log.
Options in effect:
--bfile /common_snps/cases/chr2_cases_data.common
--bmerge /common_snps/controls/chr2_controls_data.common.bed /common_snps/controls/chr2_controls_data.common.bim /common_snps/controls/chr2_controls_data.common.fam
--make-bed
--out /merged_chr2_cases_controls
257655 MB RAM detected; reserving 128827 MB for main workspace.
2284 people loaded from
/common_snps/cases/chr2_cases_data.common.fam.
4615 people to be merged from
/common_snps/controls/chr2_controls_data.common.fam.
Of these, 4615 are new, while 0 are present in the base dataset.
1334242 markers loaded from
/common_snps/cases/chr2_cases_data.common.bim.
1334242 markers to be merged from
/common_snps/controls/chr2_controls_data.common.bim.
Of these, 0 are new, while 1334242 are present in the base dataset.
Warning: Variants '2:180171:C:T' and '2:180171:C:A' have the same position.
Warning: Variants '2:329227:C:T' and '2:329227:C:A' have the same position.
Warning: Variants '2:467580:G:T' and '2:467580:G:A' have the same position.
1255 more same-position warnings: see log file.
Performing single-pass merge (6899 people, 1334242 variants).
Merged fileset written to
/merged_chr2_cases_controls-merge.bed +
/merged_chr2_cases_controls-merge.bim +
/merged_chr2_cases_controls-merge.fam .
1334242 variants loaded from .bim file.
6899 people (0 males, 0 females, 6899 ambiguous) loaded from .fam.
Ambiguous sex IDs written to
/merged_chr2_cases_controls.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 6899 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.994228.
1334242 variants and 6899 people pass filters and QC.
Note: No phenotypes present.
--make-bed to /merged_chr2_cases_controls.bed
+ /merged_chr2_cases_controls.bim +
/merged_chr2_cases_controls.fam ... done.
PLINK v1.90b6.27 64-bit (10 Dec 2022) www.cog-genomics.org/plink/1.9/
(C) 2005-2022 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to /flip_scan_chr2.log.
Options in effect:
--bfile /merged_chr2_cases_controls
--flip-scan
--out /flip_scan_chr2
257655 MB RAM detected; reserving 128827 MB for main workspace.
1334242 variants loaded from .bim file.
6899 people (0 males, 0 females, 6899 ambiguous) loaded from .fam.
Ambiguous sex IDs written to
/flip_scan_chr2.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 6899 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.994228.
1334242 variants and 6899 people pass filters and QC.
Note: No phenotypes present.
Error: --flip-scan requires at least one case and one control, and only
considers founders.
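The flip-scan log above reports "Note: No phenotypes present" just before the error, which suggests the merged .fam file carries no case/control status. In PLINK 1.9, column 6 of a .fam file holds the phenotype (1 = control, 2 = case, 0 or -9 = missing). A minimal sketch of filling that column before merging is shown below. It assumes every sample in the cases fileset is a case and every sample in the controls fileset is a control, and the function names are hypothetical.

```python
def with_phenotype(fam_line, phenotype):
    """Return one .fam line with column 6 replaced by the phenotype code."""
    fields = fam_line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns in .fam line, got {len(fields)}")
    # .fam columns: FID, IID, father ID, mother ID, sex, phenotype
    fields[5] = phenotype
    return " ".join(fields) + "\n"


def set_phenotype(fam_in, fam_out, phenotype):
    """Copy a .fam file, writing the same phenotype code for every sample."""
    if phenotype not in ("1", "2"):
        raise ValueError("phenotype must be '1' (control) or '2' (case)")
    with open(fam_in) as src, open(fam_out, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(with_phenotype(line, phenotype))
```

For example, `set_phenotype("cases.fam", "cases.pheno.fam", "2")` and `set_phenotype("controls.fam", "controls.pheno.fam", "1")` (placeholder file names) would write labelled copies. The output should be written to a new file and swapped in only after checking it, because the .bed file depends on the sample order in the .fam file, which this sketch preserves.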