Hi Biostars Community,
I called variants with GATK HaplotypeCaller and am working from the hard-filtered SNP VCF files (SNP_Hard_Filtered_VCF) produced by GATK. In my previous questions, I had problems with multi-allelic sites when merging in PLINK. To solve that, I followed the steps below.
I ran steps 1 and 2 separately on each of the 144 per-sample VCF files.
1. Convert VCF format to Plink format
bcftools norm -Ou -m -any HF_PASS_SNPs.vcf.gz | \
bcftools norm -Ou -f Bos_taurus_Ensembl_UMD3.1/genome.fa | \
bcftools annotate -Ob -x ID -I +'%CHROM:%POS:%REF:%ALT' | \
/usr/bin/plink1.9 --bcf /dev/stdin --keep-allele-order --cow --allow-no-sex --nonfounders --make-bed --out HF_PASS_SNPs_plink
The command above was suggested in this blog post for converting VCF to PLINK format:
http://apol1.blogspot.com/2014/11/best-practice-for-converting-vcf-files.html
2. Then I performed the QC steps
/usr/bin/plink1.9 \
--bfile HF_PASS_SNPs_plink \
--cow \
--allow-no-sex \
--nonfounders \
--keep-allele-order \
--mind 0.1 \
--geno 0.1 \
--maf 0.05 \
--make-bed \
--out HF_PASS_SNPs_plink_QC
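Because steps 1 and 2 were run once per sample, the QC command above can be wrapped in a small function and looped over the per-sample filesets. This is only a sketch: the function name `run_qc`, the `PLINK` override variable, and the `per_sample/` directory in the usage comment are hypothetical and not part of the original commands; the PLINK flags are exactly those shown in step 2.

```shell
#!/usr/bin/env bash
# Sketch: apply the step 2 QC filters to one per-sample PLINK fileset.
# PLINK can be overridden (e.g. PLINK=echo for a dry run that only prints
# the arguments); the default is the path used in the post.
PLINK="${PLINK:-/usr/bin/plink1.9}"

run_qc() {
    local prefix="$1"   # fileset prefix, e.g. HF_PASS_SNPs_plink
    "$PLINK" \
        --bfile "$prefix" \
        --cow \
        --allow-no-sex \
        --nonfounders \
        --keep-allele-order \
        --mind 0.1 \
        --geno 0.1 \
        --maf 0.05 \
        --make-bed \
        --out "${prefix}_QC"
}

# Hypothetical usage: loop over every converted fileset in one directory.
# for bed in per_sample/*_plink.bed; do run_qc "${bed%.bed}"; done
```

Setting `PLINK=echo` before calling `run_qc` prints the argument list instead of running PLINK, which is a cheap way to confirm the output prefixes before processing all 144 samples.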
3. Then I merged the 144 samples
/usr/bin/plink1.9 --cow --make-bed --merge-list myFile.txt --out mymerged_144
PLINK v1.90b6.22 64-bit (3 Nov 2020) www.cog-genomics.org/plink/1.9/
(C) 2005-2020 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to mymerged_144.log.
Options in effect:
--cow
--make-bed
--merge-list myFile.txt
--out mymerged_144
64245 MB RAM detected; reserving 32122 MB for main workspace.
Warning: Variants '1:21444:A:G' and '1:21444:A:*' have the same position.
Warning: Variants '1:21446:C:G' and '1:21446:C:*' have the same position.
Warning: Variants '1:21448:T:C' and '1:21448:T:*' have the same position.
7955 more same-position warnings: see log file.
Performing single-pass merge (138 cattle, 342592 variants).
Merged fileset written to mymerged_144-merge.bed + mymerged_144-merge.bim +
mymerged_144-merge.fam .
342592 variants loaded from .bim file.
138 cattle (0 males, 0 females, 138 ambiguous) loaded from .fam.
Ambiguous sex IDs written to mymerged_144.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 138 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.0729077.
342592 variants and 138 cattle pass filters and QC.
Note: No phenotypes present.
--make-bed to mymerged_144.bed + mymerged_144.bim + mymerged_144.fam ... done.
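The contents of `myFile.txt` passed to `--merge-list` in step 3 are not shown. As a sketch of how such a list could be generated, the function below writes one fileset prefix per line, which is the single-token form that PLINK 1.9 accepts for `--merge-list`. The function name `make_merge_list` and the assumption that all QC outputs sit in one directory with a `_QC` suffix are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list every complete *_QC fileset (bed + bim + fam) in a directory,
# one prefix per line, for use with plink --merge-list.
make_merge_list() {
    local dir="$1" out="$2"
    local bed prefix
    : > "$out"                          # start with an empty list
    for bed in "$dir"/*_QC.bed; do
        [ -e "$bed" ] || continue       # the glob matched nothing
        prefix="${bed%.bed}"
        if [ -s "$prefix.bim" ] && [ -s "$prefix.fam" ]; then
            printf '%s\n' "$prefix" >> "$out"
        else
            # An empty or missing .bim/.fam means no usable data for this sample.
            printf 'skipping incomplete fileset: %s\n' "$prefix" >&2
        fi
    done
}

# Hypothetical usage:
# make_merge_list per_sample myFile.txt && wc -l < myFile.txt
```

Comparing the number of lines in the list with the 138 cattle reported in the log may help show whether every expected sample was actually included in the merge.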
After step 3, I do get merged files (.bed, .bim, .fam), but I am not sure whether they are correct.
Before merging, the total genotyping rate of each sample was 0.97; after merging, it is 0.0729077. Could you please explain what might be the reason? Should I use this output for further steps?
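PLINK reports the total genotyping rate as the fraction of non-missing calls over all sample-by-variant cells, so one possible diagnostic is to see how much of the combined variant list each per-sample fileset contributes. The sketch below does this from the `.bim` files alone. It assumes variant IDs (column 2) are unique within each file, as they should be after the `%CHROM:%POS:%REF:%ALT` renaming in step 1; the function name `bim_share` and the file names in the usage comment are hypothetical. It only counts listed sites and does not look at the genotypes themselves.

```shell
#!/usr/bin/env bash
# Sketch: for each .bim file, print its path, its number of variants, and the
# share of the union of all variant IDs that it contains (tab-separated).
bim_share() {
    awk '
        { seen[$2] = 1; n[FILENAME]++ }   # column 2 of a .bim file is the variant ID
        END {
            for (id in seen) total++      # size of the union across all files
            for (f in n) printf "%s\t%d\t%.4f\n", f, n[f], n[f] / total
        }
    ' "$@" | sort
}

# Hypothetical usage:
# bim_share per_sample/*_QC.bim | head
```

If most samples show a small share, many cells of the merged matrix correspond to sites that a given sample's fileset never listed.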
Thanks a lot in advance