I have a phased VCF file and I need to change the alleles of the SNPs so that the alleles coded as 0 represent the ancestral state while the alleles coded as 1 represent the derived state.
I got a list of ancestral and derive states for my SNPs in my VCF file and I want to change them using PLINK.
If I have a list of ancestral alleles in a file ancestral_snps.txt
, if I use the following command will I get the wanted output?
plink --vcf myvcf.vcf --a1-allele ancestral_snps.txt --recode vcf --out myvcfancestral.vcf
Or is it --allele-a2
?
I assume that this change won't affect the phasing of my data, right? As I basically only recoded the alleles.
EDIT:
Following Christopher Chang's comment:
If my ancestral_snps.txt
file looks like this, specifying which allele is ancestral:
C rs1001
G rs1002
A rs1003
If I use this command in PLINK2
, will the ancestral alleles be coded as 0 in my phased VCF?
./plink2 --vcf myvcf.vcf --ref-allele force 1 2 ancestral_snps.txt --recode vcf --out myvcfancestral.vcf
The command ... --ref-allele force ancestral_snps.txt 1 2 ... do not change heterozygous, just homozygous. When I am changing all the reference alleles in the VCF it is perfect, because the phased haplotypes will be kept. However, if I intend to change the reference allele for a subset of markers in the VCF, this behaviour will destroy the phased haplotypes. To represent what I mean, I give you a short example where only the two first SNPs were changed:
Am I missing any command to force changes in heterozygous ?
Hi,
Can you post the full .log file from your run?
Sure.
PLINK v2.00a1LM AVX2 Intel (11 Feb 2018) Options in effect: --cow --export vcf --maf 0.01 --out teste --ref-allele force reference.teste2 1 2 --vcf CHR9.vcf.gz
Hostname: c070 Working directory: home/VCFs Start time: Thu Oct 11 15:58:01 2018
Random number seed: 1539233881 128719 MB RAM detected; reserving 64359 MB for main workspace. Using up to 20 threads (change this with --threads). --vcf: 1589271 variants scanned. --vcf: teste-temporary.pgen + teste-temporary.pvar + teste-temporary.psam written. 50 samples (0 females, 0 males, 50 ambiguous; 50 founders) loaded from teste-temporary.psam. 1589271 variants loaded from teste-temporary.pvar. Note: No phenotype data present. Calculating allele frequencies... done. 149348 variants removed due to minor allele threshold(s) (--maf/--max-maf/--mac/--max-mac). 1439923 variants remaining after main filters. --ref-allele: 14 sets of allele codes rotated. --export vcf to teste.vcf ... done.
End time: Thu Oct 11 15:58:12 2018
Can you retry this with a more recent build? This appears to be working properly on my end, and I recall fixing this several months ago.
Thanks.
It worked perfectly when I used PLINK v2.00a2LM AVX2.