I am after some advice on the best method to correct for differences in allele codes at a given SNP when merging across multiple files. I have data in PLINK binary format (bed/bim/fam) for several populations. When I attempt to merge the data using PLINK as follows:
plink --bfile fA --merge-list allfiles.txt --make-bed --out mynewdata
I get warnings about possible +/- strand issues, and a file is generated listing the problem SNPs.
Looking at the .bim files for each population at these problem SNPs, example allele codes are as follows:
Pop1: rs1000000 A G
Pop2: rs1000000 T C
Pop3: rs1000000 A G
This indicates to me that Pop2 has undergone a strand flip: T/C is the complement of A/G.
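To make that reasoning concrete, here is a minimal Python sketch of the complement check. The function name `classify` and its return labels are my own, not part of PLINK. It assumes biallelic SNPs coded as A/C/G/T, and it flags A/T and C/G pairs as ambiguous because the allele codes alone cannot show whether they are flipped.

```python
# Hypothetical helper, not part of PLINK: compare two allele pairs for one SNP.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def classify(ref_alleles, other_alleles):
    """Return 'same', 'flip', 'ambiguous', 'mismatch' or 'unsupported'."""
    ref = frozenset(a.upper() for a in ref_alleles)
    other = frozenset(a.upper() for a in other_alleles)
    # Anything outside A/C/G/T (e.g. '0', indels) is not handled here.
    if not (ref | other).issubset(COMPLEMENT):
        return "unsupported"
    flipped = frozenset(COMPLEMENT[a] for a in other)
    if ref == other and ref == flipped:
        return "ambiguous"  # A/T or C/G: its complement is itself
    if ref == other:
        return "same"
    if ref == flipped:
        return "flip"
    return "mismatch"  # possibly a genuine third allele or an annotation error

print(classify(("A", "G"), ("T", "C")))  # Pop1 vs Pop2 from the example above
```

Only the 'flip' class looks safe to correct automatically. The 'ambiguous' and 'mismatch' classes would need extra information, such as allele frequencies or the array annotation.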
Is there any software that can account for these differences when merging SNP data? This must be a common problem. Or does each of these flips need to be identified computationally and corrected using PLINK to update the allele information as follows:
plink --bfile mydata --update-alleles mylist.txt --make-bed --out newfile
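If the flips do have to be identified computationally, a rough sketch of building such a list might look like the following. The function name is hypothetical. I am assuming the standard six-column .bim layout (chromosome, SNP ID, cM, bp, allele 1, allele 2) and that the update file wants five fields per line (SNP ID, the two old allele codes, the two new allele codes), so please check that against the PLINK documentation before relying on it.

```python
# Hypothetical helper: list SNPs in a target .bim that look strand-flipped
# relative to a reference .bim, as lines for an allele-update file.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def update_allele_lines(reference_bim_lines, target_bim_lines):
    # Map SNP ID -> set of alleles in the reference population.
    ref = {}
    for line in reference_bim_lines:
        fields = line.split()
        if len(fields) >= 6:
            ref[fields[1]] = {fields[4], fields[5]}

    out = []
    for line in target_bim_lines:
        fields = line.split()
        if len(fields) < 6 or fields[1] not in ref:
            continue
        snp, a1, a2 = fields[1], fields[4], fields[5]
        if a1 not in COMPLEMENT or a2 not in COMPLEMENT:
            continue  # skip missing alleles, indels, etc.
        new1, new2 = COMPLEMENT[a1], COMPLEMENT[a2]
        # Emit only when the complement matches the reference and the
        # original does not; A/T and C/G SNPs are therefore never emitted.
        if {a1, a2} != ref[snp] and {new1, new2} == ref[snp]:
            out.append(f"{snp}\t{a1}\t{a2}\t{new1}\t{new2}")
    return out
```

The two arguments can be any iterables of lines, for example open file handles on the Pop1 and Pop2 .bim files. PLINK also has a --flip option that takes a plain list of SNP IDs, which may be simpler if only strand needs correcting.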
Thanks in advance.
I have followed the instructions on the website, and my "trial flip" results suggest that there are still strand issues. I don't want to remove the problem SNPs, as this would reduce my SNP count considerably.
The webpage you link to says: "PLINK cannot properly resolve genuine triallelic variants. We recommend exporting that subset of the data to VCF, using another tool/script to perform the merge in the way you want, and then importing the result."
Is there a way of merging VCF files that accounts for triallelic SNPs and strand flips? I've looked into vcftools, but I'm not sure it is the right option.