Accounting for problem SNPs when merging multiple PLINK files
10.6 years ago
lhvkl ▴ 50

I am after some advice on the best way to correct for differences in allele codes at a given SNP when merging across multiple files. I have data in PLINK binary format (bed/bim/fam) for several populations. When I attempt to merge the data using PLINK as follows:

plink --bfile fA --merge-list allfiles.txt --make-bed --out mynewdata

I get reports of +/- strand issues and a file is generated detailing the problem SNPs.

Looking at the .bim files for each population, example allele codes at one of these problem SNPs are as follows:

Pop1: rs1000000  A  G
Pop2: rs1000000  T  C
Pop3: rs1000000  A  G

This indicates to me that Pop2 is coded on the opposite strand, i.e. it has undergone a strand flip (A/G read from the other strand is T/C).
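The check described above can be sketched in a few lines of Python. This is only an illustration, not part of PLINK: the function names are made up for this example, and it assumes allele codes are plain A/C/G/T (no missing "0" alleles or indels).

```python
# Watson-Crick complements for the four bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_pair(alleles):
    """Return the allele pair as it would read on the opposite strand."""
    return frozenset(COMPLEMENT[a] for a in alleles)


def is_strand_flip(ref_alleles, other_alleles):
    """True if other_alleles is ref_alleles read from the opposite strand."""
    ref, other = frozenset(ref_alleles), frozenset(other_alleles)
    return ref != other and complement_pair(ref) == other


def is_ambiguous(alleles):
    """A/T and C/G SNPs look the same on both strands, so a flip
    cannot be detected from the allele codes alone."""
    return complement_pair(alleles) == frozenset(alleles)


print(is_strand_flip(("A", "G"), ("T", "C")))  # Pop1 vs Pop2 -> True
print(is_strand_flip(("A", "G"), ("A", "G")))  # Pop1 vs Pop3 -> False
print(is_ambiguous(("A", "T")))                # -> True
```

Note that A/T and C/G SNPs never trigger a merge error even when they are flipped, so they would need to be handled separately (for example by comparing allele frequencies).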

This must be a common problem: is there any software that can account for these differences when merging SNP data? Or does each of these flips need to be identified computationally and then corrected by using PLINK to update the allele information, as follows:

plink --bfile mydata --update-alleles mylist.txt --make-bed --out newfile

Thanks in advance.


I have followed the instructions on the website, and my "trial flip" results suggest that there are still strand issues. I don't want to remove the problem SNPs, as this would reduce my SNP count considerably.

The webpage you link to says: "PLINK cannot properly resolve genuine triallelic variants. We recommend exporting that subset of the data to VCF, using another tool/script to perform the merge in the way you want, and then importing the result."

Is there a way of merging VCF files that accounts for both triallelic SNPs and strand flips? I've looked into vcftools, but I'm not sure it is the right option.

10.6 years ago

PLINK gives you a list of the SNPs that need to be flipped (the .missnp file written when the merge fails). You need to flip these SNPs.

Step 1 - First merge: plink --bfile fA --merge-list allfiles.txt --make-bed --out mynewdata

Step 2 - Flip SNPs: plink --bfile fA --flip mynewdata.missnp --make-bed --out mynewdata2

Step 3 - New merge: plink --bfile mynewdata2 --merge-list allfiles.txt --make-bed --out mynewdata3

If the second merge still reports strand errors, the SNPs it lists are probably triallelic.


Thanks, Maxime. Step 2 only flips those SNPs in file fA rather than across all files, so the new merge in step 3 only corrects for strand flips in file fA. How do you handle this problem across multiple files?


Across multiple files, you add only one file at a time.

Merge File 1 + File 2 (Step 1-2-3) --> NewFile1

Merge NewFile1 + File 3 (Step 1-2-3) --> NewFile2

Merge NewFile2 + File 4 (Step 1-2-3) --> NewFile3 ...

It will take some time, but it will work.
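The repeated rounds above could be scripted. The following Python sketch only builds the command lines and does not run them. The helper names and output prefixes are hypothetical, it assumes a PLINK version in which `--bmerge` accepts a single file prefix, and it assumes the failed merge writes its SNP list to `<out>.missnp` as in the steps above (some PLINK versions may use a different file name, so check the log). It also flips the newly added file rather than the accumulated one, which is a variation on the steps above that should be equivalent for a pairwise merge.

```python
def pairwise_merge_commands(base, new, out, missnp_suffix=".missnp"):
    """Return the three PLINK commands of one merge round as argument lists.

    missnp_suffix is an assumption about how the failed merge names its
    list of problem SNPs; adjust it to match your PLINK version.
    """
    trial = out + "_trial"
    flipped = new + "_flipped"
    return [
        # Step 1: trial merge; on strand errors PLINK lists the problem SNPs.
        ["plink", "--bfile", base, "--bmerge", new,
         "--make-bed", "--out", trial],
        # Step 2: flip the listed SNPs in the file being added.
        ["plink", "--bfile", new, "--flip", trial + missnp_suffix,
         "--make-bed", "--out", flipped],
        # Step 3: merge again using the flipped file.
        ["plink", "--bfile", base, "--bmerge", flipped,
         "--make-bed", "--out", out],
    ]


def iterative_merge_plan(prefixes, out_prefix="merged"):
    """Chain the rounds: each round's output is the base of the next."""
    plan = []
    base = prefixes[0]
    for i, new in enumerate(prefixes[1:], start=1):
        out = f"{out_prefix}{i}"
        plan.extend(pairwise_merge_commands(base, new, out))
        base = out
    return plan


for cmd in iterative_merge_plan(["pop1", "pop2", "pop3"]):
    print(" ".join(cmd))
```

If a trial merge succeeds without errors, steps 2 and 3 of that round are unnecessary and the trial output can be used directly. Each command list could be passed to `subprocess.run` once the file names have been checked.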


I was hoping there was a less cumbersome way around this problem but I'll try these repeated steps. Thank you.


--merge-list allows you to merge more than two files at a time. However, it does not really work for flips: you don't know which source file(s) need which SNPs flipped. So in your case (where you've verified that there probably are strand errors), the workflow described by Maxime is correct.
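One way to find out which file needs which SNPs flipped is to compare each .bim file against a chosen reference file. The sketch below is a hypothetical helper, not a PLINK feature. It assumes the standard six-column .bim layout (chromosome, variant ID, genetic distance, base-pair position, allele 1, allele 2), and the positions and the second SNP in the example data are made up for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def read_bim_alleles(lines):
    """Map variant ID -> set of its two allele codes from .bim lines."""
    alleles = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 6:
            alleles[fields[1]] = frozenset(fields[4:6])
    return alleles


def snps_to_flip(ref_lines, other_lines):
    """IDs whose alleles in other_lines match the reference only after
    complementing, i.e. candidates for PLINK's --flip list."""
    ref = read_bim_alleles(ref_lines)
    other = read_bim_alleles(other_lines)
    flips = []
    for snp, pair in other.items():
        ref_pair = ref.get(snp)
        if ref_pair is None or pair == ref_pair:
            continue  # absent from the reference, or already consistent
        if not all(a in COMPLEMENT for a in pair):
            continue  # missing ("0") or non-ACGT codes: leave for manual review
        if frozenset(COMPLEMENT[a] for a in pair) == ref_pair:
            flips.append(snp)
    return flips


# Example .bim content; positions and rs2000000 are invented.
pop1 = ["1 rs1000000 0 1000 A G", "1 rs2000000 0 2000 C T"]
pop2 = ["1 rs1000000 0 1000 T C", "1 rs2000000 0 2000 C T"]
print(snps_to_flip(pop1, pop2))  # ['rs1000000']
```

Writing the returned IDs one per line gives a file that can be passed to `--flip` for that population. SNPs that still disagree after flipping are not returned and are the likely triallelic cases.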

10.6 years ago

See the discussion at https://www.cog-genomics.org/plink2/data#merge3 .
