Question

Accounting for problem SNPs when merging multiple PLINK files

5

11.3 years ago

lhvkl ▴ 50

I am after some advice on the best method to correct for differences in allele codes at any given SNP when merging across multiple files. I have data in plink format (bed/bim/fam) for several populations. When I attempt to merge the data using plink as follows:

plink --file fA --merge-list allfiles.txt --make-bed --out mynewdata

I get reports of +/- strand issues and a file is generated detailing the problem SNPs.

Looking at the .bim files for each population at these problem SNPs, example allele codes are as follows:

Pop1: rs1000000  A  G
Pop2: rs1000000  T  C
Pop3: rs1000000  A  G

This indicates to me that Pop2 has undergone a strand flip.
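The reasoning above can be checked mechanically: a strand flip means every allele is replaced by its complement (A<->T, C<->G). The following is a minimal sketch, not part of PLINK; the function name `classify` and its return labels are hypothetical and chosen for illustration only. Note that A/T and C/G SNPs look the same on both strands, so they cannot be resolved from the allele codes alone.

```python
# Complement of each base; anything else (e.g. PLINK's "0" missing code) is left as-is.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify(ref, other):
    """Compare two allele pairs for one SNP, e.g. ("A", "G") vs ("T", "C").

    Hypothetical helper: returns "same", "flip", "ambiguous" or "mismatch".
    Allele order within a pair is ignored.
    """
    ref_set = set(ref)
    other_set = set(other)
    flipped = {COMPLEMENT.get(a, a) for a in other}
    if ref_set == other_set and ref_set == flipped:
        # A/T and C/G pairs read identically on both strands
        return "ambiguous"
    if ref_set == other_set:
        return "same"
    if ref_set == flipped:
        return "flip"
    return "mismatch"


# Allele codes for rs1000000 as quoted in the question, with Pop1 as the reference
pops = {"Pop1": ("A", "G"), "Pop2": ("T", "C"), "Pop3": ("A", "G")}
results = {name: classify(pops["Pop1"], alleles) for name, alleles in pops.items()}
print(results)  # {'Pop1': 'same', 'Pop2': 'flip', 'Pop3': 'same'}
```

A "mismatch" result (for example A/G against A/C) would point to something other than a simple flip, such as a triallelic site or an annotation problem.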

Is there any software that can account for these differences when merging SNP data? This must be a common problem. Or does each of these flips need to be identified computationally and corrected using plink to update the allele information, as follows:

plink --bfile mydata --update-alleles mylist.txt --make-bed --out newfile

Thanks in advance.

plink snp allele-codes strand-flip • 24k views

ADD COMMENT • link updated 3.9 years ago by Ram 45k • written 11.3 years ago by lhvkl ▴ 50

0

I followed the instructions on the website, and my "trial flip" results suggest that there are still strand issues. I don't want to remove the problem SNPs, as this would reduce my SNP count considerably.

The webpage you link to says: "PLINK cannot properly resolve genuine triallelic variants. We recommend exporting that subset of the data to VCF, using another tool/script to perform the merge in the way you want, and then importing the result."

Is there a way of merging VCF files that accounts for triallelic SNPs and strand flips? I've looked into vcftools but I'm not sure it is the right option.

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 11.3 years ago by lhvkl ▴ 50

Ram · Answer 1 · 2014-05-21

8

11.3 years ago

Maxime Lamontagne ★ 2.4k

PLINK gives you a list of the SNPs that need to be flipped (???.missnp). You need to flip these SNPs.

Step 1 - First merge: plink --file fA --merge-list allfiles.txt --make-bed --out mynewdata

Step 2 - Flip SNPs: plink --file fA --flip mynewdata.missnp --make-bed --out mynewdata2

Step 3 - New merge: plink --bfile mynewdata2 --merge-list allfiles.txt --make-bed --out mynewdata3

If the second merge still reports strand problems, those SNPs are probably triallelic.

ADD COMMENT • link updated 5.6 years ago by Ram 45k • written 11.3 years ago by Maxime Lamontagne ★ 2.4k

0

Thanks Maxime. Step 2 just flips those SNPs in file fA rather than across all files, so the new merge in step 3 only corrects for strand flips in file fA. How do you handle this problem across multiple files?

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 11.3 years ago by lhvkl ▴ 50

1

With multiple files, add only one file at a time.

Merge File 1 + File 2 (Step 1-2-3) --> NewFile1

Merge NewFile1 + File 3 (Step 1-2-3) --> NewFile2

Merge NewFile2 + File 4 (Step 1-2-3) --> NewFile3 ...

It will take some time, but it will work.
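To make the repetition less tedious, the one-file-at-a-time scheme can be scripted. The sketch below only builds the command lines and does not run them; `merge_plan` and the output prefixes are hypothetical names, and it assumes binary filesets (`--bfile`/`--bmerge`) rather than the `--file` input used in the steps above. The `.missnp` suffix follows the naming in this answer; some PLINK versions may write `<out>-merge.missnp` instead, so the suffix is a parameter.

```python
def merge_plan(prefixes, out_prefix="merged", missnp_suffix=".missnp"):
    """Build PLINK argument lists for merging binary filesets one at a time.

    Hypothetical helper: for each additional fileset it emits the three steps
    from the answer (trial merge, flip, merge again). Nothing is executed.
    """
    commands = []
    current = prefixes[0]
    for i, nxt in enumerate(prefixes[1:], start=1):
        trial = f"{out_prefix}_trial{i}"
        flipped = f"{out_prefix}_flip{i}"
        result = f"{out_prefix}_step{i}"
        other = [nxt + ".bed", nxt + ".bim", nxt + ".fam"]
        # Step 1: trial merge, which reports the problem SNPs in a .missnp file
        commands.append(["plink", "--bfile", current, "--bmerge", *other,
                         "--make-bed", "--out", trial])
        # Step 2: flip the reported SNPs in the fileset merged so far
        commands.append(["plink", "--bfile", current,
                         "--flip", trial + missnp_suffix,
                         "--make-bed", "--out", flipped])
        # Step 3: merge again using the flipped fileset
        commands.append(["plink", "--bfile", flipped, "--bmerge", *other,
                         "--make-bed", "--out", result])
        current = result  # the next fileset is merged into this result
    return commands


plan = merge_plan(["pop1", "pop2", "pop3"])
for cmd in plan:
    print(" ".join(cmd))
```

If a trial merge succeeds without strand problems, no `.missnp` file is produced and steps 2 and 3 for that pair are unnecessary, so anyone running these commands should check the outcome of step 1 before continuing.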

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 11.3 years ago by Maxime Lamontagne ★ 2.4k

0

I was hoping there was a less cumbersome way around this problem but I'll try these repeated steps. Thank you.

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 11.3 years ago by lhvkl ▴ 50

0

--merge-list allows you to merge more than two files at a time. However, it does not really work for flips--you don't know which source file(s) need to flip which SNPs. So in your case (where you've verified that there probably are strand errors) the workflow described by Maxime is correct.
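One way to find out which source file needs which flips, outside of PLINK, is to compare each .bim file against a chosen reference .bim. The sketch below is an assumption-laden illustration, not a PLINK feature: `read_bim` and `snps_to_flip` are hypothetical helpers, and the approach cannot decide A/T or C/G SNPs, which look identical on both strands.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def read_bim(path):
    """Map SNP ID -> (allele1, allele2) from a .bim file.

    .bim columns: chromosome, SNP ID, genetic distance, bp position, allele 1, allele 2.
    """
    alleles = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 6:
                alleles[fields[1]] = (fields[4], fields[5])
    return alleles


def snps_to_flip(reference, other):
    """List SNP IDs whose alleles in `other` are the complement of those in `reference`.

    Both arguments are dicts as returned by read_bim. SNPs whose allele sets
    already match (including strand-ambiguous A/T and C/G) are not listed.
    """
    flips = []
    for snp, ref_pair in reference.items():
        if snp not in other:
            continue
        ref_set = set(ref_pair)
        other_set = set(other[snp])
        complemented = {COMPLEMENT.get(a, a) for a in other[snp]}
        if ref_set != other_set and ref_set == complemented:
            flips.append(snp)
    return flips


# In-memory example using the allele codes quoted in the question
pop1 = {"rs1000000": ("A", "G")}
pop2 = {"rs1000000": ("T", "C")}
print(snps_to_flip(pop1, pop2))  # ['rs1000000']
```

Writing each returned list to a text file, one SNP ID per line, would give a per-file input for `--flip`.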

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 11.3 years ago by chrchang523 11k

score 0 · Answer 2 · 2014-05-20

0

11.3 years ago

chrchang523 11k

See the discussion at https://www.cog-genomics.org/plink2/data#merge3 .

ADD COMMENT • link 11.3 years ago by chrchang523 11k