Exclude non-shared SNPs from two datasets with PLINK
3.5 years ago

Hello,

I have two SNP datasets that I need to merge. I have created a file called "shared.bim", which contains all the shared sites, and a reference_allele.list to reorder the sites in both files before merging.

Trying to merge without first removing the non-shared sites gives an error ("Warning - impossible allele assignment"), so I need to remove the non-shared sites from both datasets.

I know I need a command that contains --recode and --make-bed, plus a SNP list file to indicate what to remove; I just don't understand how to create the list of SNPs that I need to remove.

Is there a way to simply tell PLINK in a script: "remove the sites not contained in shared.bim"?

Thank you

Human SNPs Genome Shared PLINK Sites • 1.2k views
3.5 years ago

PLINK is very well documented: https://www.cog-genomics.org/plink/1.9/filter#snp

Use the --extract flag to keep only the SNP IDs of your choice in both datasets prior to merging.

Also check the merge section: https://www.cog-genomics.org/plink/1.9/data#merge
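A minimal sketch of the --extract approach, assuming the shared variant IDs can be taken from column 2 of shared.bim (the standard .bim layout). The toy .bim content, the rs IDs, the output name shared_snps.txt, and the dataset1/dataset2 prefixes are placeholders for illustration, not values from the question. The PLINK step only runs if plink is on the PATH and the input files exist.

```shell
# Toy stand-in for shared.bim so the sketch runs on its own; use your real file instead.
# .bim columns: chromosome, variant ID, cM position, bp position, allele 1, allele 2.
workdir=$(mktemp -d)
printf '1\trs123\t0\t1000\tA\tG\n1\trs456\t0\t2000\tC\tT\n' > "$workdir/shared.bim"

# Column 2 holds the variant ID; --extract expects a plain list of IDs.
awk '{print $2}' "$workdir/shared.bim" > "$workdir/shared_snps.txt"

# Apply the same list to both datasets (dataset1 and dataset2 are placeholder prefixes).
for prefix in dataset1 dataset2; do
    if command -v plink >/dev/null 2>&1 && [ -f "$prefix.bed" ]; then
        plink --bfile "$prefix" --extract "$workdir/shared_snps.txt" \
              --make-bed --out "$prefix.shared"
    fi
done
```

Because --extract keeps the listed variants, there is no need to build a separate list of sites to remove. This only works if both datasets use the same variant IDs for the same sites.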


Hello,

I don't have problems with the merging process itself, only with how to flag the non-shared sites.

I created a text file called "shared_sites.txt", which contains the shared sites taken from shared.bim.

I wrote this script to prepare for merging (I will merge in VCFtools):

plink --bfile dataset1 --recode vcf --a2-allele reference_allele.list --keep-clusters shared_sites.txt --out dataset1.filtered

I added --keep-clusters shared_sites.txt to try to flag the shared sites to keep, while --a2-allele reference_allele.list is meant to reorder the SNPs according to the list.

Is the --keep-clusters flag going to work for this purpose?
