Quick Way To Combine Two Datasets Using Only Common Markers
12.0 years ago
pufferfish ▴ 120

Is there a quick way to combine two datasets so that only the common markers are kept? Currently, if I have two datasets, I have to first get the intersection of the two BIM/MAP files, then extract those markers for each dataset, then merge the two.

The --merge-mode option doesn't seem to offer what I'm looking for either.

plink • 24k views

A small R script would be really helpful in such a case; check the %in% operator in the R manual.


Hi. I have the same issue. Is there any plink function that can perform the intersection of two datasets? I mean, anything other than an R function? Thanks.

11.1 years ago
ff.cc.cc ★ 1.3k

Hi, I suggest combining R and PLINK this way:

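> R code:
> # Column 2 (V2) of a .map or .bim file holds the marker ID; for binary filesets, read the .bim files instead.
> # read.table splits on any whitespace, so it handles both tab- and space-delimited files.
> map2 = read.table("file2.map", header=FALSE, quote="", comment.char="", stringsAsFactors=FALSE)
> map1 = read.table("file1.map", header=FALSE, quote="", comment.char="", stringsAsFactors=FALSE)
> # Row indices of the file2 markers that also occur in file1
> common.snps = which(map2$V2 %in% map1$V2)
> write.table(map2$V2[common.snps], file="list.snps", sep="\t", col.names=FALSE, row.names=FALSE, quote=FALSE)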

and finally

> the Plink commands:
> plink --bfile <file1> --extract list.snps --make-bed --out data1
> plink --bfile <file2> --extract list.snps --make-bed --out data2
> plink --bfile data1 --bmerge data2.bed data2.bim data2.fam --make-bed --out merge

Or in a single step:

plink --bfile file1 --bmerge file2.bed file2.bim file2.fam --extract list.snps --make-bed --out merge


Or grep one-liner: grep -f file1.map file2.map | cut -f2 > common_snps.txt


If you are working with .map files (PED) how would the syntax change?


Use the --file option instead of --bfile (and --merge with the .ped and .map files instead of --bmerge).
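
For text filesets, the same three steps might look roughly like this. It is only a sketch, assuming PLINK 1.9 on the PATH and a list.snps file with one marker ID per line; the function name and the data1/data2 prefixes are placeholders, and I have not run it on real data:

```shell
# Hypothetical helper: reduce two PED/MAP filesets to the markers in list.snps, then merge.
# Assumes plink (1.9) is on PATH and list.snps holds one marker ID per line.
merge_common_ped() {
    file1=$1; file2=$2; out=$3
    # Keep only the listed markers in each text fileset
    plink --file "$file1" --extract list.snps --recode --out data1 || return 1
    plink --file "$file2" --extract list.snps --recode --out data2 || return 1
    # Merge the two reduced text filesets
    plink --file data1 --merge data2.ped data2.map --recode --out "$out"
}
# Usage (placeholder prefixes): merge_common_ped file1 file2 merge
```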

10.4 years ago

I would do this in R (you can extend it to any number of sets by taking the intersection of the intersection...)

common.snps <- intersect(map1$V2, map2$V2)
write.table(common.snps, file = "common_snps.txt", sep ="\t", col.names = FALSE, row.names = FALSE, quote = FALSE )
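
If you would rather stay in the shell, the "intersection of the intersection" idea can be sketched with awk for any number of files. The set1/set2/set3 files below are toy data made up for illustration; the only real assumption is that the marker ID sits in column 2, as in .map and .bim files:

```shell
# Toy inputs in a scratch directory (hypothetical data; column 2 is the marker ID).
workdir=$(mktemp -d)
printf '1 rs1 0 100\n1 rs2 0 200\n1 rs3 0 300\n' > "$workdir/set1.map"
printf '1 rs2 0 200\n1 rs3 0 300\n1 rs4 0 400\n' > "$workdir/set2.map"
printf '1 rs3 0 300\n1 rs5 0 500\n1 rs2 0 200\n' > "$workdir/set3.map"

# Count each ID at most once per file; an ID is common to all files
# when its count equals the number of input files (ARGC - 1).
awk '!seen[FILENAME, $2]++ { count[$2]++ }
     END { for (id in count) if (count[id] == ARGC - 1) print id }' \
    "$workdir"/set*.map | LC_ALL=C sort > "$workdir/common_snps.txt"

cat "$workdir/common_snps.txt"
```

With these toy files, only rs2 and rs3 appear in all three sets, so those are the two IDs written to common_snps.txt.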
12.0 years ago
Joey ▴ 430

Why don't you figure out the common SNPs between the two datasets using a shell command (awk) or an R one-liner? You can then reduce each dataset to the same set of SNPs using the "--extract" option and then merge the datasets. Also, you should check that the two datasets use the same genome build.
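
A minimal sketch of the awk step, using two toy .map files made up for illustration (the file names and IDs are placeholders; the marker ID is taken from column 2):

```shell
# Toy inputs in a scratch directory (hypothetical data; column 2 is the marker ID).
workdir=$(mktemp -d)
printf '1\trs1\t0\t100\n1\trs2\t0\t200\n1\trs3\t0\t300\n' > "$workdir/file1.map"
printf '1\trs2\t0\t200\n1\trs3\t0\t305\n1\trs4\t0\t400\n' > "$workdir/file2.map"

# First file: remember every marker ID.
# Second file: print the IDs that were already seen in the first file.
awk 'NR == FNR { seen[$2] = 1; next } ($2 in seen) { print $2 }' \
    "$workdir/file1.map" "$workdir/file2.map" > "$workdir/common_snps.txt"

cat "$workdir/common_snps.txt"
```

This prints rs2 and rs3. Note that it matches on ID only: rs3 is kept even though its position differs between the two toy files, so it does not replace the build check.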

10.4 years ago

--write-snplist + --extract/--make-bed lets you do this purely with plink, though the other solutions also work.


Could you illustrate with a command how --write-snplist + --extract / --make-bed would do this in PLINK2? I am merging two datasets in PLINK2 and want to keep only the common SNPs.
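
Not a verified recipe, but with PLINK 1.9 the sequence could be sketched as below. The function name and the intermediate output prefixes are made up for illustration, and binary filesets are assumed. PLINK 2 handles merging differently (--pmerge), so the last step would need adapting there:

```shell
# Hypothetical helper; assumes PLINK 1.9 on PATH and binary filesets (.bed/.bim/.fam).
merge_common_bed() {
    a=$1; b=$2; out=$3
    # 1. List every variant ID in the first fileset -> ${out}_a.snplist
    plink --bfile "$a" --write-snplist --out "${out}_a" || return 1
    # 2. Restrict the second fileset to those IDs and list what is left
    #    -> ${out}_common.snplist (IDs present in both filesets)
    plink --bfile "$b" --extract "${out}_a.snplist" --write-snplist --out "${out}_common" || return 1
    # 3. Reduce each fileset to the common IDs
    plink --bfile "$a" --extract "${out}_common.snplist" --make-bed --out "${out}_a_common" || return 1
    plink --bfile "$b" --extract "${out}_common.snplist" --make-bed --out "${out}_b_common" || return 1
    # 4. Merge the two reduced filesets
    plink --bfile "${out}_a_common" \
          --bmerge "${out}_b_common.bed" "${out}_b_common.bim" "${out}_b_common.fam" \
          --make-bed --out "$out"
}
# Usage (placeholder prefixes): merge_common_bed file1 file2 merged
```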

7.4 years ago
ShirleyDai ▴ 50

Why not use plink --merge-mode 1, which would output consensus calls?


Because the resulting merged dataset will have the union of the individual datasets' markers, and the original poster wanted the intersection.

3.2 years ago
wenbinm ▴ 40

The grep command (grep -f file1.map file2.map | cut -f2 > common_snps.txt) doesn't work for me. I use this instead:

sort file1.map file2.map|uniq -d > common_lines.txt
awk '{print $2}' common_lines.txt > common_snps.txt

Then use plink --extract on each dataset and merge.
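
One thing to keep in mind: sort | uniq -d only keeps lines that are identical in every column, so a marker whose position or cM value differs between the files is dropped. If you want to match on marker ID alone, comm on the sorted ID columns is an alternative. A small comparison on toy data (file names and values are made up for illustration):

```shell
# Toy inputs (hypothetical): rs3 has a different position in file2.
workdir=$(mktemp -d)
printf '1\trs1\t0\t100\n1\trs2\t0\t200\n1\trs3\t0\t300\n' > "$workdir/file1.map"
printf '1\trs3\t0\t305\n1\trs2\t0\t200\n1\trs4\t0\t400\n' > "$workdir/file2.map"

# Whole-line matching: only markers whose full lines are identical survive.
LC_ALL=C sort "$workdir/file1.map" "$workdir/file2.map" | uniq -d |
    awk '{ print $2 }' > "$workdir/common_by_line.txt"

# ID-only matching: compare the sorted, de-duplicated ID columns.
awk '{ print $2 }' "$workdir/file1.map" | LC_ALL=C sort -u > "$workdir/ids1.txt"
awk '{ print $2 }' "$workdir/file2.map" | LC_ALL=C sort -u > "$workdir/ids2.txt"
LC_ALL=C comm -12 "$workdir/ids1.txt" "$workdir/ids2.txt" > "$workdir/common_by_id.txt"
```

Here common_by_line.txt contains only rs2, while common_by_id.txt contains rs2 and rs3. Which behaviour you want depends on whether a position mismatch should exclude the marker.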
