Retrieve a subset of SNPs in Plink
9.7 years ago
Sally ▴ 10

I have genotype data for 2 populations, both in binary format (.bed, .bim, .fam). The 1st population consists of 1 parent and 107 progenies. The 2nd population consists of 50 progenies only.

Since the 2nd population has no genotype data for the parent, I would like to extract the parent's SNP data from the 1st population, since they are closely related, and then merge it into the 2nd population.

Plink provides the options --exclude/--keep and --merge/--bmerge. To retrieve the parent's data, I used:

plink --bfile file --keep parent.txt --make-bed --out parent

where parent.txt contains the family ID and individual ID.

To merge the parent data into the 2nd population, I used:

plink --bfile file2 --bmerge parent.bed parent.bim parent.fam --make-bed --out merge

However, I noticed that after the extraction step, the number of lines in the .bim file is still the same as before. Am I using the correct commands?

Original file for population 1

wc file.*
3277 6844 4613223 file.bed
170860 1025160 5146571 file.bim
108 648 2194 file.fam

Parent file after extracting

wc parent.*
0 1 170863 parent.bed
170860 1025160 5146571 parent.bim
1 6 19 parent.fam

Please help me. Thank you.

plink • 7.3k views
9.7 years ago
alesssia ▴ 580

I'm not sure I have understood your issue, but the .bim file holds the extended variant information, one variant per line. You have not filtered on that dimension (--keep filters samples, not variants), so the number of lines should not change. What should change is the .fam file, which indeed now contains only one person (1 line in parent.fam versus 108 lines in file.fam).
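
As a quick sanity check that --keep really did shrink the genotype data, the byte counts in the wc output can be compared with the expected .bed size. This is a minimal sketch, assuming the standard PLINK 1 variant-major .bed layout (3 header bytes, then 2 bits per genotype, with each variant padded to a whole byte); expected_bed_size is a hypothetical helper name, not a plink command.

```shell
# Expected size in bytes of a PLINK 1 .bed file (variant-major layout):
# 3 header bytes, then ceil(n_samples / 4) bytes per variant.
expected_bed_size() {
    n_samples=$1
    n_variants=$2
    # (n_samples + 3) / 4 is integer division, i.e. ceil(n_samples / 4)
    echo $(( 3 + n_variants * ( (n_samples + 3) / 4 ) ))
}

expected_bed_size 108 170860   # original file.bed: 108 samples
expected_bed_size 1 170860     # parent.bed after --keep: 1 sample
```

Both results agree with the byte counts posted in the question (4613223 for file.bed and 170863 for parent.bed), which suggests parent.bed holds a single sample across all 170860 variants.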


Hi Alessia,

Thank you for the comment. Maybe I should restate my problem. I want to extract all variants for the parent only (I only know its family ID and individual ID) from population 1, and later merge this parent data into population 2.

I need help to solve this problem. Kindly advise me what to do. Thanks!


Then do:

plink --bfile file --keep parent.txt --make-bed --out parent
plink --bfile parent --write-snplist --out parent_snps
plink --bfile file2 --keep parent_snps --make-bed --out file2_only_parental_snps

Hi Floris,

Thank you for the suggestion. I understand the first and second commands, but I am a little confused by the third. From the second line, I should get the list of SNPs for the parent. But in the third line, how does --keep retain parent_snps in file2 (population 2)? As I understand it, --keep only retains the samples whose IDs are listed in the given file.

Thanks!


Oh sorry, that needs to be --extract:

plink --bfile file2 --extract parent_snps.snplist --make-bed --out file2_only_parental_snps
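
For reference, --extract expects a plain text file of variant IDs, one per line. A rough stand-in for the --write-snplist step is sketched here, assuming a standard six-column .bim where column 2 is the variant ID; bim_variant_ids is a hypothetical helper name, not part of plink.

```shell
# Print the variant IDs (column 2) of a PLINK .bim file, one per line.
bim_variant_ids() {
    awk '{ print $2 }' "$1"
}

# Usage with the file names from this thread:
# bim_variant_ids parent.bim > parent_snps.snplist
```

Because --keep filters samples rather than variants, parent.bim still lists every variant in file.bim, so this list only trims file2 if file2 carries variants that population 1 lacks.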

Or maybe you can try this workflow:

plink --bfile file2 --bmerge file.bed file.bim file.fam --make-bed --out merge

Then make an ID file containing only the 50 progenies and the parent (let's call it include):

plink --bfile merge --keep include --make-bed --out final
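
One way to build that include file is sketched here, assuming file2.fam lists exactly the 50 progenies and parent.txt is the family ID / individual ID list from the question; make_include is a hypothetical helper name.

```shell
# Print FID and IID (columns 1-2) for every sample in a .fam file,
# followed by the FID and IID pairs from an existing --keep list.
make_include() {
    fam_file=$1
    parent_list=$2
    awk '{ print $1, $2 }' "$fam_file" "$parent_list"
}

# Usage with the file names from this thread:
# make_include file2.fam parent.txt > include
```

The result should have 51 lines (50 progenies plus the parent), which is easy to confirm with wc -l before running --keep.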

Hi Sam,

Thank you for the suggestion. Will give it a try and update it later. Thanks!
