How to extract from BED files
8.4 years ago
forever ▴ 80

I have a very large BED file. I need to extract the genotypes of only 10 subjects; I am not interested in the rest. How can I extract just those subjects from the BED file?

snp SNP R genome • 7.6k views

Thank you. I used plink --hapmap1.ped --keep mylist.txt, but it does not work.

Error: problem parsing the command line arguments

I only need to keep the data for the subjects below and remove all subjects that are not in my list. Below is mylist.txt:

136_S_4269
130_S_4352
129_S_4371
129_S_4369
031_S_4496
031_S_4474
031_S_4218
031_S_4032
031_S_4024
031_S_4021
019_S_4477
019_S_4367
019_S_4252
018_S_4400
018_S_4399
018_S_4349
018_S_4313
018_S_4257
012_S_4026
006_S_4449
006_S_4357
006_S_4192
006_S_4153
006_S_4150
002_S_4270
002_S_4225
002_S_4213


@fadlwork: You are not providing the right command line arguments to plink. I have edited my answer below to include the command to subset by individuals for plink v1. Are you using plink v1 or v2?

8.4 years ago

When you say a BED file, are you talking about the binary genotype .bed file defined by plink?

If so, you can use plink to include or exclude certain samples easily. See https://www.cog-genomics.org/plink2/filter if you're using plink v2...

If you're using plink v1, and you have files called hapmap1.ped and hapmap1.map, and say you want to create output called mysubset.ped and mysubset.map, then the command would be:

plink --file hapmap1 --keep mylist.txt --recode --out mysubset
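One thing to check if the command runs but keeps nobody: plink's --keep expects two columns per line (family ID, then within-family ID), whereas the list posted above has a single ID per line. Here is a rough Python sketch for building a two-column keep file from the first two columns of the .ped file. The function name write_keep_file and the output name keep.txt are made up for illustration, and I am assuming your IDs appear in either the first or the second column.

```python
from pathlib import Path


def write_keep_file(ped_path, wanted_ids, keep_path):
    """Write 'FID IID' lines for every sample whose IID or FID is wanted.

    Hypothetical helper: only the first two whitespace-separated columns
    of each line are read, so the genotype columns are never parsed.
    """
    wanted = set(wanted_ids)
    kept = []
    with open(ped_path) as ped:
        for line in ped:
            fields = line.split(None, 2)  # FID, IID, rest of line
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            fid, iid = fields[0], fields[1]
            if iid in wanted or fid in wanted:
                kept.append((fid, iid))
    Path(keep_path).write_text("".join(f"{fid} {iid}\n" for fid, iid in kept))
    return kept


if __name__ == "__main__":
    # Assumes hapmap1.ped and mylist.txt are in the current directory.
    ids = Path("mylist.txt").read_text().split()
    print(len(write_keep_file("hapmap1.ped", ids, "keep.txt")), "samples matched")
```

You would then pass keep.txt to --keep instead of mylist.txt. The same function should also work on a .fam file, since its first two columns are the same as in a .ped file.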

There is another type of "BED file" (UCSC Genome Browser's BED format) which has nothing to do with plink. It doesn't sound like you're talking about this kind of file... but if you are, then I imagine you just want to extract a certain subset of columns (e.g. using awk), but you'd have to give a little more detail about the structure of the file to get a full answer.


Can you please post an example of the data inputs and desired output?


Did you mean to comment this on the question?


Yes I did... thank you :)

8.4 years ago
bioguy24 ▴ 230

I have never used the plink bed format, but from reading the documentation for 1.9, maybe something like this would work on the text .ped file:

awk 'NR==FNR{A[$1];next}$1 in A' mylist.txt hapmap1.ped > result.txt

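Column 1 of each line in hapmap1.ped is checked against the IDs in mylist.txt (one ID per line), and only the matching lines are written to result.txt. If column 1 is not the correct one to search, change the $1 after next to whichever column holds the ID. In a .ped file, column 1 is the family ID and column 2 is the individual ID, so $2 may be the one you want.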
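If awk is not an option, the same filter can be sketched in Python. This is only an illustration: filter_ped and its id_column parameter are names I made up, and unlike the awk one-liner it defaults to column 2 (0-based index 1, the individual ID) and accepts IDs separated by any whitespace, not just one per line. Like the awk version, it only makes sense for the text .ped file, not for a binary .bed file.

```python
def filter_ped(ped_path, ids_path, out_path, id_column=1):
    """Copy lines of a text .ped file whose ID column is listed in ids_path.

    Hypothetical helper. id_column is 0-based: 0 is the family ID,
    1 is the individual ID. Returns the number of lines written.
    """
    # Collect every whitespace-separated token in the ID list.
    with open(ids_path) as ids:
        wanted = {token for line in ids for token in line.split()}

    written = 0
    with open(ped_path) as ped, open(out_path, "w") as out:
        for line in ped:
            # Split only as far as the ID column; genotypes stay unparsed.
            fields = line.split(None, id_column + 1)
            if len(fields) > id_column and fields[id_column] in wanted:
                out.write(line)
                written += 1
    return written


if __name__ == "__main__":
    # Assumes hapmap1.ped and mylist.txt are in the current directory.
    n = filter_ped("hapmap1.ped", "mylist.txt", "result.txt")
    print(n, "lines kept")
```

Note that result.txt only replaces the .ped file; the .map file is unchanged because the set of SNPs stays the same.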
