Question

How can I handle whole genome sequencing data with PLINK?

0

8.2 years ago

line1438 ▴ 40

I have whole genome sequencing data for 421 samples that I want to analyze,

so I converted the data from VCF format to PED format.

However, at the step where the tped file is converted to a ped file, PLINK reported "Exhausted system memory".

I think the reason is that the sequencing data contain at least 500 million markers,

but the server already has 128 GB of RAM.

If I cannot add more memory to the server, how can I handle the whole genome sequencing data with PLINK or other trustworthy software?

plink vcftools WGS sequencing • 6.3k views

ADD COMMENT • link updated 8.2 years ago by stu111538 ▴ 80 • written 8.2 years ago by line1438 ▴ 40

0

Try the --memory and/or --parallel flags

ADD REPLY • link 8.2 years ago by Medhat 9.8k

0

Sorry, I can't find the flags you mentioned in PLINK 1.07 or 1.9.

Could you give me more details? Thanks a lot.

ADD REPLY • link 8.2 years ago by line1438 ▴ 40

0

https://www.cog-genomics.org/plink2/parallel

https://www.cog-genomics.org/plink2/other#memory
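As a rough illustration of the two flags documented at the links above, the sketch below only prints example PLINK 1.9 commands rather than running them. The file names and the 100000 MB figure are placeholders. As far as I understand the documentation, --memory takes a workspace size in MB, while --parallel k n splits certain heavy computations into n pieces (--genome is used here purely as an example); it is not meant to reduce the memory needed by the VCF conversion itself.

```shell
# Hypothetical file names; replace them with your own.
vcf=your.vcf.gz
out=your_data

# --memory sets the size of PLINK's main workspace, in MB (placeholder value).
convert_cmd="plink --vcf $vcf --make-bed --memory 100000 --out $out"

# --parallel <k> <n> asks PLINK to compute only piece k of n.
# One command per piece; each could be run as a separate job.
parallel_cmds=$(for k in 1 2 3 4; do
    echo "plink --bfile $out --genome --parallel $k 4 --out $out"
done)

# Dry run: show the commands instead of executing them.
echo "$convert_cmd"
echo "$parallel_cmds"
```

Pipe the output to `sh` once the names and numbers match your setup.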

Good luck :)

ADD REPLY • link 8.2 years ago by Medhat 9.8k

0

I tried the --memory option in PLINK 1.9, but I have a question.

My server has 128 GB of memory in total, but only 112 GB is actually available.

How should I set the memory size (in MB) for the --memory option in PLINK 1.9?

I tried both --memory 120000 and --memory 100000, and both ran normally,

but I would still like to know which value is best for my setup.
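One possible way to pick a value, sketched below, is to read how much memory Linux reports as available and hand PLINK a fixed share of it. The 90% share is only a rule-of-thumb assumption to leave headroom for the operating system, not a PLINK recommendation, and the function name and output prefix are made up for this example. According to the documentation linked earlier, PLINK 1.9 otherwise tries to reserve about half of the system RAM for its workspace.

```shell
# Suggest a --memory value in MB.
# $1: available memory in kB, $2: percentage to give to PLINK (default 90).
suggest_plink_memory_mb() {
    kb=$1
    pct=${2:-90}
    echo $(( kb / 1024 * pct / 100 ))
}

# On Linux, MemAvailable in /proc/meminfo is reported in kB.
if [ -r /proc/meminfo ]; then
    avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
    if [ -n "$avail_kb" ]; then
        # Dry run: print the command with the suggested value filled in.
        # "out_prefix" is a hypothetical output name.
        echo "plink --vcf your_vcf --make-bed --memory $(suggest_plink_memory_mb "$avail_kb") --out out_prefix"
    fi
fi
```

If the 112 GB above means 112 × 1024 × 1024 kB, then `suggest_plink_memory_mb 117440512 90` gives 103219, which is close to the --memory 100000 that already ran.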

ADD REPLY • link 8.2 years ago by line1438 ▴ 40

score 3 · Accepted Answer · 2016-08-31

3

8.2 years ago

stu111538 ▴ 80

Do you want to have all 421 datasets in one ped file? I handled large data with PLINK recently: I merged 60 VCF files into one multi-sample VCF with 900 million markers, and conversion to PLINK format with PLINK 1.9 worked with 100 GB of memory and 4 CPUs. With 70 GB, however, it did not work, and you have many more samples. I think the problem is that PLINK loads all markers of all samples into memory. You could try converting each sample individually, one after another, and then merging the ped files with PLINK's --merge-list.
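The convert-then-merge idea could look roughly like the sketch below. It assumes each sample has already been converted to a binary fileset (.bed/.bim/.fam) rather than a .ped file, which is a substitution made here because binary filesets are smaller; the helper name and the "merged" output prefix are hypothetical. One caveat worth checking on your own data: a site that is absent from a single-sample VCF will most likely end up as missing for that sample after the merge, not as homozygous reference.

```shell
# Write a PLINK merge list (one binary fileset prefix per line) and print
# the matching merge command.
# $1: path of the list file, $2: first fileset prefix, rest: other prefixes.
write_merge_list() {
    list_file=$1
    first=$2
    shift 2
    # The first fileset goes to --bfile, so only the others are listed.
    : > "$list_file"
    for prefix in "$@"; do
        echo "$prefix" >> "$list_file"
    done
    echo "plink --bfile $first --merge-list $list_file --make-bed --out merged"
}

# Example with three hypothetical per-sample filesets (dry run).
write_merge_list merge_list.txt sample1 sample2 sample3
```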

ADD COMMENT • link 8.2 years ago by stu111538 ▴ 80

1

PLINK 1.9 does not keep all input data in memory simultaneously

ADD REPLY • link 8.2 years ago by Medhat 9.8k

0

Yes, I want to have all 421 VCF files in one ped file.

I have some questions:

How did you merge the 60 VCF files into a multi-sample VCF? Which method and software did you use?
How did you convert the data to PLINK format?

Actually, I converted only one VCF file to PLINK format, not all samples, but I was using PLINK 1.07.

So, is the "memory is not enough" error caused by PLINK 1.07?

ADD REPLY • link 8.2 years ago by line1438 ▴ 40

0

You could try PLINK 1.9. The command is probably the same: ./plink --vcf your_vcf --make-bed --out ... Also try the --memory option, as was suggested.

You can merge VCF files using bcftools merge. Beforehand, you have to bgzip and index (tabix) your VCF files. Keep in mind that your multi-sample VCF will be extremely large and may be hard to handle.
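The steps above (bgzip, tabix, bcftools merge, then PLINK) can be sketched as follows. The function only prints the commands so they can be reviewed first; the sample file names and the "merged" output names are hypothetical, and it assumes bgzip, tabix, bcftools and plink are on the PATH when the commands are actually run.

```shell
# Print the commands needed to compress, index and merge single-sample VCFs,
# then convert the merged VCF to a PLINK binary fileset.
# Arguments: uncompressed VCF files, one per sample.
prepare_and_merge_cmds() {
    for vcf in "$@"; do
        echo "bgzip $vcf"              # produces $vcf.gz
        echo "tabix -p vcf $vcf.gz"    # index required by bcftools merge
    done
    # Build one bcftools merge command listing all compressed inputs.
    printf 'bcftools merge -O z -o merged.vcf.gz'
    for vcf in "$@"; do
        printf ' %s.gz' "$vcf"
    done
    printf '\n'
    echo "tabix -p vcf merged.vcf.gz"
    echo "plink --vcf merged.vcf.gz --make-bed --out merged"
}

# Dry run with two hypothetical samples.
prepare_and_merge_cmds sample1.vcf sample2.vcf
```

For hundreds of inputs it may be more convenient to pass the compressed files to bcftools merge through a file list instead of the command line.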

ADD REPLY • link 8.2 years ago by stu111538 ▴ 80
