How to handle whole genome sequencing data using plink?
1
0
8.2 years ago
line1438 ▴ 40

I have whole genome sequencing data for 421 samples that I want to analyze,

so I converted the data from VCF format to PED format.

However, at the step where the tped file is converted to a ped file, plink reported "Exhausted system memory".

I think the reason is that the sequencing data contain no fewer than 500 million markers,

but the server already has 128 GB of RAM.

If I cannot add more memory to the server, how can I handle the whole genome sequencing data using plink or other trustworthy software?

plink vcftools WGS sequencing • 6.3k views
0

Try the --memory and/or --parallel flags

0

Sorry, I can't find the flags you mentioned in Plink 1.07 or 1.9.

Could you give me more details? Thanks a lot.

0

I tried the --memory option in Plink 1.9, but I have a question.

My server nominally has 128 GB of memory, but only 112 GB is actually usable.

How should I set the memory size (in MB) for the --memory option in Plink 1.9?

I tried two settings, --memory 120000 and --memory 100000, and both ran normally,

but I would still like to know which size is best for my case.
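
One way to pick a value could be sketched as follows. As far as I know, --memory in PLINK 1.9 takes the workspace size in MB, and without it PLINK 1.9 tries to reserve roughly half of the system RAM. The helper name `suggest_plink_memory_mb`, the 90% safety margin, and the file names are all assumptions for illustration, not PLINK recommendations.

```shell
# Hypothetical helper: turn available memory (in kB) into a --memory value (in MB),
# keeping a safety margin. The default of 90 percent is an arbitrary assumption.
suggest_plink_memory_mb() {
    local available_kb="$1"
    local percent="${2:-90}"
    echo $(( available_kb * percent / 100 / 1024 ))
}

# On Linux, MemAvailable in /proc/meminfo is reported in kB.
if [ -r /proc/meminfo ]; then
    avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
    if [ -n "$avail_kb" ]; then
        mem_mb=$(suggest_plink_memory_mb "$avail_kb")
        # Print the command instead of running it; file names are placeholders.
        echo "plink --vcf your_vcf --make-bed --memory ${mem_mb} --out your_prefix"
    fi
fi
```

The idea is to stay below what the machine can really give to one process, so that the request itself does not fail or push the server into swapping. A value above the usable memory, such as 120000 on a machine with 112 GB usable, is probably not a safe choice even if the run happens to start.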

3
8.2 years ago
stu111538 ▴ 80

Do you want to have all 421 datasets in one ped file? I handled large data with plink recently. I merged 60 vcf files into a multi-sample vcf. There were 900 million markers, and conversion to plink format using Plink 1.9 worked with 100 GB of memory and 4 CPUs. However, with 70 GB it did not work. And you have many more samples. I think the problem is that plink loads all markers of all samples into memory. You could try converting each sample individually, one after another, and then merging the ped files using plink's --merge-list afterwards.
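
The per-sample route could look roughly like the sketch below. It swaps in binary filesets (--make-bed) for the text ped files, because they are much smaller, and it assumes one compressed VCF per sample named `<sample>.vcf.gz`. The function names and the output layout are hypothetical.

```shell
# Print one binary fileset prefix per line for the given per-sample VCFs.
# PLINK 1.9's --merge-list treats a line with a single name as a
# bed/bim/fam prefix.
write_merge_list() {
    local out_dir="$1"; shift
    local vcf sample
    for vcf in "$@"; do
        sample=$(basename "$vcf" .vcf.gz)
        echo "${out_dir}/${sample}"
    done
}

# Convert each VCF on its own, then merge everything onto the first sample.
# Arguments: output directory, then two or more per-sample .vcf.gz files.
convert_and_merge() {
    local out_dir="$1"; shift
    local first vcf
    first=$(basename "$1" .vcf.gz)
    mkdir -p "$out_dir"
    for vcf in "$@"; do
        plink --vcf "$vcf" --make-bed \
              --out "${out_dir}/$(basename "$vcf" .vcf.gz)" || return 1
    done
    shift   # the first fileset is the base; the rest go into the list
    write_merge_list "$out_dir" "$@" > "${out_dir}/merge_list.txt"
    plink --bfile "${out_dir}/${first}" \
          --merge-list "${out_dir}/merge_list.txt" \
          --make-bed --out "${out_dir}/merged"
}
```

It would be called as `convert_and_merge plink_out vcf/*.vcf.gz`. One caveat: a site that is absent from a sample's VCF ends up as a missing genotype for that sample after the merge, not as homozygous reference, so check whether that is acceptable for your analysis.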

1

PLINK 1.9 does not keep all input data in memory simultaneously.

0

Yes, I want to have all 421 vcf files in one ped file.

I have some questions:

  1. How did you merge the 60 vcf files into a multi-sample vcf? Which method and software?

  2. Which method did you use to convert the data to plink format?

Actually, I converted only one vcf file to plink format, not all samples, but I am using Plink 1.07.

  3. So, is the "memory is not enough" error from plink caused by Plink 1.07?

0

You could try Plink 1.9. The command is probably the same: ./plink --vcf your_vcf --make-bed --out ... Also try the --memory option, as was suggested.

  1. You can merge vcf files using bcftools merge. Beforehand, you have to compress (bgzip) and index (tabix) your vcf files. Keep in mind that your multi-sample vcf will be extremely large and may be hard to handle.
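
Put together, the steps above might look like this sketch. The `run` wrapper and `merge_and_convert` are hypothetical helpers, the file names are placeholders, and commands are only printed unless DRY_RUN=0 is set, so the plan can be checked before anything touches the data.

```shell
# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: compress, index, merge, then convert to a binary fileset.
# Arguments: output prefix, then two or more uncompressed per-sample VCFs.
# Assumes file names without spaces.
merge_and_convert() {
    local prefix="$1"; shift
    local vcf gz_files=""
    for vcf in "$@"; do
        run bgzip "$vcf"               # replaces sample.vcf with sample.vcf.gz
        run tabix -p vcf "${vcf}.gz"   # writes the index sample.vcf.gz.tbi
        gz_files="${gz_files} ${vcf}.gz"
    done
    # -Oz writes a bgzip-compressed VCF; -o names the output file.
    run bcftools merge -Oz -o "${prefix}.vcf.gz" $gz_files
    run plink --vcf "${prefix}.vcf.gz" --make-bed --out "$prefix"
}

# Example: merge_and_convert merged sample1.vcf sample2.vcf
```

With several hundred inputs it is probably more convenient to pass the file names to bcftools merge through a list file (its -l option) than on the command line.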