I want to convert a PLINK binary fileset (bfile) to .ped format. The command I used is:
plink --bfile base --recode tab --out paddy
The plink version is 1.9.
I then got this error: --recode does not yet support multipass recoding of very large files.
The full log is:
PLINK v1.90b6.25 64-bit (5 Mar 2022)
Options in effect:
--bfile base
--out paddy
--recode tab
Hostname: MacBook-Pro.local
Working directory: paddy
Start time: Tue Mar 15 17:52:51 2022
Random number seed: 1647337971
16384 MB RAM detected; reserving 8192 MB for main workspace.
18128777 variants loaded from .bim file.
3024 people (0 males, 0 females, 3024 ambiguous) loaded from .fam.
Ambiguous sex IDs written to paddy.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 3024 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.826143.
18128777 variants and 3024 people pass filters and QC.
Note: No phenotypes present.
Error: --recode does not yet support multipass recoding of very large files;
contact the PLINK developers if you need this.
For now, you can try using a machine with more memory, and/or split the file
into smaller pieces and recode them separately.
End time: Tue Mar 15 17:53:16 2022
How can I split the file into smaller pieces and recode them separately?
Yes, I added that functionality to plink2 three days ago. But I didn't bring it up in my previous answer, because the more important question is: why are you exporting a >100 GB .ped file? I have never heard of a program that (i) can do something useful with that much data in a reasonable amount of time, yet (ii) whose programmer is unable to make the small extension needed to read a ~15x smaller .bed file, far more quickly, instead.
I have been asked to calculate the percentage (diff pairs) / (total pairs).
For example, given the sequence
A C A A T T G G
the first pair, A and C, is different and the other three pairs match, so the number of 'diff pairs' is 1, the total number of pairs is 4, and the percentage is 1 / 4 = 25%.
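The counting in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `diff_pair_fraction` is hypothetical, the input is taken to be whitespace-separated allele calls, and pairs are taken to be consecutive and non-overlapping (1st with 2nd, 3rd with 4th, and so on). Missing calls are not handled.

```python
def diff_pair_fraction(seq):
    """Count consecutive, non-overlapping pairs whose two alleles differ.

    Assumption: `seq` is a whitespace-separated string of allele calls,
    e.g. "A C A A T T G G". Missing genotypes are not treated specially.
    Returns (diff_pairs, total_pairs, fraction).
    """
    bases = seq.split()
    if len(bases) % 2 != 0:
        raise ValueError("expected an even number of allele calls")
    # Pair up positions (0, 1), (2, 3), (4, 5), ...
    pairs = list(zip(bases[0::2], bases[1::2]))
    diff = sum(1 for a, b in pairs if a != b)
    total = len(pairs)
    fraction = diff / total if total else 0.0
    return diff, total, fraction


diff, total, frac = diff_pair_fraction("A C A A T T G G")
print(f"{diff} / {total} = {frac:.0%}")  # prints "1 / 4 = 25%"
```

For a dataset with millions of variants, the same counting would need to be done in a streaming fashion rather than on one string held in memory.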
Sorry I am not familiar with bio terminology...
So my initial thought was to convert the binary file to a readable format.
Is there a better way to do that?
The standard text format for large genomic datasets is VCF, not .ped. VCF has much better software support: see bcftools in particular. (plink 1.9 and 2.0 are also capable of importing and exporting VCF more efficiently than they can import/export .ped.)
There are many ways to perform this calculation, but this sounds like a homework problem, so you're probably best off doing the recommended reading and perhaps getting help from a course teaching assistant or at office hours.