So I have converted the reprt.txt and map.txt files into .lgen format, but my conversion from .lgen to a bed file keeps failing due to a memory issue, even though I try to run it chromosome-wise.
Here are the first few lines of my files:
DGKH413ADHJW:Plink_working dlekshmi$ head HNPGenotypes.lgen
1 HNP_DNA_1 rs1000000 C C
2 HNP_DNA_1 rs1000002 A G
DGKH413ADHJW:Plink_working dlekshmi$ head HNPGenotypes.fam
1 HNP_DNA_1 0 0 0 0
2 HNP_DNA_1 0 0 0 0
DGKH413ADHJW:Plink_working dlekshmi$ head HNPGenotypes.map
12 rs1000000 0 126890980
3 rs1000002 0 183635768
I generated the files based on this code:
https://github.com/dhibar/ADNI_Genetics_Convert_to_PLINK according to the presentation: http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/nichols/presentations/ohbm2014/imggen/Hibar_ImgGen-CommonVar_OHBM2014.pdf
In total there are 70,000 SNPs for over 100 samples, so around 7 million genotypes overall. Any help will be appreciated.
Does this fail in PLINK 1.90? If so, can you describe the memory error in more detail?
This is the error I get. Yes, I am running this on 1.9.
Double-check your version; that's a v1.07 error message.
You mean 70 million, not 7. I'd suggest breaking your input into 70 files, 1 million lines each, convert to bed and combine them. Would that work?
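The splitting step could be sketched roughly as below. This is only an illustration: the function name `split_lgen`, the `part.N.lgen` naming, and the chunk size are assumptions, not something PLINK provides. It cuts the long-format file purely by line count, so one sample's genotypes may end up spread over two chunks.

```python
from pathlib import Path


def split_lgen(lgen_path, lines_per_chunk, out_prefix="part"):
    """Split a long-format genotype file into numbered chunks.

    Writes <out_prefix>.1.lgen, <out_prefix>.2.lgen, ... and returns
    the list of chunk paths in the order they were written.
    """
    chunk_paths = []
    out = None
    with open(lgen_path) as src:
        for i, line in enumerate(src):
            # Start a new chunk file every `lines_per_chunk` lines.
            if i % lines_per_chunk == 0:
                if out is not None:
                    out.close()
                path = Path(f"{out_prefix}.{len(chunk_paths) + 1}.lgen")
                chunk_paths.append(path)
                out = open(path, "w")
            out.write(line)
    if out is not None:
        out.close()
    return chunk_paths


# Hypothetical usage with the file from the question:
# chunks = split_lgen("HNPGenotypes.lgen", 1_000_000)
```

Note that `--lfile` looks for a .map and .fam with the same prefix as each .lgen, so the existing .map and .fam would presumably have to be copied (or passed separately) for every chunk.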
Hi,
Sorry, it was a mistake, it is 7 million. I have edited the number now. Well, I tried doing it per chromosome and it gave me the same error. Maybe I should split it into smaller files, but I do not know how to combine them.
Smaller files might help, yes, unless PLINK offers memory allocation/optimization options.
Combining BED files is fairly simple. You'd have to split your input files and name the split files with a numeric index in them (like part.1, part.2, ... part.99, ...)
Each part file would yield a BED file that you can name similarly (part.1.bed, part.2.bed, ... part.99.bed, ...)
You'd then run a loop in the shell, grep the lines that do not start with a '#' and append to the master BED file. Something like this:
Unfortunately, this doesn't work since PLINK .bed is not the same as UCSC Genome Browser .bed.
Oh, this is the other bed, the binary format. Oops, I should be more careful. Thank you!
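For PLINK binary filesets, combining is normally done by PLINK itself through `--merge-list` rather than by concatenating files. A minimal sketch of preparing the list file is below; the helper name `write_merge_list`, the `part.N` prefixes, and the output names are assumptions for illustration only.

```python
from pathlib import Path


def write_merge_list(prefixes, list_path="merge_list.txt"):
    """Write one fileset prefix per line for PLINK's --merge-list.

    Only prefixes whose .bed, .bim and .fam files all exist are
    written. Returns the prefixes that were listed.
    """
    complete = [
        p for p in prefixes
        if all(Path(f"{p}.{ext}").exists() for ext in ("bed", "bim", "fam"))
    ]
    Path(list_path).write_text("".join(f"{p}\n" for p in complete))
    return complete


# Hypothetical usage: part.1 is the base fileset, the rest go in the list.
# write_merge_list([f"part.{i}" for i in range(2, 8)])
# Then, in the shell:
#   plink --bfile part.1 --merge-list merge_list.txt --make-bed --out merged
```

If the failure really is memory-related, PLINK 1.9 also accepts `--memory` with a size in MB, which may be worth trying before splitting anything.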
Hello all, I am new to PLINK analysis. Conversion to PLINK format keeps failing due to an error.
Converting raw genotype output from BeadStudio CSV files to Plink format
Any help will be appreciated