.lgen to .ped conversion failing
10.1 years ago
dharl

I have converted the reprt.txt and map.txt files into .lgen format, but my conversion from .lgen to a binary .bed file keeps failing with a memory error, even when I run it one chromosome at a time.

Here are the first few lines of my files:

DGKH413ADHJW:Plink_working dlekshmi$ head HNPGenotypes.lgen
1 HNP_DNA_1 rs1000000 C C
2 HNP_DNA_1 rs1000002 A G

DGKH413ADHJW:Plink_working dlekshmi$ head HNPGenotypes.fam
1 HNP_DNA_1 0 0 0 0
2 HNP_DNA_1 0 0 0 0

DGKH413ADHJW:Plink_working dlekshmi$ head HNPGenotypes.map
12 rs1000000 0 126890980
3 rs1000002 0 183635768

I generated the files based on this code:

https://github.com/dhibar/ADNI_Genetics_Convert_to_PLINK, following this presentation: http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/nichols/presentations/ohbm2014/imggen/Hibar_ImgGen-CommonVar_OHBM2014.pdf

In total there are 70,000 SNPs for over 100 samples, so around 7 million genotype lines overall. Any help will be appreciated.
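A quick sanity check before converting: a complete long-format file has one .lgen line per sample per SNP, so its line count should equal (lines in .map) × (lines in .fam), e.g. 70,000 × 100 = 7,000,000. This is a minimal sketch, assuming the three files share one prefix, have no header lines and end with a newline; the function name `check_lgen_size` is made up for illustration.

```shell
# check_lgen_size PREFIX  (hypothetical helper)
# Compares the line count of PREFIX.lgen with (#SNPs in PREFIX.map) x (#samples in PREFIX.fam).
# Returns 0 if they match, non-zero otherwise.
check_lgen_size() {
  prefix="$1"
  # tr strips the leading spaces that BSD/macOS wc prints
  n_snps=$(wc -l < "$prefix.map" | tr -d ' ')
  n_samples=$(wc -l < "$prefix.fam" | tr -d ' ')
  n_lgen=$(wc -l < "$prefix.lgen" | tr -d ' ')
  expected=$((n_snps * n_samples))
  echo "SNPs: $n_snps  samples: $n_samples  expected .lgen lines: $expected  actual: $n_lgen"
  [ "$n_lgen" -eq "$expected" ]
}
```

For example, `check_lgen_size HNPGenotypes` prints the counts for the files shown above. A large mismatch in either direction (for instance a .fam with far more lines than there are real samples) suggests the conversion script produced something other than what was intended.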

Does this fail in PLINK 1.90? If so, can you describe the memory error in more detail?


This is the error I get. Yes, I am running this on 1.9:

Reading pedigree information from [ HNPGenotypes.fam ] 
plink(7973,0xa02711a8) malloc: *** mach_vm_map(size=1048576) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
*****************************************************
* FATAL ERROR    Exhausted system memory            *
*                                                   *
* You need a smaller dataset or a bigger computer...*
*                                                   *
* Forced exit now...                                *
*****************************************************

Double-check your version; that's a v1.07 error message.


You mean 70 million, not 7? I'd suggest breaking your input into 70 files of 1 million lines each, converting each one to BED, and combining them. Would that work?


Hi,

Sorry, that was a mistake: it is 7 million. I have edited the number now. I tried doing it per chromosome and got the same error. Maybe I should split it into smaller files, but I don't know how to combine them.


Smaller files might help, yes, unless PLINK offers memory allocation/optimization options.

Combining BED files is fairly simple. You'd have to split your input files and name the split files with a numeric index in them (like part.1, part.2, ... part.99, ...).

Each part file would yield a BED file that you can name similarly (part.1.bed, part.2.bed, ... part.99.bed, ...)
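For PLINK's long format, each part needs a matching .lgen, .map and .fam, so splitting by SNP is easier to keep consistent than splitting by raw line count. A minimal sketch that splits by chromosome, assuming whitespace-delimited files (.map: chromosome, SNP, distance, position; .lgen: FID, IID, SNP, allele 1, allele 2) and an awk that can keep one output file per chromosome open (gawk can; some old BSD awks limit open files). The function name `split_by_chr` and the `chrN` naming are illustrative only.

```shell
# split_by_chr PREFIX OUTDIR  (hypothetical helper)
# Writes OUTDIR/chrN.map, OUTDIR/chrN.lgen and a copy of PREFIX.fam as OUTDIR/chrN.fam
# for every chromosome N listed in PREFIX.map.
split_by_chr() {
  prefix="$1"; outdir="$2"
  mkdir -p "$outdir"
  # One .map file per chromosome (column 1 of the .map)
  awk -v out="$outdir" '{ print > (out "/chr" $1 ".map") }' "$prefix.map"
  # First pass over the .map builds a SNP -> chromosome table;
  # each .lgen line is then routed by its SNP ID (column 3).
  # .lgen lines whose SNP is not in the .map are dropped.
  awk -v out="$outdir" '
    NR == FNR { chr[$2] = $1; next }
    ($3 in chr) { print > (out "/chr" chr[$3] ".lgen") }
  ' "$prefix.map" "$prefix.lgen"
  # Every part keeps the full sample list
  for m in "$outdir"/chr*.map; do
    cp "$prefix.fam" "${m%.map}.fam"
  done
}
```

The .lgen is streamed line by line, so the split itself only needs memory for the SNP-to-chromosome table. Each part could then be converted on its own, e.g. `plink --lfile parts/chr1 --make-bed --out parts/chr1` in PLINK 1.9.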

You'd then run a loop in the shell, grep the lines that do not start with a '#' and append to the master BED file. Something like this:

for i in $(seq 1 99)
do
  grep "^[^#]" part.${i}.bed >>bigBedFile.bed
done

Unfortunately, this doesn't work since PLINK .bed is not the same as UCSC Genome Browser .bed.


Oh, this is the other bed, the binary format. Oops, I should be more careful. Thank you!
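Since PLINK .bed files are binary, they can't be concatenated as text. PLINK 1.9 can instead merge several binary filesets listed in a text file passed to `--merge-list`, where a line holding a single name is treated as a .bed/.bim/.fam prefix. A minimal sketch that builds such a list, assuming the parts are named `chrN.bed`/`.bim`/`.fam` in one directory; the function name `write_merge_list` is hypothetical.

```shell
# write_merge_list DIR LIST  (hypothetical helper)
# Writes one fileset prefix per line (DIR/chrN for each DIR/chrN.bed) into LIST.
write_merge_list() {
  dir="$1"; list="$2"
  : > "$list"                      # start with an empty list
  for bed in "$dir"/chr*.bed; do
    [ -e "$bed" ] || continue      # the glob matched nothing
    echo "${bed%.bed}" >> "$list"  # strip .bed to get the prefix
  done
}
```

The merge itself would then be something like `plink --merge-list merge_list.txt --make-bed --out HNPGenotypes_merged` in PLINK 1.9, which accepts `--merge-list` without a separate `--bfile` reference fileset.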


Hello all, I am new to PLINK analysis. Conversion to PLINK format keeps failing with this error:

Error: failed to open HNPGenotypes.fam

Converting raw genotype output from BeadStudio CSV files to Plink format

Any help will be appreciated.
