Hi all,
Maybe I should not post this kind of question here, but I am used to asking questions here, so I hope you can give me the right answer.
I have generated a ~30 GB LD file covering all chromosomes using PLINK, and I want to find all SNPs in LD with the SNPs in a query list.
For this, I use a Perl script structured like this:
open(FILE, "<", "$path/$filename") or die "cannot open $path/$filename: $!";
while (<FILE>)
{
    # do the LD mapping on each line here
}
close FILE;
Since the raw LD file is so big, Perl seems to occupy most of the memory on my machine.
Here is the full script:
foreach my $tmp1 (@contents)
{
    if ($tmp1 =~ /xxxxxx/)
    {
        print "processing FILE $tmp1 ...\n";
        open(FILE2, "<", $tmp1) or die "cannot open $tmp1: $!";
        while (<FILE2>)
        {
            my @line = split(/\s+/, $_);
            # index each LD pair by both SNP IDs so either one can be looked up later
            $hash1{$line[3]} = $line[3] . "\n" . $line[6] . "\n";
            $hash2{$line[6]} = $line[3] . "\n" . $line[6] . "\n";
        }
        close FILE2;
    }
}
So how can I optimize my script to run faster?
Thanks, all!
Actually, that open/while structure doesn't read the whole file into memory; something else in your code must be filling it up. Can you post the complete script?
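To illustrate the difference (the file name here is just a placeholder):

# reads one line at a time: memory use stays roughly constant
# no matter how large the file is
open(my $fh, "<", "chr_all.ld") or die "cannot open chr_all.ld: $!";
while (my $line = <$fh>) {
    # process $line here
}
close $fh;

# by contrast, this slurps every line into an array at once,
# so memory use grows with the size of the whole file
open(my $fh2, "<", "chr_all.ld") or die "cannot open chr_all.ld: $!";
my @all_lines = <$fh2>;
close $fh2;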
So the problem is that you are building a gigantic hash. Do you really need all of it in memory? What are you doing after loading the data?
Yes, I must build this hash, because I want to find ALL SNPs in LD in the computed LD data. If I did not build the hash table, it would be a disaster for the permutation process.
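If the permutations only ever look up SNPs from the query list, one way to keep the hash small is to load the query SNPs first and store only the LD pairs that involve them. A rough sketch, assuming a one-SNP-ID-per-line query file (query_snps.txt and chr_all.ld are made-up names, and the column indices follow the script above):

# load the query SNP IDs into a lookup hash first
my %query;
open(my $qfh, "<", "query_snps.txt") or die "cannot open query_snps.txt: $!";
while (my $snp = <$qfh>) {
    chomp $snp;
    $query{$snp} = 1;
}
close $qfh;

# stream the LD file and keep only pairs that involve a query SNP,
# so the hash grows with the query list rather than with the whole file
my %ld_pairs;
open(my $ldfh, "<", "chr_all.ld") or die "cannot open chr_all.ld: $!";
while (<$ldfh>) {
    my @line = split(/\s+/, $_);
    my ($snp_a, $snp_b) = ($line[3], $line[6]);   # same columns as in the script above
    next unless defined $snp_a && defined $snp_b;
    push @{ $ld_pairs{$snp_a} }, $snp_b if $query{$snp_a};
    push @{ $ld_pairs{$snp_b} }, $snp_a if $query{$snp_b};
}
close $ldfh;

The hash then scales with the number of query SNPs and their LD partners instead of the full 30 GB file, while still giving constant-time lookups during the permutations.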