8.1 years ago
DVA
Does anyone here use Jellyfish on whole-genome sequencing data (reformatted directly from FASTQ)? The input is ~100G and the command looks like this:
/home/jellyfish-2.2.6/bin/jellyfish count -m 14 -s 100M -o /hash/hash_sample_L_0_k_14.jf /sample/sample.fasta
The system returns "Killed" after about 40 min, and I assume this is due to memory or swap exhaustion. I have lowered the k-mer length to 10 for now, but would like to know if there are any alternatives. Thanks a lot.
Update: I tried 10 (-m 10), but it was also "Killed". Trying -m 5 now...
You could check free memory and swap with htop while the program is running. There are only ~1M possible 10-mers (4^10 = 1,048,576), so a 100M initial hash is overkill for that.
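To make the arithmetic concrete, here is a quick sketch of how many distinct k-mers are possible for the k values tried in this thread. It assumes a plain 4-letter DNA alphabet (A, C, G, T) and ignores N bases and canonical (strand-merged) counting; `possible_kmers` is just an illustrative helper name, not part of Jellyfish.

```python
def possible_kmers(k: int, alphabet_size: int = 4) -> int:
    """Upper bound on distinct k-mers over an alphabet of the given size."""
    return alphabet_size ** k


# The k values mentioned in the question: -m 5, -m 10 and -m 14.
for k in (5, 10, 14):
    print(f"k={k}: {possible_kmers(k):,} possible k-mers")
```

For k=10 this gives 1,048,576, far below the 100M passed to -s. For k=14 it gives 268,435,456, which is above 100M, so the same -s value is not necessarily oversized for the original -m 14 run.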
Thanks so much for the reply. Could you please explain a little further? Is Jellyfish taking all reads (I actually have 500M reads) into consideration at once? I thought the only memory-consuming part was the ~1M possible k-mers... Thanks a lot.