Large input when using jellyfish

8.1 years ago

DVA ▴ 630

Does anyone here use Jellyfish on whole-genome sequencing data (reformatted directly from FASTQ)? The input is ~100 GB, and the command is as follows:

/home/jellyfish-2.2.6/bin/jellyfish count -m 14 -s 100M -o /hash/hash_sample_L_0_k_14.jf /sample/sample.fasta

The system returns "Killed" after about 40 min, and I'm assuming this is due to memory or swap exhaustion. I have now lowered the k-mer length to 10, but would like to learn whether there are alternatives. Thanks a lot.

Update: I tried k = 10 (-m 10), but it was also "Killed". Trying -m 5 now...

You could check free memory and swap with htop while the program is running. There are only ~1M possible 10-mers (4^10 = 1,048,576), so a 100M initial hash size (-s 100M) is overkill for that.

Reply by 5heikki 11k, 8.1 years ago
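
To put numbers on the reply above, here is a quick Python sketch comparing the k-mer key space 4^k for the k values tried in this thread (14, 10, 5) against the initial hash size from the command. It assumes the M in -s 100M means 10^6, and it counts every k-mer over {A, C, G, T} without merging reverse complements. `possible_kmers` and `INITIAL_HASH_SIZE` are hypothetical names, not part of Jellyfish. For k = 14 the key space (4^14 = 268,435,456) is larger than 100M, while for k = 10 it is roughly 1/95 of it. This only sizes the key space; it does not by itself explain the "Killed" message.

```python
# Hypothetical helper: compare the DNA k-mer key space with the -s value used above.
# Assumption: "100M" is read as 100 * 10**6 entries.
INITIAL_HASH_SIZE = 100_000_000


def possible_kmers(k: int) -> int:
    """Number of distinct k-mers over {A, C, G, T}, reverse complements not merged."""
    return 4 ** k


if __name__ == "__main__":
    for k in (14, 10, 5):
        n = possible_kmers(k)
        ratio = INITIAL_HASH_SIZE / n
        print(f"k={k:2d}: 4^k = {n:>11,d}; initial hash size / 4^k = {ratio:,.1f}")
```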

Thanks so much for the reply. Could you please explain a little further? Does Jellyfish take all reads (I actually have 500M reads) into consideration at once? I thought the only memory-consuming part was the ~1M possible k-mers... Thanks a lot.

Reply by DVA ▴ 630, 8.1 years ago
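
On the question of whether all reads are held at once, here is a simplified streaming counter in Python. It is not Jellyfish's implementation (Jellyfish uses its own hash table, sized by -s). It reads one FASTA record at a time and, for small k, counts into a dense table of 4^k counters. The table's size therefore depends on k, not on how many reads are streamed through. Assumptions: only forward-strand k-mers are counted (no reverse-complement merging), any non-ACGT base such as N breaks the current k-mer, and `read_fasta` and `count_kmers_dense` are hypothetical names. A dense table only makes sense for small k. At k = 14 it would already need 4^14 eight-byte counters, which is 2 GiB.

```python
from array import array
from typing import Iterable, Iterator

# 2-bit encoding of nucleotides; any other character (e.g. N) resets the k-mer
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def read_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield one sequence at a time; only the current record is kept in memory."""
    seq = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seq:
                yield "".join(seq)
                seq = []
        elif line:
            seq.append(line.upper())
    if seq:
        yield "".join(seq)


def count_kmers_dense(sequences: Iterable[str], k: int) -> array:
    """Count forward-strand k-mers into a dense table of 4**k unsigned 64-bit counters.

    The table size is fixed by k, whatever the number of sequences streamed in.
    """
    counts = array("Q", [0]) * (4 ** k)
    mask = 4 ** k - 1  # keeps only the last k bases (2 bits each)
    for seq in sequences:
        code = 0
        valid = 0  # consecutive A/C/G/T bases ending at the current position
        for base in seq:
            b = ENCODE.get(base)
            if b is None:
                code, valid = 0, 0
                continue
            code = ((code << 2) | b) & mask
            valid += 1
            if valid >= k:
                counts[code] += 1
    return counts
```

Usage would be along the lines of opening /sample/sample.fasta and calling `count_kmers_dense(read_fasta(fh), 10)` on the file handle. Memory stays at 4^10 counters (8 MiB here) however many of the 500M reads pass through.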

Login before adding your answer.