Large input when using jellyfish
8.1 years ago
DVA ▴ 630

Does anyone here use Jellyfish for whole-genome sequencing data (FASTA converted directly from FASTQ)? The input is ~100 GB and the command is:

/home/jellyfish-2.2.6/bin/jellyfish count -m 14 -s 100M -o /hash/hash_sample_L_0_k_14.jf /sample/sample.fasta

The system returns "Killed" after about 40 minutes, and I assume this is due to memory or swap exhaustion. I have lowered the k-mer length to 10 for now, but would like to know whether there are any alternatives. Thanks a lot.

Update: I tried -m 10, but it was also "Killed". Trying -m 5 now...

jellyfish next-gen whole genome seq • 3.0k views

You could check free memory and swap with htop while the program is running. There are only ~1M possible 10-mers (4^10 = 1,048,576), so an initial hash size of 100M is overkill for that.
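To make the arithmetic concrete, here is a small sketch that compares the number of possible k-mers with the requested hash size. It assumes a 4-letter DNA alphabet (no N or other ambiguity codes) and reads `-s 100M` as 100 million hash elements; `possible_kmers` and `HASH_SIZE` are names made up for this illustration, not part of Jellyfish.

```python
def possible_kmers(k: int, alphabet_size: int = 4) -> int:
    """Upper bound on distinct k-mers over an alphabet (assumes A/C/G/T only)."""
    return alphabet_size ** k


# Assumption: "-s 100M" from the command in the question means 100 million elements.
HASH_SIZE = 100_000_000

for k in (5, 10, 14):
    n = possible_kmers(k)
    relation = "fits within" if n <= HASH_SIZE else "exceeds"
    print(f"k={k:>2}: {n:>11,} possible k-mers, {relation} a {HASH_SIZE:,}-element hash")
```

This only bounds the number of distinct keys; it says nothing about how much memory each hash entry takes, so it is a sanity check on `-s` rather than a memory estimate.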


Thanks so much for the reply. Could you please explain a little further? Does Jellyfish hold all reads (I actually have 500M reads) in memory at once? I thought the only memory-consuming part was the table for the ~1M possible k-mers. Thanks a lot.
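As a way of framing the question, the toy counter below shows the behaviour I had in mind: reads are consumed one at a time and only the table of distinct k-mers stays in memory. This is purely an illustrative sketch in plain Python, not how Jellyfish is implemented, and `kmers` and `count_kmers` are hypothetical helper names.

```python
from collections import Counter
from typing import Iterable, Iterator


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count k-mers while streaming reads; only the counts table is retained."""
    counts: Counter = Counter()
    for read in reads:  # each read can be discarded after this iteration
        counts.update(kmers(read.upper(), k))
    return counts


# Tiny made-up input; an iterator stands in for reading a file line by line.
reads = iter(["ACGTAC", "CGTACG", "ACGTAC"])
counts = count_kmers(reads, 3)
print(len(counts), "distinct 3-mers:", dict(counts))
```

In a counter like this, memory grows with the number of distinct k-mers rather than with the number of reads, which is why the "Killed" result at -m 10 surprised me.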
