I ran jellyfish twice on the same Illumina reads, first with the -C option and then without it. These are the commands I used:
With the -C option:
time jellyfish count -m 21 -s 100M -t 15 -C <(zcat file_1.fastq.gz file_2.fastq.gz)
Without the -C option:
time jellyfish count -m 21 -s 100M -t 15 <(zcat file_1.fastq.gz file_2.fastq.gz)
In the first case I obtained 1,684,382,436 distinct k-mers, while in the second case I obtained 2,205,041,740, i.e. only 520,659,304 more.
How is this possible? I was expecting the number of distinct k-mers in the second case (without -C) to be roughly twice the number in the first case, since the coverage is approximately 53x.
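To make the "twice as many" expectation concrete, here is a toy sketch (not jellyfish itself) that counts distinct k-mers with and without canonicalization. It assumes the canonical form of a k-mer is the lexicographically smaller of the k-mer and its reverse complement; the function names, the 5 bp fragment and k = 3 are all hypothetical, chosen only for illustration. With an odd k (such as 21) no k-mer can equal its own reverse complement, so the forward-only count is exactly twice the canonical count only when every k-mer is observed in both orientations. A k-mer seen on one strand only (here, one created by a single substitution in an extra read) is not collapsed, and the ratio drops below 2.

```python
# Toy comparison of forward-strand vs canonical distinct k-mer counts.
# Assumption: canonical k-mer = lexicographically smaller of k-mer and reverse complement.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Smaller of the k-mer and its reverse complement."""
    return min(kmer, revcomp(kmer))


def distinct_kmers(reads: list[str], k: int, use_canonical: bool) -> int:
    """Number of distinct k-mers over all reads, optionally canonicalized."""
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            seen.add(canonical(kmer) if use_canonical else kmer)
    return len(seen)


if __name__ == "__main__":
    fragment = "AACCT"                        # hypothetical genomic fragment
    both_strands = [fragment, revcomp(fragment)]
    with_error = both_strands + ["AACAT"]      # hypothetical read with one substitution (C->A)

    for name, reads in [("both strands", both_strands), ("plus error read", with_error)]:
        fwd = distinct_kmers(reads, 3, use_canonical=False)
        can = distinct_kmers(reads, 3, use_canonical=True)
        print(name, fwd, can)
    # both strands 6 3
    # plus error read 8 5
```

In the second case the two k-mers spanning the substitution appear on one strand only, so canonicalization does not merge them with anything and the forward/canonical ratio is 8/5 rather than 2.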
Thank you very much!