Question

Jellyfish: every other kmer count is zero

7.2 years ago

Lina F ▴ 200

Hi all,

I found a tutorial suggesting how to use Jellyfish to estimate genome size:

http://koke.asrc.kanazawa-u.ac.jp/HOWTO/kmer-genomesize.html

However, after running jellyfish count and jellyfish histo, the histogram shows zero k-mers at every other count.

Below is my code, trying several values of k.

I feel like I'm missing something simple: why are there no k-mers with odd counts?

Thanks for any advice!

~Lina

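# Count canonical k-mers (-C) for several odd values of k, then write a
# histogram and summary statistics for each run.
for K in 21 23 25 27 29 31; do
  # -t: threads, -m: k-mer length, -s: initial hash size, -o: output file
  jellyfish count -t 20 -C -m "$K" -s 5G -o "output_${K}.jf" --min-quality=20 --quality-start=33 all.fastq
  jellyfish histo -f "output_${K}.jf" > "histogram_${K}.txt"
  jellyfish stats -v -o "stats_${K}.txt" "output_${K}.jf"
done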

head histogram_31.txt
0 0
1 0
2 14028836
3 0
4 2053267
5 0
6 966831
7 0
8 554663
9 0

cat stats_31.txt
Unique:    0
Distinct:  37557758
Total:     2901177252
Max_count: 2419076

Edited to add the contents of the stats file.

kmer counting jellyfish genome size estimation • 3.9k views

You also have no k-mers with a frequency of 1, which is extremely unlikely. Did you somehow double up your input fastq? Did you copy the original fastq at some point and concatenate the copy to the original?
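If every read is present twice in the input, every k-mer is counted twice, so all counts come out even and the odd rows of the histogram stay empty. Below is a minimal sketch of a check for that pattern, assuming the two-column histogram format shown in the question (count, then number of distinct k-mers with that count). The function name odd_bins_all_zero is hypothetical and not part of Jellyfish.

```shell
# Hypothetical helper: succeed (exit 0) if every odd-count row of a
# two-column histogram file has zero k-mers, fail (exit 1) otherwise.
odd_bins_all_zero() {
  awk '$1 % 2 == 1 && $2 != 0 { bad = 1 } END { exit bad + 0 }' "$1"
}

# Usage: only runs if the histogram from the question is present.
if [ -f histogram_31.txt ] && odd_bins_all_zero histogram_31.txt; then
  echo "All odd-count bins are empty: the input may contain every read twice."
fi
```

An empty odd-count pattern is a hint rather than proof, so it is still worth checking the input files themselves.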

7.2 years ago by Damian Kao 16k

I double-checked and I did not double up my input fastq files. However, I am using both the forward and reverse read files. In total I have 29.5 million read pairs. Should I downsample this?

Edited to add: I just ran the code with only the forward read files, and now I get k-mers with a count of 1 and odd counts in general.

I realized my input data was wrong: my R1 and R2 files were indeed the same; they had just been given to me under different names.
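For anyone hitting the same symptom, a quick way to rule this out is to compare the two read files byte for byte. This is a minimal sketch that assumes uncompressed files; R1.fastq and R2.fastq are placeholder names, and same_content is a hypothetical helper built on the standard cmp utility.

```shell
# Hypothetical helper: succeed (exit 0) if two files are byte-for-byte
# identical; cmp -s compares silently and reports only via exit status.
same_content() {
  cmp -s "$1" "$2"
}

# Usage with placeholder file names; skipped if the files are absent.
if [ -f R1.fastq ] && [ -f R2.fastq ] && same_content R1.fastq R2.fastq; then
  echo "R1 and R2 are identical: the same reads would be counted twice."
fi
```

For gzip-compressed files, the decompressed streams would need to be compared instead, since identical reads can compress to different bytes.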

Thanks for the helpful advice!

7.2 years ago by Lina F ▴ 200