6.5 years ago
Lina F
Hi all,
I found a tutorial suggesting how to use Jellyfish to estimate genome size:
http://koke.asrc.kanazawa-u.ac.jp/HOWTO/kmer-genomesize.html
However, after running jellyfish count and jellyfish histo, the output shows that every other k-mer count is zero.
Below is my code, trying several values of k.
I feel like I'm missing something simple -- why are the odd k-mer counts zero?
Thanks for any advice!
~Lina
for K in 21 23 25 27 29 31;
do
jellyfish count -t 20 -C -m $K -s 5G -o output_${K}.jf --min-quality=20 --quality-start=33 all.fastq
jellyfish histo -f output_${K}.jf > histogram_${K}.txt
jellyfish stats -v -o stats_${K}.txt output_${K}.jf
done
head histogram_31.txt
0 0
1 0
2 14028836
3 0
4 2053267
5 0
6 966831
7 0
8 554663
9 0
cat stats_31.txt
Unique: 0
Distinct: 37557758
Total: 2901177252
Max_count: 2419076
Edited to add the contents of the stats file.
You also have no k-mers with a frequency of 1, which is extremely unlikely. Did you somehow double up your input FASTQ? Did you copy the original FASTQ at some point and concatenate the copy to the original?
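One quick way to test for this from the histogram alone is to check whether every odd multiplicity is empty. Below is a minimal sketch, assuming the two-column format shown in the question (multiplicity, then number of distinct k-mers). The function name odd_bins_empty is made up for illustration, and the sample data is just the first lines of histogram_31.txt from above.

```shell
# Hypothetical helper (not part of Jellyfish): succeeds when no odd
# multiplicity in a two-column histogram has a non-zero k-mer count.
odd_bins_empty() {
    awk '$1 % 2 == 1 && $2 > 0 { found = 1 }
         END { exit (found ? 1 : 0) }' "$1"
}

# Demo using the first lines of histogram_31.txt from the question.
tmp=$(mktemp)
printf '%s\n' '0 0' '1 0' '2 14028836' '3 0' '4 2053267' '5 0' > "$tmp"
if odd_bins_empty "$tmp"; then
    echo "only even counts: the input may contain every read twice"
fi
rm -f "$tmp"
```

If only even counts show up, every k-mer was probably seen an exact multiple of two times, which is what a duplicated input would produce.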
I double checked and I did not double up my input fastq files. However, I am using both fwd and rev read files. In total I have 29.5 million read pairs. Should I downsample this?
EDITED to add: I just ran the code with only the FWD read file and now I get k-mers with a count of 1, and odd counts in general.
I realized my input data was wrong: my R1 and R2 files were indeed the same. They were just given to me with different names.
Thanks for the helpful advice!
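For anyone who lands here with the same symptom, a byte-for-byte comparison of the two read files would have caught this before counting. This is only a sketch: the function name same_content is made up, and the demo uses tiny temporary files as stand-ins for real R1/R2 FASTQ files.

```shell
# Hypothetical helper: succeeds when two files have identical bytes.
# cmp -s prints nothing and only sets the exit status.
same_content() {
    cmp -s "$1" "$2"
}

# Demo with two temporary stand-in files holding the same FASTQ record.
r1=$(mktemp)
r2=$(mktemp)
printf '@read1\nACGT\n+\nIIII\n' > "$r1"
cp "$r1" "$r2"
if same_content "$r1" "$r2"; then
    echo "R1 and R2 are identical: check where the files came from"
fi
rm -f "$r1" "$r2"
```

With gzipped FASTQ files it is safer to compare the decompressed streams, since the gzip header can store a file name and timestamp that differ even when the reads are the same.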