Hi!
I ran GenomeScope to estimate the level of heterozygosity in my genome, but the output plot looks quite strange. Most alarming is the incredibly large estimated genome size: I am expecting a genome of ~5 Mb and am getting 120 Mb, so I am not sure I can trust the reported heterozygosity value. Has anyone experienced this before, and can you offer any suggestions on what could be happening here? Some more information: I have 200 bp paired-end reads and pretty high coverage.
Link to plot:
All the code I used to get the plot:
jellyfish count -C -m 21 -s 5000000000 -t 8 R1.fastq -o reads.jf
jellyfish histo -t 8 reads.jf > reads.histo
Rscript genomescope.R reads.histo 21 200 results_out 700
More of the GenomeScope output:
len:120MB uniq:0.43% het:2.97% kcov:13.3 err:0.143% dup:0.39% k:21
Thank you
I have never used GenomeScope or jellyfish directly, but you say your expected genome size is ~5 Mbp, yet you apparently entered 5 Gbp in the jellyfish count command (-s 5000000000: nine zeros instead of six). You could also compare your results to KAT (https://kat.readthedocs.io/en/latest/).
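For a rough cross-check that does not depend on GenomeScope's model fit, you could estimate the genome size straight from reads.histo as (total k-mers) / (k-mer coverage at the main peak). The following is only a minimal sketch under some assumptions: the histogram is the usual two-column output of jellyfish histo (multiplicity, number of distinct k-mers), the function names are made up for illustration, and the min_count cutoff of 5 for discarding likely error k-mers is an arbitrary choice you would want to adjust after looking at the plot. Note also that in a highly heterozygous sample the tallest peak may be the heterozygous one at half coverage, which would roughly double the estimate.

```python
def read_histo(path):
    """Parse a two-column k-mer histogram: multiplicity, number of distinct k-mers."""
    with open(path) as fh:
        return [tuple(int(x) for x in line.split()) for line in fh if line.strip()]


def estimate_genome_size(histo, min_count=5):
    """Return (estimated genome size in bp, peak multiplicity).

    min_count is an assumed cutoff: k-mers seen fewer times than this are
    treated as sequencing errors and ignored.
    """
    kept = [(m, n) for m, n in histo if m >= min_count]
    # Peak = multiplicity with the most distinct k-mers, used as the k-mer coverage
    peak = max(kept, key=lambda pair: pair[1])[0]
    # Total number of k-mer occurrences above the error cutoff
    total_kmers = sum(m * n for m, n in kept)
    return total_kmers / peak, peak


# Toy histogram for illustration only (not real data)
toy_histo = [(1, 900), (2, 100), (9, 10), (10, 50), (11, 10)]
toy_size, toy_peak = estimate_genome_size(toy_histo)
```

On the real data this would be called as estimate_genome_size(read_histo("reads.histo")). If the number it returns is also far from ~5 Mb, the problem is more likely in the data or the k-mer counts than in GenomeScope's fit.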