Estimating genome size from k-mer histograms...
10.4 years ago
ab.tsubaki ▴ 50

Hi all

Does anyone have experience with Jellyfish-derived k-mer histograms?

I've run all the necessary steps, but now that I've plotted my histogram it looks nothing like it's supposed to! There's no peak or hump: it just starts at the top, slopes downward, and flattens out at the bottom.

The scripts for running Jellyfish were as follows:

jellyfish count -t 8 -C -m 19 -s 5G -o filename.jf read.fastq
jellyfish dump filename.jf > filename.fa
jellyfish histo -o filename.histo filename.jf

My k-mer size of 19 comes from running KmerGenie.

I dumped the histo file into Excel to take a look at the histogram.

Can anyone spot a problem, or has anyone encountered this before? Am I using the wrong k-mer size? Or is there an underlying problem with my sequencing data?

Thanks in advance

Anandi

kmer genome-size next-gen • 4.6k views

I think you meant to link to an image.

10.4 years ago

See this nice writeup that covers genome size estimation among other things: https://banana-slug.soe.ucsc.edu/bioinformatic_tools:jellyfish
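
As a rough illustration of the usual k-mer approach to genome size (not code taken from that writeup), the sketch below walks down the initial low-depth error slope to the first valley, takes the highest point after it as the coverage peak, and divides the total number of k-mers at or above the valley by the peak depth. The function names and the toy numbers are made up for illustration, and the method assumes a clean single-peak histogram. On real data, noise, heterozygosity, or the final catch-all bin of `jellyfish histo` can mislead it.

```python
def find_valley_and_peak(hist):
    """hist: list of (depth, count) pairs sorted by depth.

    Returns (valley_depth, peak_depth), or None when the counts only ever
    decrease (the "exponential decay" shape described in the question).
    """
    counts = [count for _, count in hist]
    # Walk down the low-depth error slope until the counts stop falling.
    i = 0
    while i + 1 < len(counts) and counts[i + 1] < counts[i]:
        i += 1
    if i + 1 >= len(counts):
        return None  # no point ever rises again, so there is no peak
    valley = i
    # Highest point at or after the valley is taken as the coverage peak.
    peak = max(range(valley, len(counts)), key=lambda j: counts[j])
    return hist[valley][0], hist[peak][0]


def estimate_genome_size(hist):
    """Total k-mers at or above the valley, divided by the peak depth."""
    found = find_valley_and_peak(hist)
    if found is None:
        return None
    valley_depth, peak_depth = found
    total_kmers = sum(depth * count for depth, count in hist if depth >= valley_depth)
    return total_kmers / peak_depth


# Toy histogram (invented numbers): error spike at depth 1, peak at depth 7.
toy = [(1, 1000), (2, 300), (3, 50), (4, 20), (5, 60),
       (6, 150), (7, 300), (8, 150), (9, 60), (10, 10)]
print(find_valley_and_peak(toy), estimate_genome_size(toy))
```

If `find_valley_and_peak` returns None, the histogram has no usable peak and the estimate should not be attempted. That usually points to low coverage or to how the data are being plotted, rather than to the arithmetic.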


Thanks Matt. That IS a very useful site. I've taken a look.

I tried attaching an image but couldn't figure out how. The main problem is that my histogram's shape does not lend itself to genome size estimation at all. You could describe the graph as a "monotonically descending, exponential decay k-mer histogram"!

I'm trying to generate histograms from some different kmer sizes now...


A histogram might not be a great idea if it bins the data at all. Try plotting a density, or just draw lines between the raw points. If you're binning with a width of, say, 5 or so, you might be missing your peak.
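
A minimal sketch of the unbinned plot, assuming the file is the two-column (depth, count) text that `jellyfish histo` writes and that matplotlib is installed. The function names, the `max_depth` cutoff, and the output filename are placeholders. The log-scaled y axis is an extra suggestion: on a linear axis the huge count at depth 1 can flatten everything else, which by itself can make a real peak invisible.

```python
def parse_histo(lines):
    """Parse 'depth count' lines into a list of (depth, count) tuples."""
    points = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        points.append((int(fields[0]), int(fields[1])))
    return points


def read_histo(path):
    """Read a histogram file such as filename.histo."""
    with open(path) as handle:
        return parse_histo(handle)


def plot_histo(points, out_png="kmer_histo.png", max_depth=200):
    """Draw lines between the raw points, with no binning."""
    # Imported here so the parsing functions work without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # write to a file; no display needed
    import matplotlib.pyplot as plt

    # Keep positive counts only, since zero cannot be drawn on a log axis.
    kept = [(d, c) for d, c in points if d <= max_depth and c > 0]
    fig, ax = plt.subplots()
    ax.plot([d for d, _ in kept], [c for _, c in kept], marker=".")
    ax.set_yscale("log")
    ax.set_xlabel("k-mer depth (times seen)")
    ax.set_ylabel("number of distinct k-mers")
    fig.savefig(out_png)
    plt.close(fig)


# Usage, with the filename from the question:
# plot_histo(read_histo("filename.histo"))
```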
