10.9 years ago
Sabiha
This is the formula for estimating genome size:
N = (M*L)/(L-K+1)
and
Genome_size = T/N,
where
N: depth, M: k-mer peak (the depth at the peak of the k-mer frequency histogram), K: k-mer size, L: average read length, T: total bases.
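The two formulas above can be sketched in Python as follows. The function names are hypothetical, and the numbers in the usage example are made up for illustration. They are not taken from any real data set.

```python
def depth_from_kmer_peak(m: float, read_len: float, k: int) -> float:
    """Convert the k-mer peak depth M into base depth N = (M*L)/(L-K+1)."""
    if k > read_len:
        raise ValueError("k-mer size must not exceed the read length")
    return m * read_len / (read_len - k + 1)


def genome_size(total_bases: float, depth: float) -> float:
    """Genome_size = T / N."""
    return total_bases / depth


# Made-up example: peak at M=20 with K=21 and L=100 gives N = 2000/80 = 25.0,
# and T = 2.5 Gb of reads then gives a genome size of 100 Mb.
n = depth_from_kmer_peak(20, 100, 21)
print(n)                             # 25.0
print(genome_size(2_500_000_000, n))  # 100000000.0
```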
What exactly is T, the total bases?
Is it the total number of read bases used by an assembler (ABySS, SOAP)?
How can I find out the total number of bases?
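A minimal sketch for counting bases, assuming T means the total number of bases in all reads fed to the k-mer counter (the sum of their read lengths). Since N is a depth, T = N × Genome_size must be a count of sequenced bases. The sketch also assumes plain 4-line FASTQ records with no wrapped sequence lines. The helper names are hypothetical.

```python
import gzip
from typing import Iterable


def total_bases_from_lines(lines: Iterable[str]) -> int:
    """Sum read lengths over FASTQ lines (assumes 4 lines per record, no wrapping)."""
    total = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the 2nd line of each 4-line record is the sequence
            total += len(line.rstrip("\r\n"))
    return total


def total_bases_fastq(path: str) -> int:
    """Open a plain or gzip-compressed FASTQ file and return its total number of bases."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return total_bases_from_lines(fh)
```

For paired-end data you would add up every file used for k-mer counting, e.g. `total_bases_fastq("sample_R1.fastq.gz") + total_bases_fastq("sample_R2.fastq.gz")` (hypothetical file names). If the reads were trimmed or filtered before counting, count the files that were actually given to the counter.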
Hi Sabiha,
Have you ever come across a situation where different k values lead to very different genome size estimates? For example, with a read length of 300 bp, I tested my data with k=33 and got a single peak at 26, while k=121 gave a single peak at 7. The depths (N) calculated with your formula are 29.1 and 11.7, respectively. Since T (total bases) is the same in both cases (it is the same data set), the two genome size estimates differ greatly. How would you decide which one is the better estimate of your genome size?
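The arithmetic in this example can be reproduced with a short sketch. It uses the same formula N = (M*L)/(L-K+1) with the peak values quoted above. The last line rearranges that formula to M = N*(L-K+1)/L, which shows where the k=121 peak would be expected if the k=33 depth were correct. The helper name is hypothetical.

```python
def depth_from_kmer_peak(m: float, read_len: float, k: int) -> float:
    """Base depth N = (M*L)/(L-K+1) from the k-mer peak depth M."""
    return m * read_len / (read_len - k + 1)


L = 300
n33 = depth_from_kmer_peak(26, L, 33)    # 7800 / 268
n121 = depth_from_kmer_peak(7, L, 121)   # 2100 / 180
print(f"k=33:  N = {n33:.1f}")            # k=33:  N = 29.1
print(f"k=121: N = {n121:.1f}")           # k=121: N = 11.7

# Genome_size = T/N with the same T, so the two estimates differ by n33 / n121.
print(f"ratio of genome size estimates = {n33 / n121:.2f}")  # 2.49

# Inverse of the same formula: k-mer peak expected at k=121 if N were 29.1.
print(f"expected k=121 peak = {n33 * (L - 121 + 1) / L:.1f}")  # 17.5
```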
Thanks.