Reference genome size for genome size estimation.
6 weeks ago

I am comparing the performance of different k-mer-based genome size estimation tools. To standardise the comparison, I need a reference genome size for each organism, and I am not sure which number to use. For instance, the expected size of Arabidopsis thaliana is ~135 Mbp, but one of the largest reported assembly sizes for A. thaliana is about 148 Mbp. Should the largest reported assembly size be taken as the standard for evaluating these size estimators, or should the commonly cited expected genome size be used? Or is there a better number to use? I ask because some tools appear to underestimate the genome size when the largest reported assembly is taken as the comparison standard. It would be great if someone could help me with this. Thank you!

genome-size • 341 views
6 weeks ago

Normally you use flow cytometry-based C-values as the gold standard against which to compare your bioinformatics-based estimates.

Kew's C-values database https://cvalues.science.kew.org/search lists 0.16 pg for ecotype Columbia (Col), which, at 978 Mbp per pg, is 0.16 * 978 Mbp = 156.48 Mbp. It's even in the paper's title: 'Comparisons with Caenorhabditis (~100 Mb) and Drosophila (~175 Mb) Using Flow Cytometry Show Genome Size in Arabidopsis to be ~157 Mb and thus ~25 % Larger than the Arabidopsis Genome Initiative Estimate of ~125 Mb'. That's good enough for any paper or report, and it's also a chance to briefly discuss genome size variation: Kew's database entries for A. thaliana go up to 0.44 pg, so the variation is substantial.
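
To make the arithmetic above reusable, here is a minimal Python sketch of the pg-to-Mbp conversion and of how far a given number sits from the flow cytometry reference. The function names are made up for illustration, and the 978 Mbp per pg factor is simply the one used in this answer. The two numbers compared are the ~135 Mbp and ~148 Mbp figures from the question, not output from any tool.

```python
# Conversion factor used in the answer above (1 pg of DNA ~ 978 Mbp).
PG_TO_MBP = 978.0


def c_value_to_mbp(c_value_pg: float) -> float:
    """Convert a C-value in picograms to a genome size in Mbp."""
    return c_value_pg * PG_TO_MBP


def relative_error(estimate_mbp: float, reference_mbp: float) -> float:
    """Signed relative error; negative means the estimate is below the reference."""
    return (estimate_mbp - reference_mbp) / reference_mbp


# Reference from the C-value quoted for ecotype Columbia (0.16 pg).
reference_mbp = c_value_to_mbp(0.16)
print(f"reference: {reference_mbp:.2f} Mbp")

# Figures mentioned in the question, compared against that reference.
for label, size_mbp in [("expected size", 135.0), ("largest assembly", 148.0)]:
    err = relative_error(size_mbp, reference_mbp)
    print(f"{label}: {size_mbp:.0f} Mbp, relative error {err:+.1%}")
```

Both figures from the question come out below the flow cytometry value, so a k-mer estimator that looks like it underestimates against the largest assembly would look even lower against this reference.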

The original A. thaliana genome paper lists estimates from 80 Mbp to 150 Mbp: https://www.pnas.org/doi/pdf/10.1073/pnas.92.24.10831


Hi Philipp! Thank you. This is helpful.
