I am comparing the performance of different k-mer-based genome size estimation tools. To standardise the comparison, I need a reference genome size for each organism, but I am not sure which value to use. For instance, the expected genome size of Arabidopsis thaliana is ~135 Mbp, but one of the largest reported assemblies for A. thaliana is about 148 Mbp. Should the largest reported assembly size be the standard for evaluating these estimators, or should the expected genome size be used? Or is there a better number to use? I would like to understand this better because some tools appear to underestimate the genome size when the largest reported assembly is taken as the comparison standard. It would be great if someone could help me with this. Thank you!
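
To make the issue concrete, here is a minimal sketch of how the choice of reference changes the apparent bias of a tool. The two reference sizes are the A. thaliana figures mentioned above; the estimate is a made-up placeholder, not the output of any real tool, and the function name is only for illustration.

```python
def relative_error(estimate_mbp, reference_mbp):
    """Signed relative error in percent; a negative value means an underestimate."""
    return 100.0 * (estimate_mbp - reference_mbp) / reference_mbp


# Candidate reference sizes for A. thaliana, in Mbp (from the question above).
references = {"expected": 135.0, "largest_assembly": 148.0}

# Hypothetical estimate in Mbp, chosen only to illustrate the effect.
hypothetical_estimate_mbp = 138.0

# Score the same estimate against each candidate reference.
errors = {
    name: relative_error(hypothetical_estimate_mbp, ref)
    for name, ref in references.items()
}

for name, err in errors.items():
    print(f"{name}: {err:+.1f}%")
```

With these numbers the same estimate counts as a slight overestimate against the expected size and as an underestimate against the largest assembly, which is the ambiguity I am trying to resolve.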
Hi Philipp! Thank you. This is helpful.