2.8 years ago
biobiu
Currently, methods like kraken/bracken report the fraction of reads assigned to each genome/taxon as its abundance, regardless of genome size. For example, if my ecosystem contains two species with the same number of genome copies, but one has a genome size of x and the other a genome size of 9x, the abundances would be 0.1 and 0.9, respectively; scaled for genome size/coverage they would be 0.5 and 0.5. I am wondering:
- Why do current methods not take genome size into account when calculating the abundance?
- I can of course scale abundance by genome size or calculate a size-corrected metric (similar to RPKM etc.), but I guess these metrics won’t fit downstream relative abundance algorithms.
Any suggestions on how to deal with this in an ecosystem whose species differ by an order of magnitude in genome size?
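To make the two-species example concrete, here is a minimal sketch of the size correction described above: divide each read fraction by genome size, then renormalize so the values sum to 1. The function name, species labels, and toy numbers are hypothetical and are not part of kraken/bracken output.

```python
def size_corrected_abundance(read_fractions, genome_sizes):
    """Divide each taxon's read fraction by its genome size, then renormalize.

    read_fractions: dict taxon -> fraction of reads (e.g. from a bracken-style table)
    genome_sizes:   dict taxon -> genome size, in any consistent unit
    """
    # Per-unit-length signal, proportional to genome copies rather than to reads
    per_size = {t: read_fractions[t] / genome_sizes[t] for t in read_fractions}
    total = sum(per_size.values())
    if total == 0:
        raise ValueError("all read fractions are zero; nothing to normalize")
    # Renormalize so the corrected values again sum to 1
    return {t: v / total for t, v in per_size.items()}


# Toy data from the question: equal copy numbers, genome sizes x and 9x
read_fractions = {"species_A": 0.1, "species_B": 0.9}
genome_sizes = {"species_A": 1.0, "species_B": 9.0}

corrected = size_corrected_abundance(read_fractions, genome_sizes)
print(corrected)  # both values come out at approximately 0.5
```

The corrected values still sum to 1, but they now estimate the share of genome copies rather than the share of reads, so they should be labeled as such before being passed to anything downstream.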