Hi, I have a de novo whole-genome assembly of a plant genome with a BUSCO score of:
C:97.1%[S:86.6%,D:10.5%],F:1.1%,M:1.8%,n:1614
Where:
1568 Complete BUSCOs (C)
1398 Complete and single-copy BUSCOs (S)
170 Complete and duplicated BUSCOs (D)
18 Fragmented BUSCOs (F)
28 Missing BUSCOs (M)
1614 Total BUSCO groups searched
Is 10.5% complete and duplicated BUSCOs (D) too high, or something I should be worried about? If so, what causes high D values, and how can I reduce or fix them? Thanks.
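For anyone reading along, the percentages in the summary line can be recomputed from the raw counts. This is only a small sketch using the numbers quoted in the question; the variable names and the `pct` helper are my own, not part of BUSCO. It also shows the duplicated share among complete BUSCOs only, which is sometimes a handier number to compare between assemblies.

```python
# Counts copied from the BUSCO summary in the question.
counts = {
    "single": 1398,      # complete and single-copy (S)
    "duplicated": 170,   # complete and duplicated (D)
    "fragmented": 18,    # fragmented (F)
    "missing": 28,       # missing (M)
}
total = 1614             # total BUSCO groups searched (n)

# Complete BUSCOs are the single-copy and duplicated ones together.
complete = counts["single"] + counts["duplicated"]

# Sanity check: the four categories should add up to n.
assert sum(counts.values()) == total


def pct(n, denominator):
    """Percentage rounded to one decimal place (hypothetical helper)."""
    return round(100 * n / denominator, 1)


summary = {
    "C": pct(complete, total),
    "S": pct(counts["single"], total),
    "D": pct(counts["duplicated"], total),
    "F": pct(counts["fragmented"], total),
}

# Share of the complete BUSCOs that were found more than once.
dup_among_complete = pct(counts["duplicated"], complete)

print(summary)
print("duplicated among complete:", dup_among_complete)
```

The C, S, D and F values come out the same as in the summary line, so the 10.5% is simply 170 out of 1614 groups.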
Thanks! It's good to hear this. Aren't BUSCOs expected to be single-copy?
Well, yes and no :)
Have a look at the details of how those BUSCO groups are built. If I remember correctly, a gene needs to be single-copy in at least x% of the species under investigation. If you were very stringent (single-copy in all species), you would end up with only a few dozen to a few hundred groups (we once did this exercise). This is mainly due to the nature of many of those plant species, which are (ancient) polyploids. Take poplar, for instance: nearly all of its genes are still present in duplicate, so requiring single copy in every species would leave very few true single-copy genes for the BUSCO set.
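To make the stringency point concrete, here is a toy sketch of that kind of filter. Everything in it is made up for illustration: the copy-number table, the group names, and the 0.8 cutoff standing in for the "x%" above. It is not how BUSCO or its underlying database actually selects groups, just the general idea of "single-copy in at least a given fraction of species".

```python
# Hypothetical copy numbers per ortholog group; each list holds the copy
# number of that gene in five imaginary species (not real data).
copy_numbers = {
    "group_A": [1, 1, 1, 1, 1],  # single-copy everywhere
    "group_B": [1, 1, 2, 1, 1],  # duplicated in one species
    "group_C": [1, 2, 2, 1, 1],  # duplicated in two species
    "group_D": [2, 2, 2, 1, 2],  # duplicated almost everywhere
}


def single_copy_fraction(copies):
    """Fraction of species in which the gene is present exactly once."""
    return sum(1 for c in copies if c == 1) / len(copies)


def select_groups(table, min_fraction):
    """Keep groups that are single-copy in at least min_fraction of species."""
    return [
        name
        for name, copies in table.items()
        if single_copy_fraction(copies) >= min_fraction
    ]


# Very stringent: single-copy in all species.
strict = select_groups(copy_numbers, 1.0)

# Relaxed: single-copy in at least 80% of species (arbitrary cutoff).
relaxed = select_groups(copy_numbers, 0.8)

print("strict: ", strict)
print("relaxed:", relaxed)
```

With the strict cutoff only `group_A` survives, while the relaxed one also keeps `group_B`. A gene like `group_B` would then be reported as duplicated (D) in the one species that carries two copies, even though nothing is wrong with that assembly.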
Very interesting information. Thanks for sharing your knowledge. Now the answer to this question is clear.