Hello there,
I am new to BUSCO, and I am having a hard time interpreting some of the results I am getting.
The command that I used for my search is below:
busco -i assemblies -l endopterygota_odb10 -o batchtest -m genome -c 32 -f --metaeuk
I am using assemblies derived from WGS data and from transcriptomes for a group of beetles. I noticed that all my transcriptome assembly files are quite small (about 100 MB each) and all my WGS-based assembly files are quite large (>1 GB each).
The BUSCO scores for all my transcriptome samples are quite good; a sample is below:
***** Results: *****
C:93.2%[S:92.1%,D:1.1%],F:2.6%,M:4.2%,n:2124
1981 Complete BUSCOs (C)
1957 Complete and single-copy BUSCOs (S)
24 Complete and duplicated BUSCOs (D)
55 Fragmented BUSCOs (F)
88 Missing BUSCOs (M)
2124 Total BUSCO groups searched
However, the BUSCO scores for ALL my WGS samples are quite low (see below). I noticed that the number of fragmented BUSCOs in my WGS samples is also very high, roughly equal to the number of single-copy BUSCOs.
C:40.8%[S:39.8%,D:1.0%],F:39.2%,M:20.0%,n:2124
867 Complete BUSCOs (C)
845 Complete and single-copy BUSCOs (S)
22 Complete and duplicated BUSCOs (D)
833 Fragmented BUSCOs (F)
424 Missing BUSCOs (M)
2124 Total BUSCO groups searched
Have you experienced this before? And how can I improve the BUSCO scores for my WGS samples?
Interestingly, when I try to recover loci using specific UCE probesets for beetles, I have a high recovery rate (>95%) for all my samples (transcriptomes and WGS samples).
Thanks!
How fragmented is your WGS assembly? The BUSCO score suggests it's very fragmented. You're seemingly still identifying most of the BUSCOs (complete plus fragmented is 1,700 of 2,124, about 80%), it's just that many of them are spread out over multiple contigs (hence the fragmented annotations).
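To put a number on "how fragmented", contig count and N50 are the usual first look. Below is a minimal sketch, not a replacement for a dedicated assembly-statistics tool; the helper names `contig_lengths` and `n50` and the toy FASTA are my own assumptions for illustration.

```python
import io


def contig_lengths(handle):
    """Yield the sequence length of each FASTA record read from a text handle."""
    length = None  # None until the first header line is seen
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Return the length L such that contigs of length >= L hold at least half the total bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


# Toy three-contig FASTA (made-up sequences, only to show the calculation).
toy = io.StringIO(">a\nACGTACGTAC\n>b\nACGTAC\n>c\nACGT\n")
lengths = list(contig_lengths(toy))
print("contigs:", len(lengths), "total bp:", sum(lengths), "N50:", n50(lengths))
```

For a real assembly you would pass an open file handle instead of the toy string. A very large contig count with an N50 of only a few kb would be consistent with genes being split across contigs.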
I suspect improving the contiguity of your genome will help a lot. If possible, you could try adding better long-read sequencing data, Hi-C, optical mapping, etc. But I realise this may not be possible. A proof-of-principle idea could be to use a better annotated beetle genome and a synteny analysis to guide scaffolding into pseudo-chromosomes, then test whether the BUSCO scores go up. That said, this approach introduces all the errors of the other assembly and ignores all unresolved fixed evolutionary differences among assemblies, so it isn't ideal for your final release.
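To check whether the scores actually move between two runs, it can help to parse the one-line summaries and compare them directly. This is a rough sketch that assumes the summary line has exactly the format shown in the question; `parse_summary` and `found` are hypothetical helper names, and the two input strings are the lines posted above.

```python
import re

# Matches a one-line summary such as:
# C:40.8%[S:39.8%,D:1.0%],F:39.2%,M:20.0%,n:2124
PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_summary(line):
    """Return the percentages (floats) and n (int) from a one-line summary."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"not a recognised summary line: {line!r}")
    scores = {key: float(val) for key, val in match.groupdict().items() if key != "n"}
    scores["n"] = int(match.group("n"))
    return scores


def found(scores):
    """Percentage of BUSCOs detected at all (complete plus fragmented)."""
    return round(scores["C"] + scores["F"], 1)


wgs = parse_summary("C:40.8%[S:39.8%,D:1.0%],F:39.2%,M:20.0%,n:2124")
rna = parse_summary("C:93.2%[S:92.1%,D:1.1%],F:2.6%,M:4.2%,n:2124")

for name, scores in (("WGS", wgs), ("transcriptome", rna)):
    print(f"{name}: complete {scores['C']}%, fragmented {scores['F']}%, found {found(scores)}%")
```

If scaffolding helps, you would expect the fragmented percentage to drop and the complete percentage to rise by a similar amount, while the missing percentage stays roughly where it is.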