Question

Fragmented BUSCOs of genome assemblies - How to improve scores?

3 months ago

neolycus • 0

Hello there,

I am new to BUSCO, and I am having a hard time interpreting some of the results that I am getting.

The command that I used for my search is below:

busco -i assemblies -l endopterygota_odb10 -o batchtest -m genome -c 32 -f --metaeuk

I am using assembly files based on WGS and transcriptomes for a group of beetles. I noticed that all my transcriptome assembly files are quite small (about 100 MB each) and all my WGS-based assembly files are quite large (>1 GB each).

The BUSCO scores for all my transcriptome samples are quite good; see a sample below:

***** Results: *****

C:93.2%[S:92.1%,D:1.1%],F:2.6%,M:4.2%,n:2124       
1981    Complete BUSCOs (C)            
1957    Complete and single-copy BUSCOs (S)    
24  Complete and duplicated BUSCOs (D)     
55  Fragmented BUSCOs (F)              
88  Missing BUSCOs (M)             
2124    Total BUSCO groups searched

However, the BUSCO scores for ALL my WGS samples are quite low (see below). I also noticed that the number of fragmented BUSCOs in my WGS samples is very high, roughly equal to the number of single-copy BUSCOs.

C:40.8%[S:39.8%,D:1.0%],F:39.2%,M:20.0%,n:2124     
867 Complete BUSCOs (C)            
845 Complete and single-copy BUSCOs (S)    
22  Complete and duplicated BUSCOs (D)     
833 Fragmented BUSCOs (F)              
424 Missing BUSCOs (M)             
2124    Total BUSCO groups searched

Have you experienced this before? And how can I improve the BUSCO scores for my WGS samples?

Interestingly, when I try to recover loci using specific UCE probesets for beetles, I have a high recovery rate (>95%) for all my samples (transcriptomes and WGS samples).

Thanks!

fragmented BUSCOs • 724 views

ADD COMMENT • link updated 3 months ago by young_bioinformatician ▴ 240 • written 3 months ago by neolycus • 0

How fragmented is your WGS-assembly? The BUSCO score suggests it's very fragmented. You're still seemingly identifying the same number of BUSCOs, just that they are spread out over multiple contigs (hence the fragmented annotations).
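One way to check this from the numbers in the question is to add the complete and fragmented percentages, which gives the share of BUSCOs that were detected at all. The sketch below is only an illustration: the function names `parse_busco_summary` and `detected_fraction` are made up for this example and are not part of BUSCO, and it assumes the one-line summary format shown in the question.

```python
import re

# Pattern for the one-line BUSCO summary, e.g.
# C:40.8%[S:39.8%,D:1.0%],F:39.2%,M:20.0%,n:2124
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Return the percentages and the group count from a one-line summary."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    values = {key: float(val) for key, val in match.groupdict().items()}
    values["n"] = int(values["n"])  # n is a count, not a percentage
    return values


def detected_fraction(summary):
    """Percentage of BUSCOs found at all, either complete or fragmented."""
    return summary["C"] + summary["F"]


if __name__ == "__main__":
    # The two summary lines posted in the question.
    samples = {
        "transcriptome": "C:93.2%[S:92.1%,D:1.1%],F:2.6%,M:4.2%,n:2124",
        "WGS": "C:40.8%[S:39.8%,D:1.0%],F:39.2%,M:20.0%,n:2124",
    }
    for name, line in samples.items():
        s = parse_busco_summary(line)
        print(f"{name}: detected {detected_fraction(s):.1f}%, "
              f"fragmented {s['F']:.1f}%")
```

For the two samples in the question this comes to roughly 95.8% detected for the transcriptome and 80.0% for the WGS assembly, so most of the gap in the complete score is accounted for by fragmented hits rather than by missing genes.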

I suspect improving the contiguity of your genome will help a lot. If possible, you could try to add better long-read sequencing data, Hi-C, optical mapping, etc. But I realise this may not be possible. A proof-of-principle idea could be to use a better-annotated beetle genome and a synteny analysis to guide scaffolding into pseudo-chromosomes, and then test whether BUSCO scores go up. That said, this approach introduces all the errors of the other assembly and ignores all unresolved fixed evolutionary differences among assemblies, so it isn't ideal for your final release.

ADD REPLY • link 3 months ago by dthorbur ★ 2.5k

score 0 · Answer 1 · 2024-08-20

3 months ago

shelkmike ★ 1.4k

The genome assembly is probably fragmented because you used only short (Illumina or MGI) reads. Repeats in introns complicate assembly from short reads, which leads to fragmented genes. If you want to make a better assembly, use long (Oxford Nanopore or PacBio) reads.
What is the N50 of your genome assembly?

ADD COMMENT • link 3 months ago by shelkmike ★ 1.4k

The BUSCO score is only one of many QC parameters. I agree that more information on QC (N50, number of contigs, assembly size vs. expected genome size) and on how you assembled the transcriptome and genome should be provided.
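If no assembly-statistics tool is at hand, the basic contiguity numbers asked for here (N50, number of contigs, total assembly size) can be computed with a few lines of Python. This is a minimal sketch using only the standard library; the function names and the `assembly.fasta` path are placeholders for this example, and it assumes an uncompressed FASTA file.

```python
def read_contig_lengths(fasta_path):
    """Return the length of every sequence in an uncompressed FASTA file."""
    lengths = []
    current = 0
    seen_header = False
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new record starts: store the length of the previous one.
                if seen_header:
                    lengths.append(current)
                seen_header = True
                current = 0
            elif seen_header:
                current += len(line)
    if seen_header:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def assembly_stats(fasta_path):
    """Collect the basic contiguity metrics for one assembly."""
    lengths = read_contig_lengths(fasta_path)
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    # "assembly.fasta" is a placeholder path, not a file from the question.
    print(assembly_stats("assembly.fasta"))
```

Comparing `total_bp` with the expected genome size, and `n50` with typical gene lengths for the group, gives a rough idea of whether genes are likely to be split across contigs.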

ADD REPLY • link 3 months ago by michael.ante ★ 3.9k

score 0 · Answer 2 · 2024-08-21

You could either increase your sequencing depth or use longer reads. However, if you don't want to go back to the experimental stage, there are some studies that improve BUSCO results with bioinformatic solutions. The aim should be to improve the assembly quality, since a more contiguous (less fragmented) assembly lets you recover more complete BUSCOs.

Here is a study that you can read to follow the idea:

A combined de novo assembly approach increases the quality of prokaryotic draft genomes