Question

Low BUSCO completeness value for protists

5.5 years ago

qingtian.guan ▴ 10

Hi, I have recently assembled a protist genome (the species cannot be disclosed for confidentiality reasons), but BUSCO reports only 49% completeness. I suspected that my assembly was not good enough, so I downloaded the available genomes for that organism from NCBI and ran BUSCO on them as well; all of them give ~46% completeness with the protists database. It seems implausible that all of the assemblies are missing the same set of genes. Has anyone encountered a similar problem? Is the protists dataset in BUSCO reliable? Can you suggest any other programs for assessing the assemblies?
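One way to test whether the assemblies really lack the same genes is to compare their lists of missing BUSCOs directly. The following is a minimal Python sketch, not an official BUSCO utility. It assumes each run produced a plain-text table of missing BUSCO IDs (one ID in the first tab-separated column per line, comment lines starting with `#`); the function names and the file paths in the usage comment are hypothetical.

```python
from pathlib import Path


def parse_missing_ids(text):
    """Return the set of BUSCO IDs in one missing-BUSCO table (given as text).

    Assumed layout: comment lines start with '#', every other non-empty
    line has the BUSCO ID in its first tab-separated column.
    """
    ids = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.add(line.split("\t")[0])
    return ids


def shared_missing(id_sets):
    """Return the BUSCO IDs that are missing from every assembly."""
    id_sets = list(id_sets)
    if not id_sets:
        return set()
    return set.intersection(*id_sets)


# Usage (hypothetical paths, one table per assembly):
# tables = ["asm1/missing_busco_list.tsv", "asm2/missing_busco_list.tsv"]
# sets = [parse_missing_ids(Path(p).read_text()) for p in tables]
# print(len(shared_missing(sets)), "BUSCOs missing from all assemblies")
```

If almost all missing BUSCOs are shared across independent assemblies, that points towards the reference gene set (or genuine gene absence in the lineage) and away from a problem with any single assembly.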

Assembly next-gen sequencing assembly • 2.3k views

ADD COMMENT • link updated 5.5 years ago by GenoMax 151k • written 5.5 years ago by qingtian.guan ▴ 10

Does your organism have any special genomic features, or does it occupy a small niche? Parasites, for example, often have a reduced gene set. The same goes for other highly specialised organisms. In that case, I would not worry too much about a low BUSCO completeness, as I would expect differences from a "normal" organism.

Also, what do the other assembly stats look like? Are N50, assembly size, etc. good?

ADD REPLY • link 5.5 years ago by nterhoeven ▴ 120

Hi Qingtian,

I guess it may be related to the reference gene set you were using. You might want to have a look at Johnson et al., 2018 where the authors used BUSCO too but with different reference gene sets to evaluate protists' genome completeness.

ADD REPLY • link 5.5 years ago by Sishuo Wang ▴ 230

Hi Sishuo, thank you for your response. The paper is really useful, but as far as I can see they used two assembly methods rather than two data sets. I used the protists data set for the ameba, as listed at https://busco.ezlab.org/. I also tried Alveolata, but the completeness was even worse.

ADD REPLY • link 5.5 years ago by qingtian.guan ▴ 10

Apologies for the confusion. I meant that they used two reference gene sets: euk and protist. I am not sure whether this was mentioned in your post. Sorry if I misunderstood your question. In any case, you could compare your results with theirs using the same reference data set; maybe your result, say 46%, is not bad after all?

the differences in ORF content and BUSCO matches compared to eukaryotic (Fig. 5) and protistan (Supplementary Fig. S3 [43]) databases,

ADD REPLY • link 5.5 years ago by Sishuo Wang ▴ 230

Hi Sishuo, thank you very much. I can clearly see that the mean value for protists shown there is not very high.

ADD REPLY • link 5.5 years ago by qingtian.guan ▴ 10

Hi Nterhoeven, thank you for your response. I am working on an ameba that is found in open freshwater, with a genome of roughly 30 Mb, so I do not think it has a reduced genome. The assembly is a little fragmented for a 30 Mb genome, with an N50 of 160 kb in 394 contigs, but even the best assembly, which has 80 contigs, has an even lower completeness.
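For anyone who wants to recompute N50 on several assemblies in the same way, here is a minimal Python sketch using the usual definition: the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly size. The function name is just illustrative, and the contig lengths are assumed to have been extracted from the FASTA beforehand.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list).

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the running total first covers at least half of the
    summed length of all contigs.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Toy example with made-up contig lengths (not the assembly discussed here):
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Note that N50 only measures contiguity. As this thread shows, a more contiguous assembly does not necessarily score higher in BUSCO.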

ADD REPLY • link 5.5 years ago by qingtian.guan ▴ 10