Low BUSCO completeness value for protists
5.0 years ago

Hi, I have recently assembled a protist genome (I cannot name the species for confidentiality reasons), and BUSCO reports only 49% completeness. I thought my assembly was not good enough, so I downloaded the available genomes of this organism from NCBI and ran BUSCO on them with the protists dataset. All of them give ~46% completeness. There is no way that all of these assemblies are missing the same set of genes. Has anyone encountered a similar problem? Is the protists dataset in BUSCO reliable? Can anyone suggest other programs for assessing assemblies?
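One quick way to test whether the assemblies really miss the same genes is to compare the BUSCO IDs that each run reports as Missing. The Python sketch below assumes the tab-separated layout of BUSCO's `full_table.tsv` output: lines starting with '#' are comments, the BUSCO ID is in the first column, and the status (`Complete`, `Duplicated`, `Fragmented` or `Missing`) is in the second. Check this against your BUSCO version. The function names and toy IDs are hypothetical. If most missing IDs are shared by every assembly, that would suggest genes that are absent from, or too divergent in, this lineage rather than gaps in any single assembly.

```python
from typing import Dict, Iterable, Set


def missing_buscos(table_lines: Iterable[str]) -> Set[str]:
    """Collect BUSCO IDs whose status is 'Missing'.

    Assumes BUSCO's full_table.tsv layout: lines starting with '#' are
    comments; other rows are tab-separated with the BUSCO ID in column 1
    and the status in column 2.
    """
    missing: Set[str] = set()
    for line in table_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1] == "Missing":
            missing.add(fields[0])
    return missing


def shared_missing(per_assembly: Dict[str, Set[str]]) -> Set[str]:
    """Return the BUSCO IDs that are missing from every assembly."""
    if not per_assembly:
        return set()
    return set.intersection(*per_assembly.values())


# Toy tables with hypothetical BUSCO IDs, one table per assembly.
tables = {
    "my_assembly": ["# comment", "B1\tComplete\tctg1", "B2\tMissing", "B3\tMissing"],
    "ncbi_assembly": ["B1\tMissing", "B2\tMissing", "B3\tMissing"],
}
per_assembly = {name: missing_buscos(lines) for name, lines in tables.items()}
print(sorted(shared_missing(per_assembly)))  # ['B2', 'B3']
```

With real data, pass the lines of each run's `full_table.tsv` (for example an open file handle) instead of the toy lists.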

Does your organism have any special genomic features, or does it occupy a small niche? Parasites, for example, often have a reduced gene set. The same goes for other highly specialised organisms. In that case, I would not worry too much about low BUSCO completeness, as I would expect differences from a "normal" organism.

Also, what do the other assembly stats look like? Are the N50, assembly size, etc. good?
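For reference, N50 is the contig length L such that contigs of length L or longer contain at least half of all assembled bases. Below is a minimal Python sketch that computes contig count, total size and N50 from the lines of a FASTA file. The helper names are hypothetical and the toy sequences are made up.

```python
from typing import Dict, Iterable, List, Optional


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths: List[int] = []
    current: Optional[int] = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)  # close the previous record
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)  # close the last record
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def assembly_stats(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Contig count, total assembled bases and N50 for one assembly."""
    lengths = contig_lengths(fasta_lines)
    return {"contigs": len(lengths), "total_bp": sum(lengths), "n50": n50(lengths)}


toy = [">ctg1", "ACGTACGTAC", ">ctg2", "ACGTAC", ">ctg3", "ACGT"]
print(assembly_stats(toy))  # {'contigs': 3, 'total_bp': 20, 'n50': 10}
```

Sorting longest-first and stopping at the halfway point of the cumulative sum is the standard definition; it needs only the contig lengths, not the sequences themselves.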

Hi Qingtian,

I guess it may be related to the reference gene set you were using. You might want to have a look at Johnson et al., 2018 where the authors used BUSCO too but with different reference gene sets to evaluate protists' genome completeness.

Hi Sishuo, thank you for your response. The paper is really useful, but I did not see that they used two datasets; rather, they used two assembly methods. I used the protists dataset for ameba as listed at https://busco.ezlab.org/. I also tried Alveolata, but the completeness was even more pathetic.

Apologies for the confusion. I meant that they used two reference gene sets: euk and protist. I am not sure whether this was mentioned in your post, so sorry if I misunderstood your question. Anyway, you could compare your results with theirs using the same reference dataset; maybe your result (say, 46%) is not bad after all.

From the paper: "...the differences in ORF content and BUSCO matches compared to eukaryotic (Fig. 5) and protistan (Supplementary Fig. S3 [43]) databases..."
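To compare like with like, every assembly's score should come from runs against the same lineage dataset. BUSCO's headline completeness is the share of BUSCOs found complete, whether single-copy or duplicated (C = S + D, as a percentage of n). The Python sketch below recomputes that percentage per assembly from `full_table.tsv` rows. It assumes the same tab-separated layout ('#' comment lines, BUSCO ID in column 1, status in column 2), and the run names and IDs are hypothetical.

```python
from typing import Dict, Iterable


def completeness(table_lines: Iterable[str]) -> float:
    """Percentage of BUSCOs reported as Complete or Duplicated.

    Assumes BUSCO's full_table.tsv layout ('#' comments, tab-separated rows,
    BUSCO ID in column 1, status in column 2). A duplicated BUSCO appears on
    several rows, so statuses are collapsed to one entry per BUSCO ID.
    """
    status_by_id: Dict[str, str] = {}
    for line in table_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            status_by_id[fields[0]] = fields[1]
    total = len(status_by_id)  # n: every BUSCO in the dataset gets a row
    found = sum(1 for s in status_by_id.values() if s in ("Complete", "Duplicated"))
    return 100.0 * found / total if total else 0.0


# Hypothetical runs of two assemblies against the SAME lineage dataset.
runs = {
    "my_assembly": ["B1\tComplete", "B2\tDuplicated", "B2\tDuplicated", "B3\tFragmented", "B4\tMissing"],
    "published": ["B1\tComplete", "B2\tMissing", "B3\tMissing", "B4\tMissing"],
}
for name, lines in runs.items():
    print(f"{name}: C = {completeness(lines):.1f}%")
# my_assembly: C = 50.0%
# published: C = 25.0%
```

Scores computed against different datasets (for example eukaryota versus protists) have different denominators and gene sets, so they are not directly comparable.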

Hi Sishuo, thank you very much. I can clearly see that the mean completeness for protists is not very high.

Hi Nterhoeven, thank you for your response. I am working on an ameba found in open freshwater, with a genome of roughly 30 Mb, so I don't think it has a reduced genome. The assembly is a little fragmented for a 30 Mb genome (N50 of 160 kb across 394 contigs), but even the best assembly, which has 80 contigs, has even lower completeness.
