5.0 years ago
qingtian.guan
Hi, I have recently assembled a protist genome (the species cannot be disclosed for confidentiality reasons), and BUSCO gives me 49% completeness. I thought my assembly was not good enough, so I downloaded the available genomes for that organism from NCBI and ran BUSCO on them as well. All of them give ~46% completeness with the protists dataset. It seems impossible that all of the assemblies are missing the same set of genes. Has anyone encountered a similar problem? Is the protists dataset in BUSCO reliable? Can you suggest any other programs for assessing the assemblies?
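One way to test whether the assemblies really are missing the same genes is to intersect the sets of missing BUSCO IDs from each run. The snippet below is only a minimal sketch: the assembly names and BUSCO IDs are made-up placeholders, and it assumes the missing IDs have already been collected from each BUSCO run by hand or by a separate script.

```python
def shared_missing(missing_by_assembly):
    """Return the BUSCO IDs that are missing in every assembly."""
    sets = [set(ids) for ids in missing_by_assembly.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical data: assembly name -> IDs reported as missing by BUSCO.
missing_by_assembly = {
    "my_assembly": {"B001", "B002", "B003", "B007"},
    "ncbi_assembly_1": {"B001", "B002", "B003", "B009"},
    "ncbi_assembly_2": {"B001", "B002", "B003"},
}

common = shared_missing(missing_by_assembly)
for name, ids in missing_by_assembly.items():
    # Fraction of this assembly's missing BUSCOs that every assembly lacks.
    print(name, f"{len(common) / len(ids):.0%} of missing BUSCOs are shared")
```

If most of the missing BUSCOs are shared by independently produced assemblies, that points towards the reference gene set (or the biology of the organism) rather than towards a problem with any single assembly.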
Does your organism have any special genomic features or occupy a small niche? Parasites, for example, often have a reduced gene set. The same goes for other highly specialised organisms. In that case, I would not worry too much about a low BUSCO completeness, as I would expect differences from a "normal" organism.
Also, what do the other assembly stats look like? Are N50, assembly size, etc. good?
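In case it helps, here is a rough sketch of how N50 can be computed from a list of contig lengths: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size. The contig lengths below are invented for illustration only.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical contig lengths in bp (not from any real assembly).
contig_lengths = [400_000, 250_000, 160_000, 90_000, 60_000, 40_000]

print("assembly size:", sum(contig_lengths))
print("contigs:", len(contig_lengths))
print("N50:", n50(contig_lengths))
```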
Hi Qingtian,
I guess it may be related to the reference gene set you were using. You might want to have a look at Johnson et al., 2018 where the authors used BUSCO too but with different reference gene sets to evaluate protists' genome completeness.
Hi Sishuo, thank you for your response. The paper is really useful, but as far as I can see they did not use two data sets, but rather two assembly methods. I used the protists data set for my ameba, as listed at https://busco.ezlab.org/. I also tried Alveolata, but the completeness was even worse.
Apologies for the confusion. I meant that they used two reference gene sets: euk and protist. I am not sure whether this was mentioned in your post. Sorry if I misunderstood your question. Anyway, you could compare your results with theirs using the same reference data set; maybe your result, say 46%, is not bad?
Hi Sishuo, thank you very much. I can clearly see that the mean completeness for protists shown there is not very high.
Hi Nterhoeven, thank you for your response. I am working on an ameba that is found in open freshwater, with a genome of roughly 30 Mb, so I do not think it has a reduced genome. The assembly is a little fragmented for a 30 Mb genome, with an N50 of 160 kb in 394 contigs, but even the best available assembly, which has 80 contigs, has an even lower completeness.