6.1 years ago
Morgan S.
I have two fungal genomes that I blasted against the ITS, 18S, and 28S targeted loci databases in NCBI. I would expect at least some part of each genome to hit one of these databases, but for some reason I am not finding any hits. When there is a hit, the query coverage is below 10%. NCBI recommended changing the expect threshold (E-value) and the low-complexity filtering, but this did not improve my matches.
Does anyone have any thoughts on why this might be happening, given that my genomes are ~98% complete?
Thanks, Morgan
How do you come to the conclusion that your genome is 98% complete?
BUSCO can predict how complete a genome is based on the number of conserved orthologs.
Not exactly. BUSCO reports the percentage of the complete, single-copy genes expected for a given taxon that are found in the assembly. These single-copy genes are selected from the distribution of orthologs across a given taxonomic branch. Hypothetically, one can find 98% of the expected BUSCOs and still have significantly less than 98% of the genome assembled, for example if the genome contains a large proportion of repetitive sequence.
It is an interesting and useful quality metric, but the relationship between % BUSCOs and completeness isn't linear; in fact, I think the supplementary material of the BUSCO manuscript has a plot of the relationship between them.
Gotcha. Thanks for clarifying.
It seems you have removed contigs with 10% coverage (what is 10% coverage, by the way?). Did you perform any additional filtering? rRNA genes are generally located on contigs with very high coverage, because they are present in multiple copies in the genome. If you removed contigs at the extremes of the coverage distribution (very high and very low coverage), you probably removed the contigs containing the rRNA genes.
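To check this explanation, one option is to look at which contigs sit in the high-coverage tail before any filtering. A minimal sketch, assuming the assembly was produced by SPAdes (which writes k-mer coverage into contig names such as NODE_12_length_8450_cov_310.5); the function names and the 5x-median cutoff are hypothetical choices for illustration, not a standard. Other assemblers store coverage differently, so the parsing would need adapting.

```python
import re
import statistics

# Assumption: SPAdes-style contig names, e.g. "NODE_12_length_8450_cov_310.5".
SPADES_HEADER = re.compile(r"NODE_\d+_length_(\d+)_cov_([\d.]+)")


def parse_spades_coverage(headers):
    """Return {header: (length, k-mer coverage)} for SPAdes-style contig names."""
    parsed = {}
    for h in headers:
        m = SPADES_HEADER.search(h)
        if m:
            parsed[h] = (int(m.group(1)), float(m.group(2)))
    return parsed


def high_coverage_contigs(headers, fold=5.0):
    """Contigs whose coverage exceeds `fold` times the median contig coverage.

    Multi-copy loci such as the rDNA repeat (18S, ITS, 28S) tend to sit on
    contigs with coverage far above the single-copy median. The default
    fold=5.0 is an arbitrary, hypothetical cutoff.
    """
    cov = parse_spades_coverage(headers)
    if not cov:
        return []
    median = statistics.median(c for _, c in cov.values())
    return [h for h, (_, c) in cov.items() if c > fold * median]
```

Running this on the unfiltered contig list and comparing it with the contigs that were discarded would show whether the high-coverage contigs (and possibly the rRNA genes) were lost during filtering.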
I think OP means that the hits only cover 10% of the query sequence (coverage in the old-school BLAST sense, not today's NGS read depth ;-) )
Other than that, it is possible that the rRNA gene regions are missing from the assembly (for the reason you mentioned).
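For reference, query coverage in this BLAST sense is the share of the query sequence spanned by the aligned segments (HSPs). A rough sketch of the idea, assuming HSPs are given as 1-based, inclusive (qstart, qend) pairs as in BLAST tabular output; overlapping HSPs are merged so shared bases are counted once. The function name is hypothetical, and BLAST's own "Query Cover"/qcovs values may differ in detail.

```python
def query_coverage(hsps, query_length):
    """Percent of the query covered by the union of HSP query intervals.

    hsps: iterable of (qstart, qend) pairs, 1-based and inclusive.
    Overlapping or adjacent intervals are merged before summing.
    """
    intervals = sorted((min(s, e), max(s, e)) for s, e in hsps)
    covered = 0
    cur_start = cur_end = None
    for s, e in intervals:
        if cur_end is None or s > cur_end + 1:
            # Gap before this interval: close the previous merged block.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = s, e
        else:
            # Overlaps or touches the current block: extend it.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / query_length
```

For example, HSPs at query positions 1-50 and 40-100 on a 1,000 bp query give 10.0, which is the "below 10%" situation described in the question.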
Yes, you are right, I misunderstood what she said.
@msobol, did you try making a BLAST database from your genome and blasting some ITS, 18S and 28S sequences against it? You can also use RNAmmer to predict rRNA genes in the genome assembly.
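Following that suggestion, the local search could be scripted roughly as below. This is a sketch assuming NCBI BLAST+ (makeblastdb, blastn) is installed and on PATH; the file names (genome.fasta, rrna_queries.fasta, hits.tsv), the helper names and the E-value cutoff are hypothetical. Requesting the qcovs column in tabular output reports query coverage per subject, so it is easy to see whether any contig carries a near full-length ITS/18S/28S match.

```python
import shutil
import subprocess
from collections.abc import Iterable

# Tabular output columns; qcovs = query coverage per subject (percent).
OUTFMT = "6 qseqid sseqid pident length qstart qend sstart send evalue bitscore qcovs"


def makeblastdb_cmd(genome_fasta: str, db_name: str) -> list[str]:
    """Command to build a nucleotide BLAST database from the assembly."""
    return ["makeblastdb", "-in", genome_fasta, "-dbtype", "nucl", "-out", db_name]


def blastn_cmd(query_fasta: str, db_name: str, out_tsv: str, evalue: float = 1e-5) -> list[str]:
    """Command to search ITS/18S/28S query sequences against the genome database.

    The default E-value cutoff is an arbitrary example value.
    """
    return ["blastn", "-query", query_fasta, "-db", db_name,
            "-evalue", str(evalue), "-outfmt", OUTFMT, "-out", out_tsv]


def best_hits(lines: Iterable[str]) -> dict[str, tuple[str, float, int]]:
    """Keep the highest-bitscore hit per query as (contig, bitscore, qcovs)."""
    best: dict[str, tuple[str, float, int]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # skip blank or malformed lines
        query, contig = fields[0], fields[1]
        bitscore, qcovs = float(fields[9]), int(fields[10])
        if query not in best or bitscore > best[query][1]:
            best[query] = (contig, bitscore, qcovs)
    return best


if __name__ == "__main__":
    if shutil.which("makeblastdb") and shutil.which("blastn"):
        subprocess.run(makeblastdb_cmd("genome.fasta", "genome_db"), check=True)
        subprocess.run(blastn_cmd("rrna_queries.fasta", "genome_db", "hits.tsv"), check=True)
        with open("hits.tsv") as fh:
            for query, (contig, score, cov) in best_hits(fh).items():
                print(f"{query}\t{contig}\tbitscore={score}\tqcovs={cov}%")
    else:
        print("BLAST+ not found on PATH")
```

If the best hit for each query still shows low qcovs against a local database of the full assembly, that points more toward the rDNA region being missing or fragmented in the assembly than toward a problem with the web search settings.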
*she. I have not yet tried making a database from my genome; I had previously uploaded my genome to the NCBI BLAST website and run the search there. I am still surprised not to get a hit. My largest contigs are > 500,000 bp.
I will also look into RNAmmer, thanks for the suggestion!