2.2 years ago
gubrins
Hi,
I am starting to work in the world of genome assembly and I was wondering if you could recommend any software to decontaminate my assembly. I have been trying BlobTools but did not manage to get it working.
Thanks in advance!
Past thread that may be of interest: How to get a genome decontaminated after assemble
If you want to separate bacterial contigs from eukaryotic ones, you could try plotting the GC% per contig: bacterial contigs will often have a GC percentage that deviates clearly from that of the eukaryotic contigs.
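A minimal sketch of the GC% idea, in case it helps. It reads a FASTA file, computes GC% per contig, and flags contigs whose GC% is far from the median. The function names, the toy sequences, and the 10 percentage point cutoff are all assumptions made for illustration, not an established threshold; in practice you would look at the distribution (for example as a histogram) before picking any cutoff.

```python
import io
from statistics import median


def gc_percent_per_contig(fasta_handle):
    """Return {contig_name: GC percentage} for an open FASTA handle.

    Only A, C, G and T are counted, so N and other ambiguity codes
    do not distort the percentage.
    """
    counts = {}  # contig name -> [gc_count, acgt_count]
    name = None
    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            # Use the first word of the header as the contig name.
            name = parts[0] if parts else f"contig_{len(counts) + 1}"
            counts[name] = [0, 0]
        elif name is not None:
            seq = line.upper()
            gc = seq.count("G") + seq.count("C")
            at = seq.count("A") + seq.count("T")
            counts[name][0] += gc
            counts[name][1] += gc + at
    return {n: 100.0 * gc / total for n, (gc, total) in counts.items() if total > 0}


def flag_gc_outliers(gc_by_contig, max_deviation=10.0):
    """Return contig names whose GC% differs from the median by more than
    max_deviation percentage points (hypothetical cutoff, tune by eye)."""
    mid = median(gc_by_contig.values())
    return sorted(n for n, gc in gc_by_contig.items() if abs(gc - mid) > max_deviation)


# Toy input standing in for a real assembly file (made-up sequences).
example = io.StringIO(
    ">contig1\nATATATGCAT\n>contig2\nATTAGCATAT\n>contig3\nGCGCGGCCGC\n"
)
gc = gc_percent_per_contig(example)
outliers = flag_gc_outliers(gc)
print(gc)
print(outliers)
```

For a real assembly you would replace the `io.StringIO` object with `open("assembly.fasta")` (file name assumed). Flagged contigs are only candidates: GC% alone cannot prove a contig is bacterial, so they are worth checking with a similarity search before removal.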
You mean decontaminate by removing bacterial contigs from a eukaryote assembly?
I think so. As I mentioned, I am new to this world of genome assembly and I've been reading that it is important to decontaminate the assembly. I understand the contamination in question is bacterial DNA. However, not all pipelines contain this decontamination step; for example, the VGP pipeline does not mention anything about it: https://training.galaxyproject.org/training-material//topics/assembly/tutorials/vgp_genome_assembly/tutorial.html#genome-profile-analysis.
Another question: will we be removing the mitochondrial genome in the decontamination step?
First, the mitochondrial genome may already have been sequenced, in which case you can easily compare against it. If the filter relies on nucleotide sequences and RNA-seq data you should be safe, because the mitochondrial nucleotide sequence should be sufficiently different from bacterial DNA and it could also have good RNA-seq coverage. So, I don't think that is a problem.
The related question linked also mentions BlobTools. Could you be more specific about what is not working?
You could certainly also cook up your own strategy, but then you might be drawn towards the two extremes of over-filtering (like removing all contigs with a single bacterial blastx hit at E-value < 10; yes, there are seemingly people doing that) and under-filtering.
Thanks for pointing that out. Yesterday I opened an issue on blobtoolkit's GitHub about it:
https://github.com/blobtoolkit/blobtoolkit/issues/93
I think I am having problems when downloading the nt database.
It looks like your blast-db has not been downloaded correctly. You need to download the NT database using the update_blastdb.pl tool that comes with NCBI blast to avoid a mess. In my understanding blobtools integrates and summarises the output of different tools, but you have to run these correctly yourself. If you need help with downloading blastdbs, search for "download blast database" here or see the blastdb tag.
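As a rough sketch of the download step, the snippet below builds the `update_blastdb.pl` call and only runs it when asked to. The wrapper functions, the `dry_run` switch, and the destination directory are assumptions for illustration; `update_blastdb.pl` and its `--decompress` option come with NCBI BLAST+, but check `update_blastdb.pl --help` for your installed version before relying on this.

```python
import shutil
import subprocess


def build_update_blastdb_command(db_name="nt", decompress=True):
    """Build the argument list for NCBI's update_blastdb.pl script."""
    cmd = ["update_blastdb.pl"]
    if decompress:
        # Unpack the downloaded archives after fetching them.
        cmd.append("--decompress")
    cmd.append(db_name)
    return cmd


def download_blast_db(db_name="nt", dest_dir=".", dry_run=True):
    """Return the command; run it inside dest_dir only when dry_run is False.

    This is a hypothetical helper, not part of BLAST+ or BlobTools.
    """
    cmd = build_update_blastdb_command(db_name)
    if dry_run:
        return cmd
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("update_blastdb.pl not found on PATH; is BLAST+ installed?")
    # check=True raises CalledProcessError if the download fails.
    subprocess.run(cmd, cwd=dest_dir, check=True)
    return cmd


# Dry run: only shows what would be executed, nothing is downloaded.
command = download_blast_db("nt", dry_run=True)
print(" ".join(command))
```

The nt database is very large, so the real download needs plenty of disk space and time. Running the script in its own directory and letting it finish without interruption is one way to avoid a partially downloaded database.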