Hello all,
I was wondering whether there are any tools other than VecScreen for systematically discovering foreign sequences in NGS samples. I can see two possible approaches: matching reads against a database of adapters and similar sequences, such as UniVec, or (if there is a reference genome) analyzing the sequences that don't align to the genome.
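For the second approach, a minimal sketch of pulling out the reads that did not align, assuming the alignments are available as plain SAM text. It relies only on the SAM convention that bit 0x4 of the FLAG field marks an unmapped read; the function name `unmapped_reads` and the example records are made up for illustration.

```python
def unmapped_reads(sam_lines):
    """Yield (read_name, sequence) for SAM records with the 0x4 (unmapped) flag bit set."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines carry no alignment records.
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            # A SAM record has 11 mandatory fields; skip anything malformed.
            continue
        flag = int(fields[1])
        if flag & 0x4:
            yield fields[0], fields[9]


# Made-up example records: read1 aligns, read2 does not.
example_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGGCCAA\tIIIIIIII",
]
leftover = list(unmapped_reads(example_sam))
print(leftover)
```

The leftover reads are then the candidates to screen against a contaminant database or to search by similarity.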
At any rate, I'd be grateful for any pointers.
That is a very good suggestion, thank you.
Hi, I'm having a similar problem right now. What would you recommend when the candidate contaminant organisms are unknown? Can one use bbsplit on the whole of NT? Or maybe a metagenomics approach using e.g. MEGAN?
It sounds like you are interested in just the unknown component and/or most of the reads are unknown? Have you thought of using kraken with a custom db for the prokaryotic part? Megan can be an option too but you would be doing large scale blast searches for that. I will tag Brian Bushnell to see if he has any other suggestions. I have used bbsplit with 20+ bacterial genomes but using NT may be stretching it :)
Thank you, my contaminant is an unknown eukaryote in a TSA of an insect btw. And I was thinking to work with the assembled contigs instead of the raw reads.
Do you know it to be only a single species or more?
Edit: Sounds like you have tried to identify the contaminant but failed. You could use a related insect genome (if available). Not ideal but perhaps an acceptable approximation.
It could be a eukaryotic symbiont, something between Platyhelminthes and Mollusks. That is according to a single-gene phylogeny of a gene that shouldn't be present in these organisms. I am guessing that these insects ingest the eggs of a flatworm-like parasite.
I found my symbiont: it is a rotifer, seemingly a common contaminant of arthropods btw. To identify the source, I blasted the TSA contigs against NT, then used MEGAN on the output, examined the largest contaminant groups besides arthropods and vertebrates, and looked for a genome sequence from the identified taxon.
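For anyone who wants a quick first look before loading the hits into MEGAN, here is a rough sketch of tallying contigs per taxon from tabular BLAST output. It is a plain best-hit count, not MEGAN's LCA assignment. The four-column layout (query ID, subject ID, bitscore, taxon ID), the function name `best_hit_taxon_counts`, and the IDs in the example are all assumptions for illustration.

```python
from collections import Counter


def best_hit_taxon_counts(blast_rows):
    """Count contigs per taxon ID, using each contig's highest-bitscore hit.

    Assumes tab-separated rows of: query ID, subject ID, bitscore, taxon ID.
    """
    best = {}  # query ID -> (bitscore, taxon ID) of the best hit seen so far
    for row in blast_rows:
        qseqid, _sseqid, bitscore, taxid = row.rstrip("\n").split("\t")
        score = float(bitscore)
        if qseqid not in best or score > best[qseqid][0]:
            best[qseqid] = (score, taxid)
    return Counter(taxid for _score, taxid in best.values())


# Made-up hits; "1001" and "2002" are placeholder taxon IDs, not real ones.
example_rows = [
    "contig1\tsubjA\t250\t1001",
    "contig1\tsubjB\t900\t2002",
    "contig2\tsubjC\t400\t2002",
    "contig3\tsubjD\t310\t1001",
]
counts = best_hit_taxon_counts(example_rows)
for taxid, n_contigs in counts.most_common():
    print(taxid, n_contigs)
```

Sorting the counts this way puts the largest groups first, so unexpected taxa stand out once the expected ones are set aside.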