Question

Finding foreign sequences in an NGS experiment

0

8.0 years ago

predeus ★ 2.1k

Hello all,

I was wondering whether there are any tools for systematic discovery of foreign sequences in NGS samples other than VecScreen. I can see two possible approaches: matching reads against a database of adapters and vectors, such as UniVec, or (if there is a reference genome) analyzing the sequences that don't align to the genome.

At any rate, I'd be grateful for any pointers.
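The second approach from the question (looking at reads that fail to align to the reference) can be sketched in a few lines of Python, assuming the reads have already been aligned and written out as SAM. The only fact relied on here is that bit 0x4 of the SAM FLAG field marks an unmapped read; the function name and the example records are made up for illustration.

```python
def unmapped_reads(sam_lines):
    """Yield (read_name, sequence) for SAM records whose FLAG has bit 0x4 set."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & 0x4:  # 0x4 = segment unmapped
            yield name, seq


# Made-up example records (hypothetical data, for illustration only).
example_sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tGGGTTTCC\tIIIIIIII",
    "read3\t16\tchr1\t200\t60\t8M\t*\t0\t0\tTTTTAAAA\tIIIIIIII",
]

foreign_candidates = list(unmapped_reads(example_sam))
print(foreign_candidates)
```

The reads collected this way are only candidates: they may also be low-quality reads or regions missing from the reference, so they would still need to be screened against a database.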

ngs adapter blastn • 1.7k views

ADD COMMENT • link updated 8.0 years ago by Farbod ★ 3.4k • written 8.0 years ago by predeus ★ 2.1k

score 1 · Answer 1 · 2016-12-15

1

8.0 years ago

GenoMax 147k

Perhaps it would be best to do this the other way around. Use bbsplit from BBMap to bin the reads that belong to the genome you are interested in, and collect the rest in a second pool. You could try to identify what is in that second pool later, if that is of interest.

@Brian also has this to remove human contamination.

There is fastq_screen. Basically you can build a db of any sequences you are interested in and screen against that.
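To make the "build a db and screen against it" idea concrete, here is a toy Python sketch that bins reads by shared k-mers. It is not how bbsplit or fastq_screen work internally (both rely on real read aligners); it only illustrates the binning logic. All sequences, names, and the choice of `k` are made up, and reverse complements are ignored for brevity.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all k-mers in seq (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map each k-mer to the set of reference names that contain it."""
    index = {}
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(name)
    return index


def screen_read(read, index, k):
    """Return the reference sharing the most k-mers with the read, or None."""
    hits = Counter()
    for kmer in kmers(read, k):
        for name in index.get(kmer, ()):
            hits[name] += 1
    if not hits:
        return None  # goes to the "unassigned" pool
    return hits.most_common(1)[0][0]  # ties are broken arbitrarily


# Hypothetical toy sequences, not real genomes or adapters.
references = {
    "host": "ATGGCGTACGTTAGCCTAGGATCCGATTACGGCATTAGC",
    "contaminant": "TTGACCGGTAAACCCGGGTTTAAACGCGCGATATATCGC",
}
reads = {
    "r1": "GTTAGCCTAGGATCCG",
    "r2": "AACCCGGGTTTAAACG",
    "r3": "CATCATCATCATCATC",
}

k = 8
index = build_index(references, k)
assignments = {name: screen_read(seq, index, k) for name, seq in reads.items()}
print(assignments)
```

Reads that come back as `None` correspond to the "second pool" described above, which can be examined separately.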

ADD COMMENT • link 8.0 years ago by GenoMax 147k

0

That is a very good suggestion, thank you.

ADD REPLY • link 8.0 years ago by predeus ★ 2.1k

0

Hi, I'm having a similar problem right now. What would you recommend when the candidate contaminating organisms are unknown? Can one use bbsplit on the whole of NT? Or maybe a metagenomics approach using e.g. MEGAN?

ADD REPLY • link 8.0 years ago by Michael 55k

0

It sounds like you are interested in just the unknown component and/or most of the reads are unknown? Have you thought of using Kraken with a custom db for the prokaryotic part? MEGAN can be an option too, but you would be doing large-scale BLAST searches for that.

I will tag Brian Bushnell to see if he has any other suggestions. I have used bbsplit with 20+ bacterial genomes but using NT may be stretching it :)

ADD REPLY • link 8.0 years ago by GenoMax 147k

0

Thank you. My contaminant is an unknown eukaryote in a TSA of an insect, btw, and I was thinking of working with the assembled contigs instead of the raw reads.

ADD REPLY • link 8.0 years ago by Michael 55k

0

Do you know it to be only a single species or more?

Edit: Sounds like you have tried to identify the contaminant but failed. You could use a related insect genome (if available). Not ideal, but perhaps an acceptable approximation.

ADD REPLY • link 8.0 years ago by GenoMax 147k

0

It could be a eukaryotic symbiont, something between Platyhelminthes and Mollusks. That is according to a single-gene phylogeny of a gene that shouldn't be present in these organisms. I am guessing that these insects ingest the eggs of a flatworm-like parasite.

ADD REPLY • link 8.0 years ago by Michael 55k

0

I found my symbiont: it is a rotifer, seemingly a common contaminant of arthropods, btw. To identify the source, I blasted the TSA contigs against NT, ran MEGAN on the output, examined the largest contaminant groups other than arthropods and vertebrates, and then looked for a genome sequence from the identified taxon.
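A rough Python sketch of the tallying step, as a simplified stand-in for what MEGAN does: MEGAN assigns reads by a lowest-common-ancestor analysis over many hits, whereas this only keeps the best-scoring hit per contig and counts groups. It assumes BLAST tabular output with the columns `qseqid sseqid pident length evalue bitscore staxids`; the taxon IDs, the `taxid_to_group` mapping, and the hit lines are invented placeholders (a real mapping would come from the NCBI taxonomy).

```python
from collections import Counter


def best_hits(blast_lines):
    """Keep the highest-bitscore hit per query contig as (bitscore, taxid)."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        qseqid, _sseqid, _pident, _length, _evalue, bitscore, staxids = (
            line.rstrip("\n").split("\t")
        )
        score = float(bitscore)
        taxid = staxids.split(";")[0]  # staxids can list several IDs; take the first
        if qseqid not in best or score > best[qseqid][0]:
            best[qseqid] = (score, taxid)
    return best


def tally_groups(best, taxid_to_group):
    """Count contigs per taxonomic group, using a caller-supplied mapping."""
    return Counter(
        taxid_to_group.get(taxid, "unassigned") for _score, taxid in best.values()
    )


# Placeholder hits and taxon IDs (hypothetical, for illustration only).
blast_lines = [
    "contig1\tsubjA\t98.5\t500\t1e-100\t900\t1001",
    "contig1\tsubjB\t85.0\t300\t1e-20\t200\t1003",
    "contig2\tsubjC\t90.0\t400\t1e-80\t700\t1003",
    "contig3\tsubjD\t88.0\t350\t1e-60\t600\t1003;1004",
    "contig4\tsubjE\t99.0\t450\t1e-90\t800\t1002",
]
taxid_to_group = {"1001": "Arthropoda", "1002": "Vertebrata", "1003": "Rotifera"}

counts = tally_groups(best_hits(blast_lines), taxid_to_group)
expected = {"Arthropoda", "Vertebrata"}
candidates = [(g, n) for g, n in counts.most_common() if g not in expected]
print(candidates)
```

The largest remaining group in `candidates` is the one to follow up on, for example by looking for a genome sequence from that taxon.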

ADD REPLY • link 8.0 years ago by Michael 55k

score 1 · Answer 2 · 2016-12-15

1

8.0 years ago

Farbod ★ 3.4k

Dear predeus, Hi

Have you tried Kraken?

~ Best

ADD COMMENT • link 8.0 years ago by Farbod ★ 3.4k

0

I have not; I will definitely look into it, thank you.

ADD REPLY • link 8.0 years ago by predeus ★ 2.1k