Cross-species contamination in NGS data
8.7 years ago
T ▴ 40

Dear all,

In a sequencing study, a large majority of the reads do not map to the organisms of interest (human / mouse & yeast), so I am looking for a tool or approach to check the reads for cross-species contamination.

A quick BLAST of some sequences revealed some bacterial RNA, but I want to classify all of the reads. Can you recommend a tool, approach, or best practice for doing this in a high-throughput manner?

I have found a few online, but as far as I can see most of them are made for bacterial metagenomic studies. Perhaps some of you experienced users have a quick hack or a best practice.

Thank you very much.

sequencing QC • 2.3k views

Do you have a (ideally small) set of species you want to check? If you want to check against everything, then you're largely restricted to BLASTing a smallish number of reads.
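If you go the "BLAST a smallish number of reads" route, a random subsample is usually more representative than the first N reads in the file. Below is a minimal sketch of single-pass reservoir sampling from a FASTQ file, written out as FASTA for BLAST input. It assumes plain 4-line FASTQ records; the function names (`read_fastq`, `sample_reads`, `to_fasta`) are made up for this illustration and do not come from any tool mentioned in this thread.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (read name, sequence)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence) pairs, assuming strict 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, "").strip()
        next(it, "")  # '+' separator line, ignored
        next(it, "")  # quality line, ignored
        # Drop the leading '@' and anything after the first whitespace.
        yield header[1:].split()[0], seq


def sample_reads(records: Iterable[Record], n: int, seed: int = 0) -> List[Record]:
    """Reservoir-sample up to n records in one pass over the input."""
    rng = random.Random(seed)
    reservoir: List[Record] = []
    for i, rec in enumerate(records):
        if i < n:
            reservoir.append(rec)
        else:
            # Keep the new record with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = rec
    return reservoir


def to_fasta(records: Iterable[Record]) -> str:
    """Format records as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

With a real file you would pass an open file handle, e.g. `to_fasta(sample_reads(read_fastq(handle), 1000))`, and save the result as the BLAST query file.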

8.7 years ago
GenoMax 148k

In this case I suggest binning the reads that map to the genomes you are interested in and separating the "others" into a different bin. Take a look at BBSplit from BBMap, which would be perfect for this. If you are interested in finding out what the "other" bin contains, you can do that separately later.
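As a rough sketch of that binning step, the snippet below only assembles a BBSplit command line; it does not run anything. The parameters shown (`in=`, `ref=`, `basename=`, `outu=`) reflect my understanding of the BBSplit usage notes and should be checked against the BBMap documentation for your version. The file names and the helper `build_bbsplit_command` are hypothetical.

```python
from typing import List, Sequence


def build_bbsplit_command(
    reads: str,
    references: Sequence[str],
    out_pattern: str = "binned_%.fq",
    unmapped: str = "others.fq",
) -> List[str]:
    """Assemble a bbsplit.sh call that bins reads by best-matching reference.

    Assumption: '%' in basename= is replaced by each reference name, and
    outu= collects reads that match none of the references.
    """
    if not references:
        raise ValueError("at least one reference FASTA is required")
    if "%" not in out_pattern:
        raise ValueError("out_pattern must contain '%' as the reference placeholder")
    return [
        "bbsplit.sh",
        f"in={reads}",
        "ref=" + ",".join(references),
        f"basename={out_pattern}",
        f"outu={unmapped}",
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_bbsplit_command("reads.fq", ["human.fa", "mouse.fa", "yeast.fa"])
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once BBMap is on your `PATH`. The `others.fq` file then holds the reads that matched none of the references.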


Thanks, this is a very nice approach to get the "unmapped" reads.

However, what the "others" actually are is the main point of my question.


So you are interested in the content of the "other" bin? Your original question made it sound like that was "contamination" (though perhaps not in the strict sense of the word?). Like @Devon said, it depends on whether you expect only a few species to be present. Otherwise you would have to BLAST against RefSeq/bacterial sequences to try to identify what is there.

8.7 years ago
User 59 13k

You could try Kontaminant: https://github.com/TGAC/kontaminant

8.7 years ago
h.mon 35k

I would get the unmapped reads, assemble them with metaSPAdes or MEGAHIT, and use GC content, mapping coverage and BLAST searches to examine the assembled contigs. A great tool for this is blobtools.
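To illustrate just the GC-content part of that idea, here is a minimal sketch that computes the GC fraction per contig from FASTA text. This is not how blobtools is implemented; it is a plain-Python illustration, and the function names are made up for this example. Ambiguous bases such as N are left out of the denominator, which is a choice made here rather than a standard.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (contig name, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_fraction(seq: str) -> float:
    """GC / (A + C + G + T); returns 0.0 if there are no unambiguous bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def gc_by_contig(lines: Iterable[str]) -> Dict[str, float]:
    """Map each contig name to its GC fraction."""
    return {name: gc_fraction(seq) for name, seq in read_fasta(lines)}
```

Contigs whose GC fraction sits well away from the bulk of the assembly are natural candidates for a closer look with coverage and BLAST hits.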
