How to remove contamination from NGS data
8.6 years ago
olp123 ▴ 20

Hello,

I had 2 DNA samples from bacteria of the same species sequenced on an Illumina platform. Unfortunately, one sample was contaminated with bacteria from a different genus, Bacillus. There is a clearly distinct peak in the GC content. How can I remove the sequences that come from the contamination?

I tried to identify the contaminating sequences by aligning my sample contigs against Bacillus contigs from a database using the Mauve software. I can clearly identify large parts of the contamination, but at the end of each sequence there are unaligned contigs whose origin I cannot determine. Here is the Mauve screenshot. Dark green regions are the target sequences. Light green regions are the contamination from Bacillus. I don't know what to do with the red parts. http://s20.postimg.org/5jre11ubf/bacillus_2_3.jpg

Does anyone know how to solve this problem without sequencing again, while losing as little information as possible?

Thanks a lot.

next-gen • 4.6k views
8.6 years ago
GenoMax 148k

Use BBSplit from BBMap. Provide both genomes (or just the correct one) as references to bin the reads. You may lose some reads that do not map uniquely, but that can't be helped. Alternatively, you could choose to include them in both bins.
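
As a rough illustration of the binning step described above, here is a minimal sketch that assembles a `bbsplit.sh` call from Python. The file names (`sample_1.fq`, `target.fa`, `bacillus.fa`, and so on) and the helper names are hypothetical, and the parameters used (`in1`, `in2`, `ref`, `basename`, `ambiguous2`, `outu1`, `outu2`) should be checked against the help text of your installed BBMap version before running anything.

```python
import subprocess
from typing import List, Sequence


def build_bbsplit_command(
    reads1: str,
    reads2: str,
    references: Sequence[str],
    ambiguous2: str = "all",
) -> List[str]:
    """Assemble the argument list for a paired-end bbsplit.sh run.

    All file names are placeholders. ambiguous2 controls reads that map
    equally well to more than one reference: "all" writes them to every
    matching bin, "toss" discards them.
    """
    return [
        "bbsplit.sh",
        f"in1={reads1}",
        f"in2={reads2}",
        # Comma-separated reference FASTA files, one bin per reference.
        "ref=" + ",".join(references),
        # % is replaced by the reference name, # by the read number (1 or 2).
        "basename=bin_%_#.fq",
        f"ambiguous2={ambiguous2}",
        # Reads that match neither reference.
        "outu1=unmapped_1.fq",
        "outu2=unmapped_2.fq",
    ]


def run_bbsplit(command: List[str]) -> None:
    """Run the assembled command; requires bbsplit.sh on the PATH."""
    subprocess.run(command, check=True)


# Hypothetical input files: the contaminated sample plus both references.
cmd = build_bbsplit_command(
    "sample_1.fq", "sample_2.fq", ["target.fa", "bacillus.fa"]
)
print(" ".join(cmd))
```

Passing `ambiguous2="toss"` instead would drop the reads that cannot be assigned to a single genome rather than placing them in both bins.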

8.6 years ago

I can recommend BBSplit: a description can be found here.

[Edit] Ninja'ed by GenoMax!

8.6 years ago
Mo ▴ 920

Have a look at this R package; it showed promising results: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3843372/
