Question

Best way to remove contaminants to get nuclear genome

2

11.0 years ago

williamjohn360 ▴ 90

I have plant genomic reads (WGS) from a HiSeq 2000 run, paired-end (read1.fastq, read2.fastq). I need to remove the chloroplast and mitochondrial reads so that only nuclear reads go into the assembly. Which of the following is the better and faster way to remove these contaminants?

1. Assemble the genome first, then use BLAST to identify and remove the mitochondrial and chloroplast sequences from the assembly.

2. Remove the mitochondrial and chloroplast reads from the FASTQ files first: map the reads to the respective mitochondrial and chloroplast reference sequences with bwa, extract the unmapped reads as nuclear reads in FASTQ format with samtools and Picard, and then do the genome assembly.

Assembly genome • 8.1k views

ADD COMMENT • link updated 11.0 years ago by Adrian Pelin ★ 2.7k • written 11.0 years ago by williamjohn360 ▴ 90

0

A related question: do you know how NCBI filters mitochondrial contigs out of submitted nuclear genomes? I couldn't find that information.

9.1 years ago by Pawel Osipowski ▴ 20

0

Where do you get the chloroplast and mitochondrial genome sequences? NCBI? Thanks.

7.7 years ago by fufuyou ▴ 110

3

11.0 years ago

Adrian Pelin ★ 2.7k

Option #2 works if a good reference is available and it is very similar to your organism.

I would do a genome assembly first, then run MegaBLAST against the nt database to find out whether each contig is mitochondrial/chloroplast or some other contaminant, and put those contigs in a list. I would then extract all contigs on that list, map the reads against them with bwa, and keep the unaligned reads for the nuclear genome assembly.
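The contig-listing step could be sketched roughly as follows. This is only an illustration, not a tested pipeline: it assumes the MegaBLAST results were written as tab-separated text with the columns `qseqid sseqid pident length evalue bitscore stitle` (for example via `-outfmt "6 qseqid sseqid pident length evalue bitscore stitle"`), and the function name, keyword list, and thresholds are hypothetical placeholders to tune for your data.

```python
# Keywords that mark a hit title as organellar. This list is an assumption;
# extend or tighten it for your own data.
ORGANELLE_KEYWORDS = ("mitochondri", "chloroplast", "plastid")


def organelle_contigs(blast_lines, min_pident=90.0, min_length=500):
    """Return the set of contig IDs with at least one organelle-like hit.

    blast_lines: iterable of tab-separated lines in the assumed column order
    qseqid, sseqid, pident, length, evalue, bitscore, stitle.
    min_pident and min_length are placeholder thresholds, not recommendations.
    """
    flagged = set()
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        qseqid, pident, length, stitle = fields[0], fields[2], fields[3], fields[6]
        if float(pident) < min_pident or int(length) < min_length:
            continue  # ignore weak or very short hits
        if any(keyword in stitle.lower() for keyword in ORGANELLE_KEYWORDS):
            flagged.add(qseqid)
    return flagged
```

Called as `organelle_contigs(open("contigs_vs_nt.tsv"))` (the file name is made up), this returns the contig IDs to pull out as the mapping reference. Nuclear contigs can carry short organelle-derived insertions, so it is probably worth checking how much of each flagged contig the hit covers before discarding it.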

score 3 · Accepted Answer · 2014-05-29

3

11.0 years ago

JC 13k

Option 2 is better: mapping with BWA is fast, and it reduces the total number of reads to assemble, which in turn reduces assembly time and complexity.
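A minimal sketch of the read-filtering step in option 2, assuming the read pairs were already aligned with BWA to a combined chloroplast plus mitochondrion reference and the output is in SAM format. The function names are hypothetical. A pair is kept only when neither mate aligned, which is a deliberately strict choice.

```python
# Standard SAM flag bits.
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def is_nuclear_candidate(flag):
    """True when neither the read nor its mate aligned to the organelle reference."""
    if flag & (SECONDARY | SUPPLEMENTARY):
        return False  # only primary records are considered
    return bool(flag & READ_UNMAPPED) and bool(flag & MATE_UNMAPPED)


def nuclear_read_names(sam_lines):
    """Collect the names of read pairs in which both mates are unmapped."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        if is_nuclear_candidate(int(fields[1])):
            names.add(fields[0])
    return names
```

In practice the same selection is usually done directly with samtools: `samtools view -b -f 12 -F 256` keeps pairs where both the read and its mate are unmapped, and `samtools fastq` can then write them back out as paired FASTQ files.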