Best way to remove contaminants to get nuclear genome
10.5 years ago


I have plant whole-genome shotgun (WGS) reads from a HiSeq 2000, sequenced as paired-end (read1.fastq, read2.fastq). I need to remove chloroplast and mitochondrial reads so that only nuclear reads go into the assembly. Which of the following is the best and fastest way to remove these contaminants?

1. Assemble the genome first, then find and remove mitochondrial and chloroplast sequences from the assembly using BLAST.

2. Remove mitochondrial and chloroplast reads from the FASTQ files first: map the reads to the respective mitochondrial and chloroplast references with bwa, extract the unmapped reads as nuclear reads in FASTQ format using samtools and Picard, and then do the genome assembly.

Assembly genome • 7.8k views

A related question: does anyone know how NCBI filters mitochondrial contigs out of submitted nuclear genomes? I couldn't find that information.


Where do you get the chloroplast and mitochondrial genome sequences? NCBI? Thanks.

10.5 years ago
JC 13k

Option 2 is better: mapping with BWA is fast, and you reduce the total number of reads going into the assembly, which lowers both assembly time and complexity.
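
As a rough illustration of option 2, here is a minimal Python sketch that only builds the command lines; it does not run them. The file names (organelle.fasta, nuclear_1.fastq, nuclear_2.fastq) and the helper name are hypothetical, and it assumes the chloroplast and mitochondrial references have been concatenated into one FASTA file. It uses `samtools fastq -f 12` to keep pairs in which both mates are unmapped, instead of the samtools plus Picard route mentioned in the question.

```python
import shlex


def organelle_filter_commands(ref, read1, read2, out1, out2, threads=4):
    """Build shell command lines that keep only read pairs with no organelle hit.

    `ref` is assumed to be a FASTA file holding the chloroplast and
    mitochondrial references together (a hypothetical file name is used below).
    """
    index_cmd = ["bwa", "index", ref]
    map_cmd = ["bwa", "mem", "-t", str(threads), ref, read1, read2]
    # -f 12 = read unmapped (4) + mate unmapped (8): neither mate hit the
    # organelle reference, so the pair is treated as putatively nuclear.
    extract_cmd = [
        "samtools", "fastq", "-f", "12",
        "-1", out1, "-2", out2,
        "-0", "/dev/null", "-s", "/dev/null",
        "-",  # read the alignments from standard input
    ]
    pipeline = shlex.join(map_cmd) + " | " + shlex.join(extract_cmd)
    return [shlex.join(index_cmd), pipeline]


if __name__ == "__main__":
    for line in organelle_filter_commands(
        "organelle.fasta", "read1.fastq", "read2.fastq",
        "nuclear_1.fastq", "nuclear_2.fastq",
    ):
        print(line)
```

Requiring both mates to be unmapped is the stricter choice: a pair with only one mate on the organelle reference is discarded as well. Whether that is appropriate depends on how much nuclear sequence in your plant resembles the organelle genomes.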

10.5 years ago
Adrian Pelin ★ 2.6k

Option #2 works if a good reference is available and it is very similar to the organism you sequenced.

I would do a genome assembly first, then use MegaBLAST against the nt database to find out whether each of your contigs is mitochondrial, chloroplast, or some other contaminant, and put those contigs in a list. I would then extract all contigs on that list, use bwa to map the reads against them, and keep the unaligned reads for your nuclear genome assembly.
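
To make the "put them in a list" step concrete, here is a minimal Python sketch. It assumes MegaBLAST was run with tabular output in the column order `-outfmt "6 qseqid sseqid pident length evalue bitscore stitle"`; the function name, the keyword list, and the identity and length thresholds are all illustrative assumptions. Matching keywords in the subject title is a crude heuristic, so the resulting list is worth checking by hand.

```python
# Substrings that suggest an organelle subject title (an assumed, crude heuristic).
ORGANELLE_KEYWORDS = ("mitochondri", "chloroplast", "plastid")


def organelle_contigs(blast_lines, min_pident=90.0, min_length=500):
    """Return sorted contig IDs whose best-scoring hit looks organellar.

    Each line is expected to have seven tab-separated columns:
    qseqid, sseqid, pident, length, evalue, bitscore, stitle.
    """
    best = {}  # contig ID -> (bitscore, subject title) of its best hit
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        qseqid, _sseqid, pident, length, _evalue, bitscore, stitle = fields[:7]
        # Ignore weak or short hits (thresholds are illustrative assumptions).
        if float(pident) < min_pident or int(length) < min_length:
            continue
        score = float(bitscore)
        if qseqid not in best or score > best[qseqid][0]:
            best[qseqid] = (score, stitle)
    return sorted(
        contig
        for contig, (_score, title) in best.items()
        if any(keyword in title.lower() for keyword in ORGANELLE_KEYWORDS)
    )


if __name__ == "__main__":
    example = [
        "contig1\tgi1\t99.5\t1200\t0.0\t2200\tArabidopsis thaliana chloroplast, complete genome",
        "contig2\tgi2\t98.0\t3000\t0.0\t5400\tOryza sativa chromosome 3",
    ]
    print(organelle_contigs(example))
```

Only the best-scoring hit per contig is considered, so a nuclear contig that carries a short organelle-derived insert is not flagged as long as its top hit is nuclear.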
