Question

QC on Illumina Fastq files

0

8.8 years ago

indranipoddar ▴ 70

Hello,

I want to filter contaminant, adapter, and mitochondrial DNA sequences, which are in 3 separate FASTA files, out of Illumina paired-end FASTQ files. My question is how to use bowtie2 to map against multiple files and produce a single FASTQ file of unmapped reads with all three filtered out. Alternatively, if I have to use bowtie2 to map against these 3 files one at a time, how do I combine the 3 unmapped files, given that they may contain redundant reads?

I would highly appreciate any feedback. If this question has been answered previously, please share the link. Thanks, Indrani

next-gen • 2.5k views

ADD COMMENT • link updated 8.8 years ago by GenoMax 151k • written 8.8 years ago by indranipoddar ▴ 70

0

I am not sure what you mean by contaminant, but adapter and mitochondrial DNA should not map to the genome, so those will automatically be filtered out during alignment. Are you trying to remove or keep those?

ADD REPLY • link 8.8 years ago by igor 13k

0

I want to remove them. I also want to remove rRNA, polyA, etc. So I want a single bowtie2 run that produces one FASTQ file of unmapped reads, with adapter, mitochondrial, polyA, and rRNA sequences filtered out. Thanks for your reply.

Indrani

ADD REPLY • link 8.8 years ago by indranipoddar ▴ 70

0

If this is RNAseq data then you don't need to remove anything except the adapters (strictly speaking you don't need to remove those either, but it is a good idea). See this recent discussion: Removing rRNA and tRNA sequences using GTF files

ADD REPLY • link 8.8 years ago by GenoMax 151k

0

Thanks for your reply.

ADD REPLY • link 8.8 years ago by indranipoddar ▴ 70

score 1 · Answer 1 · 2016-08-04

1

8.8 years ago

GenoMax 151k

If you want to separate mitochondrial reads from the reads that belong to the genome, then you would want to look at BBSplit: BBSplit syntax for generating builds for the reference genome and how to call different builds.

ADD COMMENT • link 8.8 years ago by GenoMax 151k

score 0 · Answer 2 · 2016-08-04

0

8.8 years ago

Brian Bushnell 20k

Mapping is not a good method for adapter removal, as the whole read will not match the reference. I suggest using a tool designed specifically for adapter-trimming, like BBDuk. It can also be used to remove other short contaminant sequences. For sequences longer than read length (such as mitochondrial DNA), mapping is most precise. Just concatenate all contaminant references into a single FASTA file, map to it, and keep the unmapped reads. This step should be done using the reads remaining after adapter-trimming and removal of other short synthetic contaminants.
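
As a rough illustration of the concatenate, map, and keep-unmapped steps with bowtie2 (the tool named in the question), here is a minimal Python sketch. It only joins the FASTA files and assembles the command lines; it does not run anything. All file names, the index name, and the thread count are hypothetical placeholders. It assumes a Unix-like system (the SAM output is sent to `/dev/null`) and uses bowtie2's `--un-conc` option, which writes pairs that did not align concordantly. That can still include pairs where a single mate hit a contaminant, so stricter filtering may be needed depending on the data.

```python
from pathlib import Path


def concatenate_fastas(fasta_paths, combined_path):
    """Write all input FASTA files, in order, into one combined FASTA file."""
    with open(combined_path, "wb") as out:
        for path in fasta_paths:
            data = Path(path).read_bytes()
            out.write(data)
            # If a file lacks a final newline, add one so the next
            # record's '>' header starts on its own line.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
    return combined_path


def bowtie2_build_cmd(combined_fasta, index_base):
    """Command line to index the combined contaminant FASTA."""
    return ["bowtie2-build", str(combined_fasta), str(index_base)]


def bowtie2_filter_cmd(index_base, reads_1, reads_2, unmapped_pattern, threads=4):
    """Command line to map paired reads and keep pairs that did not
    align concordantly to the contaminant index (--un-conc)."""
    return [
        "bowtie2",
        "-p", str(threads),
        "-x", str(index_base),
        "-1", str(reads_1),
        "-2", str(reads_2),
        # A '%' in the path is replaced by 1 or 2 for the two mate files.
        "--un-conc", str(unmapped_pattern),
        # The alignments themselves are not needed here (Unix-only path).
        "-S", "/dev/null",
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. For example, `bowtie2_filter_cmd("contaminants", "trimmed_R1.fq", "trimmed_R2.fq", "clean_R%.fq")` would, under these assumptions, leave the filtered pairs in `clean_R1.fq` and `clean_R2.fq`. The input reads should be the adapter-trimmed ones, as described above.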

ADD COMMENT • link 8.8 years ago by Brian Bushnell 20k

0

Thanks for your reply. Can bowtie2 create an index from a file containing multiple FASTA sequences, as you suggested when you said to concatenate all the reference sequences?

Indrani

ADD REPLY • link 8.8 years ago by indranipoddar ▴ 70

0

Yes it can and that is how you should do it.
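
Before indexing a concatenated file, it can help to confirm that it really contains one record per contaminant and that no sequence name appears twice. The sketch below is an optional sanity check, not part of bowtie2; it assumes that unique record names are desirable so that hits can be told apart later, and the function names are made up for illustration.

```python
from collections import Counter


def fasta_record_names(lines):
    """Return the name (first word after '>') of each FASTA record, in order."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            names.append(fields[0] if fields else "")
    return names


def duplicated_names(names):
    """Return the sorted list of record names that occur more than once."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)
```

With a hypothetical `contaminants.fa`, one could run `names = fasta_record_names(open("contaminants.fa"))` and then inspect `duplicated_names(names)`; an empty list means every record name is unique.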

ADD REPLY • link 8.8 years ago by GenoMax 151k