QC on Illumina Fastq files
8.3 years ago

Hello,

I would like to filter contaminant, adapter, and mitochondrial DNA sequences, which I have in 3 separate FASTA files, out of Illumina paired-end FASTQ files. My question is how to use bowtie2 to map against multiple reference files and produce a single FASTQ file of unmapped reads that has been filtered against all 3. If the alternative is to map with bowtie2 against these 3 files one at a time, how do I combine the 3 unmapped files, given that they may contain redundant reads?

I would highly appreciate any feedback. If this question has been answered previously, please share the link. Thanks, Indrani

next-gen • 2.4k views

I am not sure what you mean by contaminant, but adapter and mitochondrial DNA should not map to the genome, so those will automatically be filtered out during alignment. Are you trying to remove or keep those?


I want to remove them. I also want to remove rRNA, polyA, etc. So I want a single bowtie2 run that produces one FASTQ file of unmapped reads, with adapter, mitochondrial, polyA, and rRNA sequences filtered out. Thanks for your reply.

Indrani


If this is RNAseq data then you don't need to remove anything except the adapters (strictly speaking you don't need to remove those either, but it is a good idea). See this recent discussion: Removing rRNA and tRNA sequences using GTF files


Thanks for your reply.

8.3 years ago
GenoMax 147k

If you want to separate mitochondrial reads from the reads that belong to the genome, then take a look at BBSplit: BBSplit syntax for generating builds for the reference genome and how to call different builds.

8.3 years ago

Mapping is not a good method for adapter removal, because a read that contains adapter sequence will not match the reference over its whole length. I suggest using a tool designed specifically for adapter trimming, like BBDuk. It can also be used to remove other short contaminant sequences. For sequences longer than the read length (such as mitochondrial DNA), mapping is the most precise approach. Just concatenate all contaminant references into a single FASTA file, map to it, and keep the unmapped reads. This step should be run on the reads that remain after adapter trimming and removal of other short synthetic contaminants.
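As a rough illustration of the concatenate, map, and keep-unmapped steps, here is a minimal Python sketch. All file names, the index prefix, and the helper names `concatenate_fasta` and `build_commands` are hypothetical, and the reads are assumed to be adapter-trimmed already. It only assembles the `bowtie2-build` and `bowtie2` command lines and does not run them. It also assumes that "unmapped" means pairs that fail to align concordantly, which is what bowtie2's `--un-conc` option writes out; if you need a stricter or looser definition, filtering the SAM output by flag is an alternative.

```python
from pathlib import Path
import shlex


def concatenate_fasta(sources, destination):
    """Join several FASTA files into one, ensuring each file ends with a newline."""
    with open(destination, "w") as out:
        for src in sources:
            text = Path(src).read_text()
            if text and not text.endswith("\n"):
                text += "\n"  # keeps the next '>' header on its own line
            out.write(text)
    return destination


def build_commands(contaminants_fa, index_prefix, reads1, reads2,
                   clean_pattern, threads=4):
    """Return argument lists for indexing and mapping; nothing is executed."""
    build = ["bowtie2-build", contaminants_fa, index_prefix]
    align = [
        "bowtie2", "-p", str(threads),
        "-x", index_prefix,
        "-1", reads1, "-2", reads2,
        # '%' in the path is replaced by 1 or 2 for the two mates
        "--un-conc", clean_pattern,
        # alignments to the contaminants themselves are discarded
        "-S", "/dev/null",
    ]
    return build, align


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    build, align = build_commands(
        "contaminants.fa", "contaminants_idx",
        "trimmed_R1.fq", "trimmed_R2.fq", "clean_R%.fq",
    )
    print(" ".join(shlex.quote(arg) for arg in build))
    print(" ".join(shlex.quote(arg) for arg in align))
```

Because a single run against the combined reference decides each read pair once, there is no need to merge or de-duplicate three separate unmapped files afterwards.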


Thanks for your reply. Can bowtie2 build an index from a file containing multiple FASTA sequences, as you suggested when concatenating all the reference sequences?

Indrani


Yes it can and that is how you should do it.
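One thing that may be worth checking before indexing is that sequence names do not collide across the contaminant files. Below is a small Python sketch of such a check; the helper names are hypothetical, and it assumes the name is the first whitespace-delimited word after `>`. It also builds the `bowtie2-build` command without running it. As far as I know, `bowtie2-build` accepts a comma-separated list of FASTA files as well as a single multi-sequence file, so the sketch uses the list form.

```python
from collections import Counter


def fasta_names(path):
    """Return the sequence names (first word after '>') found in a FASTA file."""
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                names.append(fields[0] if fields else "")
    return names


def duplicated_names(paths):
    """Return names that occur more than once across all the given FASTA files."""
    counts = Counter(name for p in paths for name in fasta_names(p))
    return sorted(name for name, n in counts.items() if n > 1)


def build_index_command(paths, index_prefix):
    """Argument list for bowtie2-build with a comma-separated reference list."""
    return ["bowtie2-build", ",".join(paths), index_prefix]
```

If `duplicated_names` returns anything, renaming those records before building the index makes it easier to tell which contaminant a read hit.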
