Why to filter reads before mapping (bowtie2)?
1
0
Entering edit mode
10.6 years ago
Jan Hapala • 0
  1. Why should I filter reads before mapping (with Bowtie2) besides saving computational resource / time?
  2. If I use local alignment in Bowtie2, will also reads with perfect (or very good) global match be aligned locally in addition? (I do not want to get multiple mappings just due to local aligning if a global alignment exists.)

More info:

I am analysing ChIP-seq data for Arabidopsis thaliana (i.e. small genome).

I have some reads datasets of generally good quality but some base qualities approach zero, of course.Distribution of nucleotides in some datasets also shows slight imbalance at the beginning of the reads (maybe parts of adapter or bar code sequences?).

After playing around with different ways of filtering, I would like to have all reads of the same length (better for downstream programs).

mapping bowtie2 sequencing • 6.0k views
ADD COMMENT
0
Entering edit mode

Can you give more information about the "filtering" in question #1? There are lots of ways you can filter your reads, and there are different reasons and considerations surrounding the choice of filter(s).

ADD REPLY
0
Entering edit mode

I mean filtering in general. Why not just running bowtie2 on the raw dataset? (I do not mind extra running time for the c. 10 % reads that would be discarded.) It is a ChiP-seq analysis, not SNP or enything else.

ADD REPLY
0
Entering edit mode

Well, again, it depends on the specifics of the experiment. But a lot of times you'll want to trim instead of filter. For example, trimming off low-quality bases or adapters can prevent a read from being ditched or penalized during alignment, allowing you to capture data that you would otherwise be throwing out.

ADD REPLY
0
Entering edit mode

Right, but as I wrote I prefer keeping the reads of the same length (some downstream programs might have problems with variable length of the reads). Then I would have to either trim the whole dataset (and loose some nucleotides needed for correct mapping) or discard (filter) such reads.

That is why I am thinking about combining global and local alignment (when potential adapters and low quality ends should not matter).

But it is generally recommended to do quality filtering before mapping (even in ChIP-seq guidelines) but I cannot see why. (e.g. Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data)

ADD REPLY
0
Entering edit mode

The way bowtie is written, you can't combine --end-to-end and --local modes. But as Istvan shows, it's as easy as filtering out non-primary alignments using samtools or Picard's FilterSamReads tool.

A better question when it comes to quality filtering, in my opinion, is "why not do quality filtering?" Why wouldn't you want to remove garbage reads beforehand, thus ensuring they don't end up as false positives and take up computational time? Computationally, it requires much less work to throw out a nasty read than to attempt to map it.

ADD REPLY
0
Entering edit mode

Because I do not feel good about throwing away reads that may be of a very good quality and only have poor ends or throwing away ends from all reads.

Simply: I want to maximize the outcome.

What do you mean by "you can't combine --end-to-end and --local modes"? You mean running bowtie2 twice and merging the outputs?

ADD REPLY
1
Entering edit mode

What I mean is that you can't do what you're wanting to do in a single bowtie2 run. You can't run it in --end-to-end mode and have it only report global alignments for a given read while ignoring local alignments. There are many ways around this using multiple steps, some of which you've proposed yourself, but I don't know of any way to do it in a single pass.

ADD REPLY
1
Entering edit mode
10.6 years ago
Dan D 7.4k

I'll go ahead and answer question #2, since the first question as currently stated is very open-ended and difficult to answer succinctly.

If you align in local mode, the best alignments in terms of matching accuracy and length will still receive the best score. If you want to knock out any mappings that map locally along with a global alignment, you can just filter the bam after the fact using the CIGAR values. Basically, sort the bam by read name, then go through the reads. If a read has a perfect alignment, then knock out any other reads which don't also have a perfect alignment. There might be something written already to do this, but it's not a difficult scripting task.

ADD COMMENT
1
Entering edit mode

I have nerver checked this myself but won't flag 256 also do this? If so filtering for it would be a lot simpler. Of course this requires that the aligner sets this flag.

ADD REPLY
0
Entering edit mode

Yep, you're exactly right. Removing non-primary alignments would be an excellent solution to the problem.

ADD REPLY
0
Entering edit mode

I am not sure now whether MACS filters multiple-mapped reads prior to peak calling or not. Let's say it does (or any other program). Then I could not use this approach (under assumption that the filtering makes substantial difference).

I could map reads globally first, then take those that did not map at all and map them locally and merge those two BAMs. But I hoped that bowtie2 might skip local aligning if it finds global mapping under current setting. I could not find this in the manual, though.

ADD REPLY
0
Entering edit mode

If you don't want reads with multiple perfect alignments to be tossed out, then again, use the CIGAR flags to keep only one of the mappings. There are so many possible bioinformatics approaches out there that you can't expect a tool to encompass them all. Sometimes you have to use multiple steps or write your own scripts. That's the fun part of bioinformatics!

ADD REPLY

Login before adding your answer.

Traffic: 2086 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6