Hi all,
Recently I have been working with exome-seq data to call variants using the bwa + GATK + VarScan approach, which is widely used by researchers.
As pointed out on the GATK forum, we can now use the bwa mem command in place of the old bwa aln + sampe steps to map reads directly to the reference genome.
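For reference, my understanding is that the replacement looks roughly like this (reference and read file paths are just placeholders):

```
# old two-step workflow
bwa aln ref.fa reads_R1.fq.gz > R1.sai
bwa aln ref.fa reads_R2.fq.gz > R2.sai
bwa sampe ref.fa R1.sai R2.sai reads_R1.fq.gz reads_R2.fq.gz > aln.sam

# new single-step replacement
bwa mem ref.fa reads_R1.fq.gz reads_R2.fq.gz > aln.sam
```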
Now I have been given two pieces of advice: 1) use mem directly without any trimming step, since I'm told the mapping quality is good enough; 2) first run FastQC to check whether the read quality is good, then use a tool such as fastx or cutadapt to trim the reads for better alignment.
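In other words, the two suggestions amount to something like this (adapter sequences and file names are placeholders; I'm not sure which is better):

```
# Option 1: map directly, no trimming
bwa mem -t 8 ref.fa reads_R1.fq.gz reads_R2.fq.gz > aln.sam

# Option 2: check quality, trim, then map
fastqc reads_R1.fq.gz reads_R2.fq.gz
cutadapt -a AGATCGGAAGAGC -A AGATCGGAAGAGC -q 20 \
    -o trimmed_R1.fq.gz -p trimmed_R2.fq.gz \
    reads_R1.fq.gz reads_R2.fq.gz
bwa mem -t 8 ref.fa trimmed_R1.fq.gz trimmed_R2.fq.gz > aln.sam
```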
So, has anyone come across this issue and can share some tips?
Thanks!
Trimming is generally a good idea. It makes both the alignment and variant calling steps faster and slightly more reliable.
Reference? Saw a few papers go by on this, with a few recommending only light trimming. See, e.g.: http://biorxiv.org/content/biorxiv/early/2013/11/14/000422.full.pdf which looks at transcript assembly but recommends only gentle trimming.
The author of that paper goes over another study on trimming here: http://genomebio.org/is-trimming-is-beneficial-in-rna-seq/
That over-aggressive trimming proves deleterious isn't exactly surprising to me. The importance of trimming and how stringently one should do it depend on (1) the length of the reads, (2) the type of experiment, and (3) the aligner used and its options. For example, aligners using end-to-end alignment (e.g., bowtie2 by default) will be more susceptible to mapping errors (or producing alignments with unduly low alignment scores) if low-quality read ends or read ends containing adapter contamination aren't trimmed.

For many applications (RNA-seq or SNP calling are good examples) this really won't make a big difference. I don't disagree with Matthew MacManes there, and I find people suggesting trimming at Phred=30, or always lopping the last 25bp off reads, to be just silly. For other applications (I'm doing a lot of bisulfite sequencing these days), this can really screw things up (though much less so for Bison, the aligner I wrote, than Bismark).

As reads get longer and we transition to local alignment and downstream analyses that account for base-call quality, I expect the importance of quality trimming to wane significantly.
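To illustrate the end-to-end vs. local point with bowtie2 (index and read file names are placeholders):

```
# default end-to-end mode: every base of the read must align, so
# low-quality or adapter-contaminated ends drag the score down
bowtie2 -x genome_index -U reads.fq -S end2end.sam

# local mode: poor read ends can be soft-clipped, so leaving
# them untrimmed matters much less
bowtie2 --local -x genome_index -U reads.fq -S local.sam
```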
Anyway, good to see that someone's trying to put some objective numbers to the process!
Edit: I should add that I include adapter trimming under the general heading of trimming. It seems that Matthew MacManes is in favor of continued adapter trimming, though I'm sure many people take that over the top too.
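For example, adapter-only trimming, without touching base qualities at all, is a one-liner with cutadapt (the adapter shown is the common Illumina read-through sequence; substitute your own):

```
# remove adapter read-through only; base qualities are left alone
cutadapt -a AGATCGGAAGAGC -o trimmed.fq reads.fq
```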
These are interesting links. The first paper appears to only use Trimmomatic, and the second link suggests you'll get very different results depending on the tool, so I wouldn't call this conclusive. I think the quality of the data and the research question should dictate how to treat the reads and coming up with a universal trimming rule shouldn't be the goal.
Also, if you look closely (following the second link), you'll see that mappability does decrease with moderate trimming for all tools except FASTX and PRINSEQ. I think the others must use the Phred algorithm, which leads to a decrease in mapping percentage with increasing threshold. I would expect seqtk to show the same pattern, since it also uses this algorithm.
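As a sketch of what that looks like with seqtk (note that trimfq's -q option takes an error-rate threshold rather than a Phred score, so 0.05 corresponds to roughly Q13 and 0.01 to Q20; file names are placeholders):

```
# Phred/modified-Mott style trimming at two stringencies
seqtk trimfq -q 0.05 reads.fq > trimmed_q13.fq   # default, gentle
seqtk trimfq -q 0.01 reads.fq > trimmed_q20.fq   # stricter
```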