Tool:Bwa-Meth: Align And Tabulate Bs-Seq Reads
2
19
Entering edit mode
10.8 years ago
brentp 24k

bwa-meth is available from: https://github.com/brentp/bwa-meth

There are many existing BS-Seq aligners. I wrote bwa-meth because of problems with existing aligners such as:

  1. writing large temp files of the C -> T converted data
  2. outputting alignments that aren't usable by existing tools
  3. difficult installation and use.

bwa-meth resolves those problems and uses bwa-mem to do all of the hard work. bwa-meth simply streams C -> T converted reads to the transformed reference and adjusts the alignment that bwa mem reports.

This happens to work quite well in terms of speed and accuracy. There is a comparison of bwa-meth with other aligners on the page above and more information in the manuscript: http://arxiv.org/abs/1401.1129

Comments and feedback are welcome. Usage looks like

bwameth.py index $REF
bwameth.py --reference $REF some_R1.fastq.gz some_R2.fastq.gz --prefix some.output

And the alignments will appear in some.output.bam

It requires python2 or python3, bwa, and samtools

methylation • 12k views
ADD COMMENT
0
Entering edit mode

I saw this project when you created the GitHub repo, and it also popped up in my Arxiv feed. Your code looks pretty straight-forward and solid, and I like the idea a lot. Most of the existing tools have serious implementation issues, so it's good to see this one set the bar a bit higher.

ADD REPLY
0
Entering edit mode

brentp, thanks for great the tool!

Wanted to ask: is reference "normal" ACGT genome or is reference converted C->T genome?

I get different mapping results (much better when using CT converted genome), any ideas how can this happen?

--

Edit

When using "normal" genome I map only those reads that have methylated Cs. Can this be bwa mismatch criteria problem?

ADD REPLY
9
Entering edit mode
10.5 years ago
brentp 24k

The bwa-meth paper has been rejected from bioinformatics after the 2nd revision. There was some concern about how I did the comparisons [either inadequate or unfair], for example it was suggested that one was able to get more reads to align using default parameters for bsmap. This is true, see the image at the end of this post where the red dot is the default parameters and the blue is the parameters I chose. Unfortunately, all of the additional alignments with default params are incorrect--this is for simulated data, so it is possible to know the additional reads were aligned off-target.

I think there is value in the comparison presented in the bwa-meth paper, and to my knowledge, it is the most thorough evaluation of BS-Seq read mappers and the only one on genome-wide real data. Regardless of whether you should choose to use bwa-meth, there are some important points (the 2nd is implicit, but you can find it in the code and more on that below):

  1. If you use a bowtie (1 or 2) backed aligner, you must quality-trim your reads.
  2. There is great variability in the work required to post-process the output from these into usable data.
  3. Number of reads mapped is not a useful metric.

There are a number of things I didn't write in the paper, but that I think are worth mentioning, most are somehow related to the 2nd point above. For example, the output from bsmooth comes in a watson bam and a crick bam, I did coordinate flipping, but it's not clear to me the full set of steps that would be required to make the crick bam usable. (See dpryan's take on it, and my handling of it). For LAST, which, as you can see performs very well, I had to remove the requirement that the reads are paired since it does not set that flag for a lot of true pairs--clearly this could be fixed, but it is an issue. LAST prefers to output MAF, so it is likely that users who perfer that format will have no trouble. For the most part, I tend to use BAM. Since I started writing bwa-meth (which is now some time ago), bismark has greatly improved, adding sorted header output, compressed and BAM output, and mapping quality scores in the most recent version released in the last few days. Bismark, BSMAP, and GSNAP were the aligners that just worked. I am not sure that I have represented bison in its best light as Devon Ryan has shown it to be quite good on the real dataset used in the paper.

Finding the right parameters for 8 aligners is hard, I attempted to find the best case for each aligner (figure below is case-in-point). I welcome any changes in that regard, one advantage of keeping the paper on arXiv is that I can continue to make updates. Although I did a number of searches for the best parameters for other aligners (see this script where I gridded parameters for bismark+bowtie2 to find the best performing), I actually did very little optimization on bwa-meth (thanks to Heng Li for making great software). You can see in the code that I change the parameters from the default, but using the defaults makes a small enough difference that it is probably not noticeable in the ROC plots.

I have asked the editors of bioinformatics if I can make the reviews public as I think they are of interest. Just as it's difficult to do the comparison fairly, it's difficult to review this type of paper, not having an open dialog with reviewers makes this intractable.

The comparisons that I did are reproducible. The entire code is available on github in the bwa-meth repo. For example, you can see the change I made to the code to compare bsmap parameters in the image below in this revision. If you see any errors or any way to improve the performance of the aligners open an issue or send me an email. Oh, and same if you see a way to improve the bwa-meth.

ADD COMMENT
1
Entering edit mode

Boo for rejection, but major props for the transparent and reproducible analysis. The paper will find a good home soon, I'm sure.

ADD REPLY
0
Entering edit mode

it will have its home on arXiv for the foreseeable future.

ADD REPLY
0
Entering edit mode

Heng's bwa-mem had very similar fate, reviewers did not like some of the comparisons

ADD REPLY
0
Entering edit mode

Yes. Another of the reviewers commented that all of the benefit of using bwa-meth comes from using the local alignment via bwa-mem. I completely agree.

ADD REPLY
0
Entering edit mode

Brent, thanks for bwa-meth its really a great! I would like to also mention that the performance of BWA mem improves with increasing read length, as opposed to Bowtie2 based tools. Your paper could benefit by highlighting that bwa-meth is head-and-shoulders the best with reads >=200 nt length. You might be able to simulate merged paired end RapidRun reads that could be up to 500 bp in length.

ADD REPLY
5
Entering edit mode
10.7 years ago
brentp 24k

As part of the paper for this, I compared to a number of existing aligners. A summary of the results on a real dataset with trimmed and un-trimmed reads is captured in the gif below. It shows that most aligners benefit from quality trimming but that bwa-meth performs well on untrimmed reads. Last also performs well on un-trimmed reads.

ADD COMMENT

Login before adding your answer.

Traffic: 1735 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6