Benchmarking Read Alignment And Variant-Calling Algorithms (For Dummies)
13.4 years ago
Travis ★ 2.8k

Hi all,

I am wondering whether there is a good step-by-step guide to benchmarking alignment and variant-calling software. I understand the general premise, e.g.:

Generate reads with known mutations
Align to genome
Assess accuracy
Perform variant calling
Assess accuracy
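The premise above can be sketched end to end on a toy genome. This is a minimal illustration under stated assumptions, not any published benchmark: the random genome stands in for a real reference, only SNPs are planted, reads are error-free, and the `readN_posX` naming scheme (true origin encoded in the read name) is hypothetical. The aligner and variant caller under test are not included; their output is assumed to have been parsed into plain dictionaries.

```python
import random

BASES = "ACGT"


def random_genome(length, rng):
    """Random reference sequence standing in for a real genome."""
    return "".join(rng.choice(BASES) for _ in range(length))


def plant_snps(reference, n_snps, rng):
    """Return a mutated copy of `reference` and the truth set {pos: (ref, alt)}."""
    genome = list(reference)
    truth = {}
    for pos in rng.sample(range(len(reference)), n_snps):
        ref = genome[pos]
        alt = rng.choice([b for b in BASES if b != ref])
        genome[pos] = alt
        truth[pos] = (ref, alt)
    return "".join(genome), truth


def sample_reads(genome, n_reads, read_len, rng):
    """Error-free reads whose (hypothetical) names encode the true 0-based origin."""
    reads = []
    for i in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        reads.append((f"read{i}_pos{start}", genome[start:start + read_len]))
    return reads


def mapping_accuracy(reported, tolerance=0):
    """Count (correct, wrong, unmapped) from {read_name: mapped position or None}."""
    correct = wrong = unmapped = 0
    for name, pos in reported.items():
        true_pos = int(name.rsplit("_pos", 1)[1])
        if pos is None:
            unmapped += 1
        elif abs(pos - true_pos) <= tolerance:
            correct += 1
        else:
            wrong += 1
    return correct, wrong, unmapped


def variant_accuracy(called, truth):
    """Precision and recall of called SNPs {pos: (ref, alt)} against the truth set."""
    tp = sum(1 for pos, allele in called.items() if truth.get(pos) == allele)
    precision = tp / len(called) if called else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall
```

In a real run you would write the simulated reads to FASTQ, run the aligner and caller being tested, and parse their SAM/VCF output into the `reported` and `called` dictionaries; those parsing steps are omitted here.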

However, I hit a kind of intellectual disconnect when I try to work out how to actually do it. Too much time in industry and not enough in academia, I suspect!

Can anyone point me in the right direction?

Thanks in advance!

alignment snp indel algorithm • 4.6k views
13.1 years ago
Torst ▴ 980

M. Ruffalo recently published "Seal", an evaluation suite for read aligners.

"With a view to comparing existing short read alignment software, we develop a simulation and evaluation suite, Seal, which simulates NGS runs for different configurations of various factors, including sequencing error, indels and coverage"
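The factors named in that quote (sequencing error, indels, coverage) map onto two small pieces of arithmetic in any read simulator. The sketch below is not Seal's code or error model; it assumes a uniform per-base error model with 1-bp indels, whereas real error profiles vary along the read, and the function names are hypothetical.

```python
import math
import random


def reads_for_coverage(genome_len, read_len, coverage):
    """Reads needed so that n_reads * read_len >= coverage * genome_len."""
    return math.ceil(coverage * genome_len / read_len)


def add_errors(read, sub_rate, indel_rate, rng):
    """Apply uniform per-base substitutions and 1-bp insertions/deletions."""
    out = []
    for base in read:
        r = rng.random()
        if r < indel_rate / 2:
            continue  # 1-bp deletion: drop this base
        if r < indel_rate:
            out.append(rng.choice("ACGT"))  # 1-bp insertion before this base
        if rng.random() < sub_rate:
            base = rng.choice([b for b in "ACGT" if b != base])  # substitution
        out.append(base)
    return "".join(out)
```

For example, 30x coverage of a 1 Mb genome with 100 bp reads needs `reads_for_coverage(1_000_000, 100, 30)`, i.e. 300,000 reads. Sweeping `sub_rate`, `indel_rate` and the coverage over a grid gives one simulated run per configuration.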

Reference:

Ruffalo M, Laframboise T, Koyutürk M. Comparative analysis of algorithms for next-generation sequencing read alignment. Bioinformatics. 2011 Oct 15;27(20):2790-6. Epub 2011 Aug 19.

http://www.ncbi.nlm.nih.gov/pubmed/21856737

13.4 years ago
Travis ★ 2.8k

I think I have answered the aligner part:

http://www.massgenomics.org/short-read-aligners


This benchmark is flawed. All read mappers easily achieve an error rate below 1% on simulated data (accurate mappers below 0.1%), while the second plot implies something like 10%. A couple of papers also benchmark mappers, but they all have problems. The best benchmark I have seen is the one done by the 1000 Genomes Project, but it is not publicly available.
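An error-rate figure depends on which reads go into the denominator. One common way to report mapper accuracy on simulated data is to sweep a mapping-quality (MAPQ) threshold and, at each threshold, report the fraction of all simulated reads retained and the fraction of those that are misplaced. This is a sketch of that presentation, not necessarily how the benchmark above computed its numbers; the `records` layout and the threshold grid are assumptions, and `is_correct` would come from comparing each mapped position with the read's simulated origin.

```python
def error_curve(records, total_reads, thresholds=(0, 10, 20, 30, 40, 50, 60)):
    """For each MAPQ threshold, return (threshold, fraction of all reads kept,
    error rate among kept reads).

    records: (mapq, is_correct) for every *mapped* read. total_reads also
    counts unmapped reads, so the kept fraction reflects sensitivity too.
    """
    rows = []
    for t in thresholds:
        kept = [ok for mapq, ok in records if mapq >= t]
        n = len(kept)
        wrong = n - sum(kept)
        rows.append((t, n / total_reads, wrong / n if n else 0.0))
    return rows
```

Plotting the error rate against the kept fraction gives a curve per mapper, which is harder to misread than one number per tool.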


I notice that the simulated reads were trained on a human sample but were also used to generate C. elegans reads. I am not sure whether this affected anything, though.


Also, BFast shows up as a fast aligner?


There is another flaw in your analysis: “In essence, we have the ‘right’ answer and can use it to determine if a read is placed correctly.” You cannot conclude that an alignment is false (marked in red in your bar graphs) just because a read aligns to a different location than the one it was generated from. On its own, that tells you nothing unless you show with Smith-Waterman that the optimal local alignment at that position does not pass the alignment criteria. The read could in fact come from a duplicated region.
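A minimal sketch of the check described above: instead of counting every off-target placement as wrong, compare how well the read fits at its reported position versus its true origin. For simplicity this uses an ungapped mismatch count rather than Smith-Waterman, so it only catches repeats that differ by substitutions; the function names and the three labels are hypothetical.

```python
def mismatches(read, reference, pos):
    """Ungapped mismatch count of `read` placed at `pos`, or None if it runs off the ends."""
    if pos < 0:
        return None
    window = reference[pos:pos + len(read)]
    if len(window) < len(read):
        return None
    return sum(a != b for a, b in zip(read, window))


def classify_placement(read, reference, true_pos, reported_pos):
    """Label a placement 'correct', 'not_worse' (e.g. another copy of a repeat) or 'wrong'."""
    if reported_pos == true_pos:
        return "correct"
    at_true = mismatches(read, reference, true_pos)
    at_reported = mismatches(read, reference, reported_pos)
    # A different locus that fits at least as well is not evidence of an aligner error.
    if at_reported is not None and at_true is not None and at_reported <= at_true:
        return "not_worse"
    return "wrong"
```

With a reference containing two copies of the same 8-mer, a read from the first copy that is reported at the second copy is labelled `not_worse` rather than `wrong`.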


Remember that there actually is an authoritative solution, Smith-Waterman, so an aligner that uses Smith-Waterman as a final step should in principle yield no false positives. The flaw in that evaluation is that this was never checked. Therefore the whole analysis is flawed IMHO and tells you absolutely nothing, even though it contains some nice ideas.
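Smith-Waterman computes the optimal local alignment score for a given scoring scheme, which is what makes it usable as the check described here. Below is a minimal score-only sketch with a linear gap penalty; the scores (match=2, mismatch=-1, gap=-2) and the `pad` window in `score_at` are illustrative assumptions, not the scoring of any particular aligner (production aligners typically use affine gap penalties).

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a vs b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at 0 so an alignment can start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def score_at(read, reference, pos, pad=5):
    """SW score of the read against the reference window around `pos`, padded for indels."""
    start = max(0, pos - pad)
    return smith_waterman(read, reference[start:pos + len(read) + pad])
```

Comparing `score_at(read, reference, reported_pos)` with `score_at(read, reference, true_pos)` shows whether a "misplaced" read really has a worse alignment at its reported position, or an equally good one in a duplicated region.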
