Question

Unexpected results when aligning FASTQ files with different read aligners

0

2.7 years ago

ManuelDB ▴ 110

I am running several read aligners, and these are the results I got.

The columns are reference sequence name, sequence length, number of mapped read-segments, and number of unmapped read-segments.

[Image: results table for each aligner]

The programs were run without any extra options, just as shown in the synopsis of their respective documentation.

Could someone explain why, given that BWA-SW is similar to BWA-MEM, the results differ so much in execution time and number of unmapped reads? Based on the documentation, I was expecting a long execution time for BWA-backtrack, as my reads are around 150 bp. However, it was 1 second faster.

Bowtie2 BWE • 818 views

ADD COMMENT • link 2.7 years ago by ManuelDB ▴ 110

score 2 · Accepted Answer · 2022-03-29

2

2.7 years ago

Istvan Albert 101k

The algorithms are completely different, and each was implemented at a different time and in some cases by different people. Naturally, there should be differences. In fact, my interpretation is the opposite of yours: I'm surprised to see how close the numbers are, all within some fraction of a percent.

All short-read aligners rely on various heuristics. None guarantees the optimal alignment for 100% of reads, just that it will find almost all hits.

As for the differences, what you see is that when an aligner tries harder it finds more hits. The question is whether it is worth doubling the runtime to find 0.1% more hits.

Finally, I believe that it is not informative to compare aligners with default parameters. For example, bwa and bowtie2 will do radically different things by default. If you don't see that here, that's just because the data is not noisy enough. Give it data with more mismatches/errors and you'll see completely different characteristics.

ADD COMMENT • link 2.7 years ago by Istvan Albert 101k
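One way to try the "noisier data" suggestion above is to inject substitutions into reads at a chosen rate before aligning them. The following is only a minimal sketch: the function name `add_mismatches`, the example read, and the rates are hypothetical, and it models substitutions only (no indels or quality scores).

```python
import random

BASES = "ACGT"

def add_mismatches(read, rate, rng):
    """Return a copy of `read` where each base is substituted with probability `rate`.

    Hypothetical helper for illustration; it only models substitutions.
    """
    out = []
    for base in read:
        if rng.random() < rate:
            # Pick a base different from the original so it is a real mismatch.
            choices = [b for b in BASES if b != base.upper()]
            out.append(rng.choice(choices))
        else:
            out.append(base)
    return "".join(out)

rng = random.Random(42)  # fixed seed so the sketch is reproducible
read = "ACGTACGTACGTACGTACGT"  # made-up read, not real data
for rate in (0.0, 0.05, 0.2):
    noisy = add_mismatches(read, rate, rng)
    n_diff = sum(a != b for a, b in zip(read, noisy))
    print(rate, noisy, n_diff)
```

Writing the modified reads back to FASTQ and aligning them with each tool would then show how the mapped fraction changes as the error rate grows.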

0

Thanks for your answer.

I think there is something wrong in these data, or something I am missing. I have run samtools flagstat on these BAM files and the results do not match the table I generated. The percentages of mapped reads are (in the same order):

 99.83%                   99.73%                 98.48%                   99.33%

All match apart from BWA-SW. Why is the percentage of mapped reads lower when there are 0 unmapped reads in the table I generated with samtools idxstats? I expected 100%.

ADD REPLY • link 2.7 years ago by ManuelDB ▴ 110

1

Reproducing statistics is always a mystery: all it takes is one slight corner case, a bug, etc. to throw things off.

I would not worry too much as long as the results are close. As far as I know, bwasw is an obsolete methodology anyway, so why bother spending effort there? One way to get the odd result is that unmapped reads are reported as secondary alignments. After all, every read could align in a secondary fashion.

The rabbit holes are always deep and not worth going down unless they are part of the main analysis.

ADD REPLY • link 2.7 years ago by Istvan Albert 101k

1

I found the problem. At the end of the output from idxstats, there is a line for unmapped reads with no chromosome assigned. I missed this last line, and that is where the small proportion of unmapped reads found by flagstat shows up.

ADD REPLY • link 2.7 years ago by ManuelDB ▴ 110
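For anyone hitting the same mismatch, here is a minimal sketch of totalling `samtools idxstats` output so that the final `*` line (unmapped reads with no reference assigned) is included. The function name `summarize_idxstats` and the counts in `example` are made up for illustration; the only assumption about the input is the four tab-separated columns described in the question. The percentage may still not match flagstat exactly, since the two tools can count records differently.

```python
def summarize_idxstats(text):
    """Sum mapped and unmapped counts over every idxstats row, including '*'."""
    mapped = 0
    unmapped = 0
    for line in text.strip().splitlines():
        # Columns: reference name, length, mapped segments, unmapped segments.
        name, length, n_mapped, n_unmapped = line.split("\t")
        mapped += int(n_mapped)
        unmapped += int(n_unmapped)
    total = mapped + unmapped
    pct = 100.0 * mapped / total if total else 0.0
    return mapped, unmapped, pct

# Made-up counts; the last row is the easily overlooked '*' line.
example = "chr1\t1000\t980\t0\nchr2\t500\t15\t0\n*\t0\t0\t5\n"
mapped, unmapped, pct = summarize_idxstats(example)
print(f"mapped={mapped} unmapped={unmapped} mapped%={pct:.2f}")
```

In practice the text would come from the saved output of `samtools idxstats` on an indexed BAM file rather than from a hard-coded string.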