Question

Tool: Segemehl: A Fast One-Stop-Shop Mapping Tool

5

12.8 years ago

David Langenberger 11k

segemehl is a software tool for mapping sequencer reads to reference genomes. Unlike other methods, segemehl detects not only mismatches but also insertions and deletions. Furthermore, segemehl is not limited to a specific read length and correctly maps primer- or polyadenylation-contaminated reads. segemehl implements a matching strategy based on enhanced suffix arrays (ESA). segemehl now supports the SAM format, reads gzipped queries to save both disk and memory space, and allows bisulfite sequencing mapping and split-read mapping.

adapter prediction and/or clipping
mapping of single-end or paired-end data
mapping with mismatches, insertions and deletions
reporting of all mapping loci of a multi-mapping read (either only the best-scoring hits or all mappings within a set accuracy)
multiple split read mapping (and downstream splice site detection)
bisulfite mapping
multithreading
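
To give a rough feel for the index-based matching idea mentioned above, here is a minimal sketch of exact matching with a plain suffix array. It is a toy illustration only, not segemehl's algorithm: segemehl uses enhanced suffix arrays and also handles mismatches, insertions and deletions, none of which is shown here. The function names `build_suffix_array` and `find_occurrences` and the example reference string are made up for this sketch.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order."""
    # Naive construction by sorting suffixes; fine for a toy example only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of every exact occurrence of pattern."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[start:lo] begin with pattern.
    return sorted(sa[start:lo])


reference = "ACGTACGTTAGC"  # hypothetical toy reference
sa = build_suffix_array(reference)
hits = find_occurrences(reference, sa, "ACGT")
print(hits)
```

Because the suffixes are sorted, all occurrences of a read sit in one contiguous block of the array, so two binary searches are enough to find them.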

For more information see: http://hoffmann.bioinf.uni-leipzig.de/LIFE/segemehl.html

Publication:

Hoffmann S, Otto C, Kurtz S, Sharma CM, Khaitovich P, Vogel J, Stadler PF, Hackermueller J: "Fast mapping of short sequences with mismatches, insertions and deletions using index structures", PLoS Comput Biol (2009) vol. 5 (9) pp. e1000502

mapping next-gen • 13k views

ADD COMMENT • link updated 2.1 years ago by Ram 45k • written 12.8 years ago by David Langenberger 11k

2

Interesting - I'm planning to do a "shootout" with aligners as well - I'll add this one to the list

ADD REPLY • link 12.8 years ago by Istvan Albert 102k

1

I would be very interested in the outcome! Let me know once you have results!

ADD REPLY • link 12.8 years ago by David Langenberger 11k

0

What is the license of SEGEMEHL? GPL?

ADD REPLY • link 10.9 years ago by enxxx23 ▴ 310

0

As far as I know, there is no license for segemehl yet. They just write: "...free software for non-commercial use..."

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by David Langenberger 11k

2

I've benchmarked Segemehl against BWA-MEM, Bowtie2 and MOSAIK and found that, for datasets with a lot of variation, Segemehl maps more reads and does so with significantly greater accuracy. However, this is with default parameters, and I found that BWA responds better than Segemehl to optimising mapping sensitivity for reads with high variation. In fact, once optimised, BWA generally outperforms Segemehl under looser definitions of accuracy while running faster and using less memory. Yet such parameter optimisation is a pain, and people generally run these tools with defaults. Out of the box, Segemehl is better at calling indels exactly, although it is quite a lot slower. See figure below (using the CuReSimEval strict mapping definition).
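
For readers unfamiliar with the distinction between strict and looser accuracy definitions, a minimal sketch follows. It assumes, purely for illustration, that "strict" means the reported start position must equal the simulated position exactly and "looser" means it may deviate by a few bases; this is not taken from CuReSimEval, and the function name, read IDs and positions are all hypothetical.

```python
def mapping_accuracy(true_pos, mapped_pos, tolerance=0):
    """Fraction of simulated reads mapped within `tolerance` bases of their true start.

    true_pos   -- dict: read ID -> simulated (true) start position
    mapped_pos -- dict: read ID -> start position reported by the mapper
                  (reads missing from this dict count as unmapped)
    tolerance  -- 0 for a strict definition, larger for looser ones (assumption)
    """
    correct = 0
    for read_id, truth in true_pos.items():
        pos = mapped_pos.get(read_id)
        if pos is not None and abs(pos - truth) <= tolerance:
            correct += 1
    return correct / len(true_pos)


# Hypothetical toy data: r2 is mapped 3 bases off, r4 is unmapped.
truth = {"r1": 100, "r2": 250, "r3": 400, "r4": 900}
mapped = {"r1": 100, "r2": 253, "r3": 400}

strict = mapping_accuracy(truth, mapped, tolerance=0)
loose = mapping_accuracy(truth, mapped, tolerance=5)
print(strict, loose)
```

A mapper that places reads a few bases off around indels loses under the strict setting but not under the looser one, which is one way the ranking of tools can change between definitions.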

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 10.4 years ago by bedeabc ▴ 110

0

Did you also try to optimize segemehl for your dataset, or just BWA? ;)

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 10.4 years ago by David Langenberger 11k

0

Hi David, I've only just seen your reply - apologies! It was a while ago, but I did struggle with optimising Segemehl for our data through parameter sweeps. It just didn't seem to improve our results. It would be arrogant of me to suggest that our benchmark criteria were definitely not to blame for this result.

What parameters might you suggest for sensitively mapping high diversity (indel and mismatch) sequences?

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 10.1 years ago by bedeabc ▴ 110

Ram · Answer 1 · 2012-09-26

4

12.8 years ago

David Langenberger 11k

A performance plot from the paper:

< image not found >

ADD COMMENT • link updated 3.6 years ago by Ram 45k • written 12.8 years ago by David Langenberger 11k

2

That's impressive

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 12.8 years ago by Istvan Albert 102k

Ram · Answer 2 · 2014-03-17

3

11.4 years ago

David Langenberger 11k

segemehl 2.0:

Christian Otto, Peter F. Stadler, and Steve Hoffmann: 'Lacking alignments? The next generation sequencing mapper segemehl revisited', Bioinformatics (2014), doi:10.1093/bioinformatics/btu146

< image not found >

ADD COMMENT • link updated 3.6 years ago by Ram 45k • written 11.4 years ago by David Langenberger 11k

Ram · Answer 3 · 2014-02-10

Some news about the segemehl algorithm:

Steve Hoffmann, Christian Otto, Gero Doose, Andrea Tanzer, David Langenberger, Sabina Christ, Manfred Kunz, Lesca Holdt, Daniel Teupser, Jörg Hackermüller and Peter F Stadler: 'A multi-split mapping algorithm for circular RNA, splicing, trans-splicing, and fusion detection', Genome Biology, 15:R34, doi:10.1186/gb-2014-15-2-r34 (2014)