Tool: Segemehl: A Fast One-Stop-Shop Mapping Tool
12.3 years ago

segemehl is a software tool for mapping sequencer reads to reference genomes. Unlike other methods, segemehl is able to detect not only mismatches but also insertions and deletions. Furthermore, segemehl is not limited to a specific read length and is able to correctly map primer- or polyadenylation-contaminated reads. segemehl implements a matching strategy based on enhanced suffix arrays (ESA). It now supports the SAM format, reads gzipped queries to save both disk and memory space, and allows bisulfite sequencing mapping and split-read mapping.

  • adapter prediction and/or clipping
  • mapping of single-end or paired-end data
  • mapping with mismatches, insertions and deletions
  • reporting of all mapping loci for multi-mapping reads (either only the best-scoring hits or all mappings within a set accuracy)
  • multiple split read mapping (and downstream splice site detection)
  • bisulfite mapping
  • multithreading
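To give a feel for the suffix-array idea mentioned above, here is a minimal Python sketch of exact read lookup in a plain suffix array. It is only an illustration and not segemehl's actual algorithm: segemehl uses enhanced suffix arrays and also handles mismatches, insertions and deletions, none of which is shown here. The function names and the toy reference sequence are made up for this example.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, sorted lexicographically.

    Naive construction by sorting suffix strings; fine for a toy example,
    far too slow and memory-hungry for a real genome.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return all start positions of exact matches of pattern in text.

    All suffixes that begin with pattern form one contiguous block in the
    suffix array, so two binary searches locate the block boundaries.
    """
    m = len(pattern)

    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


# Hypothetical toy reference and read, for illustration only.
reference = "ACGTACGTTACG"
sa = build_suffix_array(reference)
hits = find_occurrences(reference, sa, "ACG")
print(hits)  # [0, 4, 9]
```

The lookup costs a logarithmic number of string comparisons once the index is built, which is why index-based mappers build the index once per reference and reuse it across read sets.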

For more information see: http://hoffmann.bioinf.uni-leipzig.de/LIFE/segemehl.html

Publication:

Hoffmann S, Otto C, Kurtz S, Sharma CM, Khaitovich P, Vogel J, Stadler PF, Hackermueller J: "Fast mapping of short sequences with mismatches, insertions and deletions using index structures", PLoS Comput Biol (2009) vol. 5 (9) pp. e1000502

mapping next-gen • 12k views

Interesting - I'm planning to do a "shootout" with aligners as well - I'll add this one to the list


I would be highly interested in the outcome! Let me know, once you have results!


What is the license of SEGEMEHL? GPL?


As far as I know, there is no license for segemehl yet. They just write: "...free software for non-commercial use..."


I've benchmarked Segemehl against BWA-MEM, Bowtie2 and MOSAIK and found that, for datasets with a lot of variation, it maps more reads and with significantly greater accuracy. However, this is with default parameters, and I found that BWA responds better than Segemehl to optimising mapping sensitivity for reads with high variation. In fact, BWA generally outperforms Segemehl under looser definitions of accuracy while running faster and using less memory. Yet such parameter optimisation is a pain, and people generally run these tools with defaults. Segemehl is better out of the box at calling indels exactly, although it is quite a lot slower. See figure below (using the CuReSimEval strict mapping definition).


Did you also try to optimize segemehl for your dataset, or just BWA? ;)


Hi David, I've only just seen your reply - apologies! It was a while ago, but I did try to optimise Segemehl for our data through parameter sweeps. It just didn't seem to improve our results. It would be arrogant of me to suggest that our benchmark criteria were definitely not to blame for this result.

What parameters might you suggest for sensitively mapping high diversity (indel and mismatch) sequences?

12.3 years ago

A performance plot from the paper:

[Figure: performance plot from the paper; image not available]


That's impressive

