Question

Which Aligners Recognize Soft-Masked Repeats In Reference Sequences?

9

14.1 years ago

Jeremy Leipzig 22k

Which aligners (long and short read) behave differently when parts of a reference/target sequence are "soft-masked", i.e. have portions in lowercase to designate repeat regions?
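To make the terminology concrete: a soft-masked reference keeps every base but writes repeats in lowercase, whereas a hard-masked reference replaces those bases with N. The Python sketch below is only an illustration (not code from any aligner; the function names are hypothetical). It extracts the lowercase runs as 0-based, half-open intervals and builds the hard-masked equivalent of the same sequence.

```python
def softmasked_intervals(seq):
    """Return 0-based, half-open (start, end) runs of lowercase (soft-masked) bases."""
    intervals = []
    start = None
    for i, base in enumerate(seq):
        if base.islower():
            if start is None:  # entering a masked run
                start = i
        elif start is not None:  # leaving a masked run
            intervals.append((start, i))
            start = None
    if start is not None:  # run extends to the end of the sequence
        intervals.append((start, len(seq)))
    return intervals


def hard_mask(seq):
    """Replace soft-masked (lowercase) bases with N, as in a hard-masked reference."""
    return "".join("N" if base.islower() else base for base in seq)


ref = "ACGTacgtacGTTTggccA"
print(softmasked_intervals(ref))  # [(4, 10), (14, 18)]
print(hard_mask(ref))             # ACGTNNNNNNGTTTNNNNA
```

A soft-masked reference loses no information, so an aligner is free to ignore the case entirely; a hard-masked one has actually thrown the repeat sequence away.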

alignment • 12k views

Written 14.1 years ago by Jeremy Leipzig 22k • last updated 11.7 years ago by Ali R. Vahdati ▴ 190

score 14 · Answer 1 · 2010-10-29

14.1 years ago

lh3 33k

No. Do not align to a masked genome for any purpose. Align to the whole (unmasked) genome, then filter out the reads that mapped to masked regions.
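A minimal sketch of that post-alignment filter, assuming the masked regions and the alignments are already in memory as 0-based, half-open (start, end) coordinates per chromosome. The names `build_index`, `overlaps_mask` and `filter_masked`, and the toy data, are hypothetical; on real BAM/BED files an existing tool such as `bedtools intersect -v` is the more usual choice.

```python
from bisect import bisect_left


def build_index(mask):
    """mask: {chrom: [(start, end), ...]} of half-open, non-overlapping intervals."""
    index = {}
    for chrom, intervals in mask.items():
        intervals = sorted(intervals)
        index[chrom] = ([s for s, _ in intervals], intervals)
    return index


def overlaps_mask(index, chrom, start, end):
    """True if [start, end) on chrom overlaps any masked interval."""
    if chrom not in index:
        return False
    starts, intervals = index[chrom]
    # Last interval starting before `end`; with sorted, non-overlapping
    # intervals it also has the largest end among all candidates.
    i = bisect_left(starts, end) - 1
    return i >= 0 and intervals[i][1] > start


def filter_masked(alignments, mask):
    """Keep (name, chrom, start, end) alignments that avoid every masked interval."""
    index = build_index(mask)
    return [a for a in alignments if not overlaps_mask(index, a[1], a[2], a[3])]


mask = {"chr1": [(100, 200), (500, 650)]}
alignments = [
    ("read1", "chr1", 10, 60),
    ("read2", "chr1", 180, 230),
    ("read3", "chr1", 300, 350),
    ("read4", "chr1", 640, 700),
    ("read5", "chr2", 100, 150),
]
print([a[0] for a in filter_masked(alignments, mask)])  # ['read1', 'read3', 'read5']
```

The point of doing it in this order is that every read was first given the chance to land on its true (repetitive) locus; the mask only decides which of those placements you keep.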

2

Masking has never been perfect and probably never will be. This leads to wrongly mapped sequences, spurious SNP/indel calls and all sorts of other problems. I cannot think of a single use case in which masking could lead to better outcomes. Trust me: do not mask.

Reply by lh3 33k • 14.1 years ago

1

Yup. Do not mask. You get the most accurate alignment when you align to what is actually there. What you do not want are reads that really belong to repetitive regions being forced to align to the wrong place because you didn't provide the correct sequence for the read to align to.
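A toy illustration of that failure mode, assuming a simple ungapped best-hit search by mismatch count (far simpler than any real aligner's scoring). A read sampled from the second of two near-identical repeat copies maps perfectly to the full reference, but once that copy is hard-masked to N, the same read is placed on the other copy with one mismatch, which downstream would look like a SNP. The reference, read and function names are made up for illustration.

```python
def hamming_best_hit(reference, read):
    """Best ungapped placement (position, mismatches); case is ignored, N never matches."""
    best = None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for ref_base, read_base in zip(window, read)
                         if ref_base.upper() == "N" or ref_base.upper() != read_base)
        if best is None or mismatches < best[1]:  # keep the first best-scoring position
            best = (pos, mismatches)
    return best


def hard_mask_interval(reference, start, end):
    """Replace reference[start:end] with N (0-based, half-open)."""
    return reference[:start] + "N" * (end - start) + reference[end:]


ref = "TTTTACGTACGAGGGGACGTACGTCCCC"  # repeat copies: ACGTACGA at 4, ACGTACGT at 16
read = "ACGTACGT"                      # truly sampled from the copy at position 16
print(hamming_best_hit(ref, read))     # (16, 0)
masked = hard_mask_interval(ref, 16, 24)
print(hamming_best_hit(masked, read))  # (4, 1): wrong copy, one apparent SNP
```

Because `hamming_best_hit` compares case-insensitively, merely lowercasing the second copy would not change the result; only throwing the bases away (N) does.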

bwa does not care about lowercase nucleotides.

Reply by swbarnes2 14k • 11.7 years ago

0

What would be the difference, apart from paired-end or spliced mappings?

Reply by Darked89 4.7k • 14.1 years ago

0

So I assume BWA does not care about lowercase nucleotides?

Reply by Jeremy Leipzig 22k • 14.1 years ago

0

BWA always uses all bases in the alignment. Again, do not mask, unless you are looking for trouble.

Reply by lh3 33k • 14.1 years ago

score 5 · Answer 2 · 2010-10-29

14.1 years ago

Haibao Tang 3.0k

In LASTZ, soft-masked regions are NOT available for seeding, but alignments are allowed to extend into them. LASTZ also lets you specify a separate file of intervals to mask (with softmask=<mask_file>).
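A toy model of that seeding rule, assuming exact k-mer seeds and a case-insensitive ungapped extension. LASTZ's real seeding and scoring are considerably more elaborate, and none of these function names come from LASTZ. Target k-mers containing any lowercase base are left out of the seed index, but an extension started from an unmasked seed may run straight through soft-masked bases.

```python
def find_seeds(target, query, k):
    """Exact k-mer seed hits (target_pos, query_pos); soft-masked target k-mers cannot seed."""
    index = {}
    for i in range(len(target) - k + 1):
        kmer = target[i:i + k]
        if any(base.islower() for base in kmer):
            continue  # k-mer touches a soft-masked base: excluded from seeding
        index.setdefault(kmer, []).append(i)
    hits = []
    q = query.upper()
    for j in range(len(q) - k + 1):
        for i in index.get(q[j:j + k], []):
            hits.append((i, j))
    return hits


def extend_right(target, query, ti, qi):
    """Length of an exact ungapped extension from (ti, qi); case is ignored here."""
    n = 0
    while (ti + n < len(target) and qi + n < len(query)
           and target[ti + n].upper() == query[qi + n].upper()):
        n += 1
    return n


target = "ACGTTacgttGGCC"  # the second ACGTT copy is soft-masked
query = "ACGTTACG"
print(find_seeds(target, query, 4))       # [(0, 0), (1, 1)]
print(extend_right(target, query, 0, 0))  # 8
```

If the seed index were built from `target.upper()` instead, the masked copy at position 5 would also produce a seed for the query's first `ACGT`; with the lowercase check it can only be reached by extension.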

Ram · Answer 3 · 2013-04-03

1

11.7 years ago

Ali R. Vahdati ▴ 190

FSA also takes soft-masked regions into account when run with the --softmasked option.

Written 11.7 years ago by Ali R. Vahdati ▴ 190 • last updated 5.2 years ago by Ram 44k