Which Aligners Recognize Soft-Masked Repeats In Reference Sequences?
3
9
14.2 years ago

Which aligners (long and short read) behave differently when parts of a reference/target sequence are "soft-masked", i.e. have portions in lowercase to designate repeat regions?

alignment • 12k views
14
14.2 years ago
lh3 33k

No, do not align to a masked genome for any purpose. Align to the whole genome first, then filter out the reads that map to the masked regions.
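As a rough illustration of that post-alignment filtering, here is a minimal Python sketch. It assumes the repeats are marked as lowercase runs in the reference and that each alignment is available as a 0-based, half-open (start, end) span. The function names, the toy reference, and the alignment tuples are all hypothetical and are not the output of any particular aligner.

```python
def masked_intervals(seq):
    """Return 0-based, half-open (start, end) intervals of lowercase runs."""
    intervals = []
    start = None
    for i, base in enumerate(seq):
        if base.islower():
            if start is None:
                start = i  # a soft-masked run begins here
        elif start is not None:
            intervals.append((start, i))  # run ended at the previous base
            start = None
    if start is not None:
        intervals.append((start, len(seq)))  # run reaches the sequence end
    return intervals


def overlaps_masked(read_start, read_end, intervals):
    """True if the span [read_start, read_end) overlaps any masked interval."""
    return any(read_start < end and start < read_end
               for start, end in intervals)


# Toy soft-masked reference (hypothetical data).
reference = "ACGTacgtacgtTTGCAAGGccttAGCT"

# Hypothetical alignments: (read name, 0-based start, half-open end).
alignments = [("r1", 0, 4), ("r2", 2, 8), ("r3", 12, 20), ("r4", 18, 24)]

mask = masked_intervals(reference)
kept = [name for name, s, e in alignments
        if not overlaps_masked(s, e, mask)]
print(mask)
print(kept)
```

For real data the linear scan over intervals would be slow; an interval tree or a dedicated tool working on BAM and BED files would be the more usual choice.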

2

Masking has never been perfect and probably never will be. It leads to wrongly mapped sequences, spurious SNP/indel calls, and all sorts of other problems. I cannot think of a single use case where masking leads to better outcomes. Trust me. Do not mask.

1

Yup. Do not mask. You get the most accurate alignment when you align to what is actually there. What you do not want are reads that really belong to repetitive regions being forced to align to the wrong place because you didn't provide the correct sequence for the read to align to.

bwa does not care about lowercase nucleotides.

0

What difference would it make, apart from paired-end or spliced mappings?

0

So I assume BWA does not care about lowercase nucleotides?

0

BWA always uses all bases in alignment. Again, do not mask, unless you want to invite trouble.

5
14.2 years ago

In LASTZ, soft-masked regions are not available for seeding, but alignments are allowed to extend into them. It also lets you specify a separate file listing the intervals to mask (with softmask=<mask_file>).
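To make the "no seeding, but extension allowed" idea concrete, here is a toy Python sketch. It is not LASTZ's actual algorithm or code. It only assumes that seeds are exact k-mers taken from fully uppercase stretches of the target, and that extension compares bases case-insensitively. The function names and sequences are hypothetical.

```python
def seed_positions(target, k):
    """Start positions of k-mers in target that contain no lowercase base."""
    return [i for i in range(len(target) - k + 1)
            if target[i:i + k].isupper()]


def extend_right(target, query, t_pos, q_pos):
    """Length of the exact match going right, ignoring case in the target."""
    length = 0
    while (t_pos + length < len(target)
           and q_pos + length < len(query)
           and target[t_pos + length].upper() == query[q_pos + length].upper()):
        length += 1
    return length


# Toy target with a soft-masked (lowercase) stretch in the middle.
target = "ACGTTGcatgcaGGA"
query = "GTTGCATG"

seeds = seed_positions(target, 4)
# Start from the seed at target position 2, which matches the query start.
match_len = extend_right(target, query, 2, 0)
print(seeds)
print(match_len)
```

In this toy case no seed starts inside or overlaps the lowercase stretch, yet the match that begins at an uppercase seed runs on through the masked bases.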

1
11.7 years ago

FSA also takes soft-masked regions into account when run with the --softmasked option.
