Question

Hard-masked or soft-masked genome

3

4.8 years ago

bioinf2305 ▴ 30

Which repeat-masking strategy should be preferred before gene prediction with Pasa, Augustus, Snap and Genescan? Should I soft-mask or hard-mask the genome? I read that Augustus prefers soft masking, but I am not sure about the other gene prediction tools.

Repeatmasking Gene prediction • 8.3k views

ADD COMMENT • link updated 2.4 years ago by sunnykevin97 ▴ 990 • written 4.8 years ago by bioinf2305 ▴ 30

0

It is good to mask the genome, and to soft-mask it (repeats in lowercase rather than replaced with "N").
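As a rough illustration of that convention, here is a minimal Python sketch that reports how much of a sequence is soft-masked by counting lowercase bases. The function name `soft_masked_fraction` and the example sequence are made up for illustration; they do not come from any of the tools mentioned in this thread.

```python
def soft_masked_fraction(seq: str) -> float:
    """Return the fraction of bases written in lowercase (soft-masked)."""
    if not seq:
        return 0.0
    # In a soft-masked sequence, repeats are lowercase and the rest is uppercase.
    masked = sum(1 for base in seq if base.islower())
    return masked / len(seq)


# Hypothetical example: 8 of the 16 bases are lowercase.
example = "ACGTacgtacgtACGT"
print(soft_masked_fraction(example))  # 0.5
```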

ADD REPLY • link 2.4 years ago by sunnykevin97 ▴ 990

score 5 · Answer 1 · 2020-01-22

5

4.8 years ago

lieven.sterck 15k

Soft masked!

It depends to some extent on which gene predictor you're going to apply, as in "can it interpret soft-masked genomes?" Most gene predictors I know and have used can, so that's not really an issue.

The key thing is that if you soft-mask the genome, you (or rather the gene prediction tool) still have all the sequence info at your disposal. If, for instance, the masking tool produces some false-positive masks, those regions might still get recovered by the gene predictor, since they might have transcript data aligned to them and might be part of a valid gene.

If you hard-mask the genome, the prediction tool no longer has any clue about the actual sequence and so can no longer decide for itself how to interpret the masked region.
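To make the one-way nature of hard masking concrete, here is a minimal Python sketch that derives a hard-masked sequence from a soft-masked one. The helper name `hard_mask` and the toy sequence are assumptions for illustration only; real masking tools have their own options for this.

```python
def hard_mask(seq: str) -> str:
    """Replace every lowercase (soft-masked) base with 'N'."""
    return "".join("N" if base.islower() else base for base in seq)


# Hypothetical soft-masked sequence: the lowercase stretch is the annotated repeat.
soft = "ATGGCCacgtacgtTTGA"
hard = hard_mask(soft)
print(hard)  # ATGGCCNNNNNNNNTTGA
```

Going from soft to hard is trivial, but there is no way back: once the bases are "N", the original sequence under the mask cannot be recovered from the hard-masked file alone.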

ADD COMMENT • link 4.8 years ago by lieven.sterck 15k

0

This is also a question I had; for example, a utility/programme like Chromosomer (https://github.com/gtamazian/chromosomer) would (should) accept soft-masked reference genomes and query sequences - right? Otherwise, how will one "unmask" the assembled chromosomes?
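For a soft-masked FASTA, "unmasking" can be as simple as uppercasing the sequence lines. The sketch below shows that idea in Python; the function name `unmask_fasta` and the example records are hypothetical, and it says nothing about how Chromosomer itself treats lowercase input.

```python
def unmask_fasta(lines):
    """Yield FASTA lines with sequences uppercased; header lines are left unchanged."""
    for line in lines:
        # Header lines start with '>' and must keep their original case.
        yield line if line.startswith(">") else line.upper()


# Hypothetical soft-masked records, given as already-stripped lines.
records = [">chr1 example", "ACGTacgtNNac", "ggTT"]
print(list(unmask_fasta(records)))
# ['>chr1 example', 'ACGTACGTNNAC', 'GGTT']
```

This only works for soft-masked input; bases that were hard-masked to "N" stay "N".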

ADD REPLY • link 4.1 years ago by andorjkiss ▴ 50

0

I'm not familiar with that specific tool, but that does sound logical. Alternatively, it would also make sense for it to work with non-masked sequences.

ADD REPLY • link 4.1 years ago by lieven.sterck 15k