Which repeat-masking strategy should be preferred before gene prediction with PASA, Augustus, SNAP and Genscan? Should I soft mask or hard mask the genome? I read that Augustus prefers soft masking, but I am not sure about the other gene prediction tools.
It depends to some extent on which gene predictor you are going to apply, as in "can it interpret soft-masked genomes?"
Most gene predictors I know and have used can, so that's not really an issue.
The key thing is that if you soft mask the genome, you (or rather the gene prediction tool) still have all the sequence information at your disposal. If, for instance, the masking tool produces some false-positive masks, those regions might still be recovered by the gene predictor, since they might have transcript data aligned to them and might be part of a valid gene.
If you hard mask the genome, the prediction tool no longer has any clue about the actual sequence and thus can no longer decide for itself how to interpret the masked region.
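To make the difference concrete, here is a minimal Python sketch, assuming the usual convention that soft-masked repeats are lowercase and hard-masked repeats are replaced with "N". The helper names `hard_mask` and `unmask` are made up for illustration and are not part of any of the tools mentioned here.

```python
def hard_mask(seq: str) -> str:
    """Replace every soft-masked (lowercase) base with 'N'."""
    return "".join("N" if base.islower() else base for base in seq)


def unmask(seq: str) -> str:
    """Drop soft masking by converting the whole sequence to uppercase."""
    return seq.upper()


soft = "ACGTacgtacgtACGT"   # toy sequence, repeat in lowercase
hard = hard_mask(soft)

print(hard)           # ACGTNNNNNNNNACGT
print(unmask(soft))   # ACGTACGTACGTACGT (original bases recovered)
print(unmask(hard))   # ACGTNNNNNNNNACGT (bases under the mask are gone)
```

Going from soft to hard masking is a one-way step: you can always derive a hard-masked copy from a soft-masked genome later, but not the other way round.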
This is also a question I had. For example, a utility like Chromosomer (https://github.com/gtamazian/chromosomer) would (or should) accept soft-masked reference genomes and query sequences, right? Otherwise, how would one "unmask" the assembled chromosomes?
It is good to mask the genome; use soft masking (repeats in lowercase rather than "N").
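Since soft masking is just lowercase letters, it is easy to check what actually got masked. A rough sketch in plain Python, assuming the sequence is already loaded as a string; the function names `masked_intervals` and `masked_fraction` are hypothetical helpers for illustration, not from any library.

```python
import re


def masked_intervals(seq: str) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) spans of lowercase runs."""
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", seq)]


def masked_fraction(seq: str) -> float:
    """Fraction of the sequence that is soft-masked (lowercase)."""
    if not seq:
        return 0.0
    masked = sum(end - start for start, end in masked_intervals(seq))
    return masked / len(seq)


toy = "ACGTacgtACgt"            # toy sequence, not real data
print(masked_intervals(toy))    # [(4, 8), (10, 12)]
print(masked_fraction(toy))     # 0.5
```

The intervals use the same 0-based, half-open coordinates as BED, so they could be written out and compared against the repeat annotation if needed.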