I have created two genome annotations with BRAKER: one from the soft-masked version of the genome and one from the unmasked genome. As expected, the unmasked genome yields many more annotated regions, but I noticed that the regions found in both files have the same coordinates and are comparable. How can I compare these two files, and how can I tell which one is more accurate?
Soft-masked regions are regions where the bases are converted to lowercase, whereas hard masking replaces the bases with N. Neither type of masking changes the length of the sequence, so feature coordinates are unaffected. Whether soft masking is better than no masking depends on your use case: some software ignores soft masking, while other software takes it into account.
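To illustrate why masking does not shift coordinates, here is a minimal Python sketch on a made-up toy sequence. The helper names `soft_mask` and `hard_mask` and the interval values are hypothetical, chosen only for this example, and the intervals are 0-based and end-exclusive (unlike the 1-based inclusive coordinates in a GFF file).

```python
def soft_mask(seq, start, end):
    """Lowercase the bases in seq[start:end] (0-based, end-exclusive)."""
    return seq[:start] + seq[start:end].lower() + seq[end:]


def hard_mask(seq, start, end):
    """Replace the bases in seq[start:end] with N (0-based, end-exclusive)."""
    return seq[:start] + "N" * (end - start) + seq[end:]


# Toy data, invented for illustration only.
genome = "ATGCGTACGTTAGCATCG"
repeat = (4, 10)    # hypothetical repeat interval
feature = (2, 14)   # hypothetical annotated feature spanning the repeat

soft = soft_mask(genome, *repeat)
hard = hard_mask(genome, *repeat)

# Both masked versions keep the original length, so the same
# (start, end) pair still addresses the same positions.
print(len(genome), len(soft), len(hard))
print(genome[feature[0]:feature[1]])
print(soft[feature[0]:feature[1]])
print(hard[feature[0]:feature[1]])
```

Because only the case (or identity) of individual characters changes, a feature called at a given position in one version of the genome refers to the same position in the other.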
rpolicastro So basically, if a repeat element falls inside an intron or exon, it won't be removed during gene annotation even though it is soft-masked, right? Does it only mean that, by ignoring the masked regions, the annotator can detect the region boundaries more accurately?
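One way to look into this on your own data is to measure how much of each annotated feature lies in soft-masked sequence. The following is only a sketch on a toy string: the function name `soft_masked_fraction` and the example interval are hypothetical, and the coordinates are assumed to be 0-based and end-exclusive, so GFF coordinates would need converting first.

```python
def soft_masked_fraction(seq, start, end):
    """Return the fraction of lowercase (soft-masked) bases in seq[start:end].

    Coordinates are 0-based and end-exclusive. An empty interval yields 0.0.
    """
    region = seq[start:end]
    if not region:
        return 0.0
    masked = sum(1 for base in region if base.islower())
    return masked / len(region)


# Toy soft-masked sequence, invented for illustration only.
soft_genome = "ATGCgtacgtTAGCATCG"
feature = (2, 14)  # hypothetical feature overlapping the masked stretch

fraction = soft_masked_fraction(soft_genome, *feature)
print(f"{fraction:.2f} of the feature is soft-masked")
```

A feature with a high masked fraction in the unmasked-genome annotation but no counterpart in the soft-masked annotation may be worth inspecting as a possible repeat-derived prediction.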