I have a genome sequence in FASTA format, and I want a soft-masked version of the genomic DNA.
After searching on Google, I found that soft-masking means the following: all repeats and low-complexity regions are replaced with the lower-case versions of their nucleotide bases. I have installed RepeatMasker on Linux, but I'm new to it. The RepeatMasker manual says "Default settings are for masking all type of repeats in a primate sequence.", but I'm not sure those defaults suit my genome.
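To check my understanding of the definition above, here is a minimal Python sketch of what I think soft-masking means. It only illustrates the lower-casing step and is not how RepeatMasker finds repeats; the function name `soft_mask` and the repeat coordinates are made up for illustration.

```python
def soft_mask(sequence, intervals):
    """Lower-case the bases inside each half-open (start, end) interval.

    The intervals are hypothetical repeat coordinates (0-based, end
    exclusive); a real tool would report them, this sketch does not
    detect anything itself.
    """
    bases = list(sequence)
    for start, end in intervals:
        # Clamp to the sequence bounds so an overlong interval is harmless.
        for i in range(max(start, 0), min(end, len(bases))):
            bases[i] = bases[i].lower()
    return "".join(bases)


# Toy example: pretend positions 4..11 were reported as a simple repeat.
masked = soft_mask("ACGTATATATATGGCC", [(4, 12)])
print(masked)  # ACGTatatatatGGCC
```

The sequence keeps its length and its bases; only the case changes, which is what distinguishes soft-masking from replacing the repeats with N.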
I'm quite confused and don't know what I should do. Can anyone tell me how to do it? Thank you!
Thank you! I got it. I would also like to know more details, for example how they do the soft-masking with RepeatMasker and which parameters they use. That is to say, I want to learn what happens when the sequence Oryza_sativa.IRGSP-1.0.dna.toplevel.fa.gz becomes the sequence Oryza_sativa.IRGSP-1.0.dna_rm.toplevel.fa.gz. If you know, could you tell me?
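For context, this is a small Python sketch of how I would compare the two files from the outside: it counts upper-case, lower-case, and N bases, so a file with many lower-case bases would look soft-masked and one with many N would look hard-masked. The helper names `masking_summary` and `summarize_file` are my own, and the sketch says nothing about the parameters used to produce the files.

```python
import gzip


def masking_summary(lines):
    """Count upper-case, lower-case, and N bases in FASTA lines."""
    counts = {"upper": 0, "lower": 0, "n": 0}
    for line in lines:
        line = line.strip()
        # Skip blank lines and FASTA header lines.
        if not line or line.startswith(">"):
            continue
        for base in line:
            if base in "Nn":
                counts["n"] += 1
            elif base.islower():
                counts["lower"] += 1
            else:
                counts["upper"] += 1
    return counts


def summarize_file(path):
    """Run the same count over a gzipped FASTA file (path is up to the caller)."""
    with gzip.open(path, "rt") as handle:
        return masking_summary(handle)


# Toy input instead of a real genome file.
example = [">chr1", "ACGTacgtNNNN", "GGcc"]
print(masking_summary(example))  # {'upper': 6, 'lower': 6, 'n': 4}
```

Running `summarize_file` on both downloads and comparing the counts would show which kind of masking each file carries.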