How to mask all repeats and low complexity regions using RepeatMasker?
Question by zwz110 (5.3 years ago):

I have a genome sequence in FASTA format, and I want to produce a soft-masked version of it.

From searching on Google, I found that this means all repeats and low-complexity regions should be replaced with lower-case versions of their nucleic bases. I have installed RepeatMasker on Linux, but I'm new to it. The RepeatMasker manual says "Default settings are for masking all type of repeats in a primate sequence.", but I'm not sure those settings suit my genome.
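To make the "lower-case the repeats" idea concrete, here is a minimal Python sketch (not RepeatMasker's own code) that soft-masks a sequence given a list of repeat intervals. The function name `soft_mask` and the 0-based, half-open interval convention are assumptions chosen for illustration; as far as I know, RepeatMasker's `.out` table reports 1-based inclusive coordinates, so those would need converting as `(begin - 1, end)`.

```python
from __future__ import annotations


def soft_mask(seq: str, intervals: list[tuple[int, int]]) -> str:
    """Lower-case every base inside the given intervals (hypothetical helper).

    Intervals are 0-based and half-open: (start, end) covers seq[start:end].
    Everything outside the intervals is returned in upper case.
    """
    bases = list(seq.upper())
    for start, end in intervals:
        # Clamp to the sequence bounds so bad coordinates cannot raise.
        for i in range(max(start, 0), min(end, len(bases))):
            bases[i] = bases[i].lower()
    return "".join(bases)


if __name__ == "__main__":
    print(soft_mask("ACGTACGTACGT", [(2, 5), (9, 12)]))  # ACgtaCGTAcgt
```

The bases are only changed in case, never replaced, which is what distinguishes soft-masking from hard-masking with N.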

I'm confused and don't know how to proceed. Can anyone tell me how to do this? Thank you!
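A possible starting point, assuming a standard RepeatMasker installation: `-species` selects a repeat library other than the default primate one, and `-xsmall` makes RepeatMasker write repeats in lower case instead of replacing them with N. Low-complexity regions and simple repeats are masked by default unless `-nolow` is given. The sketch below only assembles and runs that command. The helper names, the thread count, and the species string `"oryza sativa"` are assumptions, and whether repeats for your organism are available depends on the libraries (Dfam/RepBase) installed with your RepeatMasker.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path


def build_repeatmasker_cmd(fasta: str, species: str, threads: int = 4) -> list[str]:
    """Assemble a RepeatMasker command line that soft-masks the input.

    -species  choose the repeat library for the organism (default is primate)
    -xsmall   write repeats in lower case instead of replacing them with N
    -pa       number of parallel search jobs
    """
    return [
        "RepeatMasker",
        "-species", species,
        "-xsmall",
        "-pa", str(threads),
        fasta,
    ]


def run_repeatmasker(fasta: str, species: str, threads: int = 4) -> Path:
    """Run RepeatMasker and return the expected path of the masked FASTA."""
    subprocess.run(build_repeatmasker_cmd(fasta, species, threads), check=True)
    # Assumption: default output location (the input file's directory)
    # and the usual "<input>.masked" file name.
    return Path(fasta + ".masked")


if __name__ == "__main__":
    # Only print the command; call run_repeatmasker() to actually execute it.
    # "oryza sativa" is an assumed species name used for illustration.
    print(shlex.join(build_repeatmasker_cmd("genome.fa", "oryza sativa", threads=8)))
```

Printing the command before launching makes it easy to double-check the flags, since masking a whole genome can take hours.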

Answer by 2nelly (5.3 years ago):

Hi zwz110,

You can directly download masked versions of many genomes from the UCSC (goldenPath) or NCBI FTP sites:

ftp://ftp.ncbi.nlm.nih.gov/genomes/

Masked regions are represented in lower case (soft-masked).
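To check how much of a downloaded file is soft-masked, a short Python sketch like the following could be used. It reads plain or gzip-compressed FASTA (for example an NCBI `.mfa.gz` file) and reports the lower-case fraction per record. The helper names are my own, not from any library; for real files you would pass `open_fasta("path/to/file.mfa.gz")` to `read_fasta` instead of the in-memory demo.

```python
from __future__ import annotations

import gzip
import io
from typing import Iterator, TextIO


def open_fasta(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTA file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (record_id, sequence) pairs from a FASTA text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Record id = first word of the header line.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def soft_masked_fraction(seq: str) -> float:
    """Fraction of bases written in lower case (i.e. soft-masked)."""
    if not seq:
        return 0.0
    return sum(1 for base in seq if base.islower()) / len(seq)


if __name__ == "__main__":
    demo = io.StringIO(">chr1 example\nACGTacgtNN\n>chr2\nacgt\n")
    for record_id, seq in read_fasta(demo):
        print(record_id, f"{soft_masked_fraction(seq):.0%}")  # chr1 40%, chr2 100%
```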

For instance, the masked chromosome 1 of the human GRCh38 assembly is here:

ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/CHR_01/hs_ref_GRCh38.p12_chr1.mfa.gz

If you then need the masked bases replaced by N (hard-masking), see this post: Can I Convert Fasta Lowercase Bases To 'N'?
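For the lowercase-to-N step itself, a minimal Python sketch could look like this. It is one possible approach, not necessarily what the linked post uses, and the function names are illustrative. Header lines are left untouched so that lower-case letters in sequence names are not turned into N.

```python
import string
from typing import Iterable, Iterator

# Translation table: every lower-case ASCII letter becomes 'N'.
_SOFT_TO_HARD = str.maketrans(string.ascii_lowercase, "N" * len(string.ascii_lowercase))


def hard_mask(seq: str) -> str:
    """Replace soft-masked (lower-case) bases with 'N'."""
    return seq.translate(_SOFT_TO_HARD)


def hard_mask_fasta_lines(lines: Iterable[str]) -> Iterator[str]:
    """Hard-mask sequence lines while leaving '>' header lines untouched."""
    for line in lines:
        yield line if line.startswith(">") else hard_mask(line)


if __name__ == "__main__":
    print(hard_mask("ACgtNa"))  # ACNNNN
```

`str.translate` processes each line in a single pass, which keeps it reasonably fast on chromosome-sized inputs.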

Thank you, I got it! I'd also like to understand the details, for example how they soft-mask the sequence with RepeatMasker and which parameters they use. In other words, I want to learn what happens when Oryza_sativa.IRGSP-1.0.dna.toplevel.fa.gz becomes Oryza_sativa.IRGSP-1.0.dna_rm.toplevel.fa.gz. If you know, could you tell me?
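One hedged observation: according to Ensembl's FASTA README, `dna_rm` files are hard-masked (repeats and low-complexity regions detected by RepeatMasker are replaced with N), while `dna_sm` files are the soft-masked ones, where those regions are lower-cased. So the soft-masked rice file would be the `dna_sm` variant rather than `dna_rm`. The exact RepeatMasker parameters Ensembl used are not given in this thread, so I won't guess them, but you can see what changed between two versions of the same assembly by comparing them base by base. The sketch assumes both sequences are the same chromosome from the same assembly (equal length); `compare_masking` is a hypothetical helper name.

```python
from __future__ import annotations

from collections import Counter


def compare_masking(unmasked: str, masked: str) -> Counter:
    """Classify each position of an unmasked vs. masked copy of a sequence.

    Categories:
      gap          - 'N' already present in the unmasked sequence
      hard-masked  - base replaced by 'N' in the masked copy (as in dna_rm)
      soft-masked  - base written in lower case in the masked copy (as in dna_sm)
      unmasked     - base left unchanged in upper case
    """
    if len(unmasked) != len(masked):
        raise ValueError("sequences must come from the same assembly and have equal length")
    counts: Counter = Counter()
    for u, m in zip(unmasked, masked):
        if u.upper() == "N":
            counts["gap"] += 1
        elif m.upper() == "N":
            counts["hard-masked"] += 1
        elif m.islower():
            counts["soft-masked"] += 1
        else:
            counts["unmasked"] += 1
    return counts


if __name__ == "__main__":
    print(compare_masking("ACGTACGTNN", "ACNNACgtNN"))
```

Run it per chromosome on the `dna` and `dna_rm` (or `dna_sm`) files, after decompressing and parsing them with any FASTA reader; a large "hard-masked" count confirms that `dna_rm` uses N rather than lower case.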
