Can anyone tell me what the difference is between the three files shown below?
Homo_sapiens.GRCh38.dna.toplevel.fa.gz
Homo_sapiens.GRCh38.dna_rm.toplevel.fa.gz
Homo_sapiens.GRCh38.dna_sm.toplevel.fa.gz
The information is in the README: ftp://ftp.ensembl.org/pub/release-93/fasta/homo_sapiens/dna/README
sequence type:
'dna' - unmasked genomic DNA sequences.
'dna_rm' - masked genomic DNA. Interspersed repeats and low complexity regions are detected with the RepeatMasker tool and masked by replacing repeats with 'N's.
'dna_sm' - soft-masked genomic DNA. All repeats and low complexity regions have been replaced with lowercased versions of their nucleic base.
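To make the three encodings concrete, here is a minimal Python sketch that starts from a soft-masked sequence and derives what the same stretch would look like unmasked and hard-masked. The function names and the toy sequence are made up for illustration, and it assumes all three files mask the same regions, so that lowercase in 'dna_sm' corresponds to 'N' in 'dna_rm'. It is not how Ensembl generates the files.

```python
import re


def to_unmasked(soft_masked: str) -> str:
    """Approximate the 'dna' form: uppercase everything, dropping the mask."""
    return soft_masked.upper()


def to_hard_masked(soft_masked: str) -> str:
    """Approximate the 'dna_rm' form: replace each lowercase (masked) base with 'N'."""
    return re.sub(r"[a-z]", "N", soft_masked)


# Toy soft-masked sequence (hypothetical): the lowercase run marks a repeat.
soft = "ACGTacgtacGGCTA"

unmasked = to_unmasked(soft)
hard = to_hard_masked(soft)

print(soft)      # ACGTacgtacGGCTA
print(unmasked)  # ACGTACGTACGGCTA
print(hard)      # ACGTNNNNNNGGCTA
```

The practical point is that 'dna_sm' keeps both pieces of information (the bases and where the repeats are), so the other two forms can be recovered from it, whereas 'dna_rm' has lost the masked bases for good.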
One could even google "dna_rm" and check out the first link in the results.