Different fasta files
6.2 years ago
sinha.puja ▴ 20

Can anyone tell me what the difference is between the three files shown below?

Homo_sapiens.GRCh38.dna.toplevel.fa.gz

Homo_sapiens.GRCh38.dna_rm.toplevel.fa.gz

Homo_sapiens.GRCh38.dna_sm.toplevel.fa.gz

RNA-Seq • 1.0k views
6.2 years ago

The information is in the README: ftp://ftp.ensembl.org/pub/release-93/fasta/homo_sapiens/dna/README

sequence type:

  • 'dna' - unmasked genomic DNA sequences.

  • 'dna_rm' - masked genomic DNA. Interspersed repeats and low complexity regions are detected with the RepeatMasker tool and masked by replacing repeats with 'N's.

  • 'dna_sm' - soft-masked genomic DNA. All repeats and low complexity regions have been replaced with lowercased versions of their nucleic base.
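To make the three variants concrete, here is a minimal Python sketch of how they relate, going by the README descriptions above. The function names and the toy sequence are made up for illustration and are not part of any Ensembl tooling. It also assumes the only lowercase characters are soft-masked bases; note that the unmasked file can already contain 'N's (e.g. assembly gaps), so 'N's in a dna_rm file are not necessarily all repeats.

```python
def hard_mask(seq: str) -> str:
    """Turn a soft-masked sequence (dna_sm style) into a hard-masked one
    (dna_rm style) by replacing every lowercase base with 'N'."""
    return "".join("N" if base.islower() else base for base in seq)


def unmask(seq: str) -> str:
    """Recover the unmasked sequence (dna style) from a soft-masked one
    by uppercasing it. This is not possible from a hard-masked sequence,
    because the original bases have been overwritten with 'N'."""
    return seq.upper()


def masked_fraction(seq: str) -> float:
    """Fraction of positions that are soft-masked (lowercase)."""
    if not seq:
        return 0.0
    return sum(base.islower() for base in seq) / len(seq)


# Hypothetical toy sequence: positions 5-10 are a soft-masked repeat.
soft = "ACGTacgtacGGCC"

print(unmask(soft))     # dna-style view
print(hard_mask(soft))  # dna_rm-style view
print(soft)             # dna_sm-style view
print(round(masked_fraction(soft), 3))
```

The practical upshot is that the soft-masked file carries the most information: both other views can be derived from it, while the hard-masked file has lost the repeat sequence itself.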


One could even google dna_rm and check out the first link in the results.
