Question

Ensembl Hg38 dna, dna_rm, and dna_sm

2

9.9 years ago

bharata1803 ▴ 580

Hello,

I have a basic question. According to the README on the Ensembl FTP site for the Hg38 reference, there are several types of DNA FASTA file: dna, dna_sm, and dna_rm. If I want to align some FASTQ files against the Ensembl reference, which type should I choose? I understand that I should merge the per-chromosome FASTA files into a single FASTA file, build an index from it, and then run the alignment. Should I merge all of the files that I downloaded from Ensembl, or only the ones I need? If I have to choose, how do I decide which ones I need based on my FASTQ data? Thank you for your help and explanation. By the way, I have already merged all of the FASTA files into a single file (including the ones with PATCH in their names) and am currently aligning my data.

ensembl alignment sequencing • 6.4k views

ADD COMMENT • link updated 2.1 years ago by e.r.zakiev ▴ 250 • written 9.9 years ago by bharata1803 ▴ 580

Ram · Accepted Answer · 2015-05-14

4

9.9 years ago

andrew.j.skelton73 6.6k

Use the primary assembly: ftp://ftp.ensembl.org/pub/release-79/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
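On the dna / dna_rm / dna_sm part of the question: as the Ensembl FTP README describes them, "dna" is unmasked, "dna_sm" is soft-masked (repeats in lowercase), and "dna_rm" is hard-masked (repeats replaced by N). A rough way to check which kind a downloaded file is might look like the sketch below. The function name `masking_summary` is made up for illustration, and note that even unmasked files contain N runs for assembly gaps, so the N count alone is only a hint.

```python
import gzip


def masking_summary(path):
    """Count uppercase, lowercase (soft-masked) and N bases in a FASTA file.

    Hypothetical helper for illustration; accepts plain or .gz files.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    counts = {"upper": 0, "lower": 0, "N": 0}
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                continue  # skip header lines
            seq = line.strip()
            n = seq.count("N") + seq.count("n")
            # Lowercase bases other than 'n' indicate soft-masking.
            lower = sum(1 for c in seq if c.islower()) - seq.count("n")
            counts["N"] += n
            counts["lower"] += lower
            counts["upper"] += len(seq) - n - lower
    return counts
```

A file with a large "lower" count is most likely a dna_sm file; one with no lowercase and a very large "N" fraction is most likely dna_rm.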

ADD COMMENT • link updated 2.2 years ago by Ram 45k • written 9.9 years ago by andrew.j.skelton73 6.6k

0

Thank you for your answer. Is one file enough? There seem to be a lot of files there. What about the per-chromosome files? According to some tutorials I browsed, I need to use only the dna.chromosome files for 1 to 22, plus X, Y, and MT, and merge them into one FASTA file.
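For reference, the merge described in those tutorials could be sketched roughly as follows. This assumes the per-chromosome files have already been downloaded and follow the naming pattern `Homo_sapiens.GRCh38.dna.chromosome.<name>.fa.gz`; the prefix and the helper names `chromosome_files` and `merge_fasta` are assumptions for illustration.

```python
import gzip

# Chromosomes 1 to 22, plus X, Y and MT, in a fixed order.
CHROMOSOMES = [str(i) for i in range(1, 23)] + ["X", "Y", "MT"]


def chromosome_files(prefix="Homo_sapiens.GRCh38.dna.chromosome."):
    """Build the expected per-chromosome file names (assumed naming pattern)."""
    return [f"{prefix}{name}.fa.gz" for name in CHROMOSOMES]


def merge_fasta(paths, out_path):
    """Concatenate gzipped FASTA files into one uncompressed FASTA file."""
    with open(out_path, "w") as out:
        for path in paths:
            with gzip.open(path, "rt") as src:
                for line in src:
                    # Guard against a missing trailing newline at end of file,
                    # which would otherwise glue a sequence to the next header.
                    out.write(line if line.endswith("\n") else line + "\n")
```

Something like `merge_fasta(chromosome_files(), "GRCh38.merged.fa")` would then produce a single file to index; the output name here is only an example.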

ADD REPLY • link 9.9 years ago by bharata1803 ▴ 580

0

The primary assembly contains the chromosomes you listed, plus the unlocalized and unplaced scaffolds; it excludes the haplotype and patch sequences. So it's a single FASTA file of everything you want.
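To see for yourself which sequences a downloaded FASTA file contains, you could list its header names with something like the sketch below. The function name `sequence_names` is made up for illustration, and it assumes the sequence name is the first whitespace-delimited token after `>`.

```python
import gzip


def sequence_names(path):
    """Return the sequence names found in a plain or gzipped FASTA file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    names = []
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                # The name is the first token after '>'; the rest is description.
                parts = line[1:].split()
                names.append(parts[0] if parts else "")
    return names
```

Running this on the primary assembly file and on your own merged file, then comparing the two lists, shows which sequences the merged file carries beyond the primary assembly.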

ADD REPLY • link updated 2.2 years ago by Ram 45k • written 9.9 years ago by andrew.j.skelton73 6.6k

0

Oh, I understand now. Thank you then.

ADD REPLY • link 9.9 years ago by bharata1803 ▴ 580

0

For later releases, such as release 100, there is also a folder called "dna-index" containing essentially one file that looks very close to the primary assembly found in the "dna" folder. Which one should be used for the FASTQ alignment? What is this "index" file?

ADD REPLY • link 2.1 years ago by e.r.zakiev ▴ 250