Hello,
I have a basic question. According to the README on the Ensembl FTP site for the Hg38 reference, there are several types of DNA file: dna, dna_sm, and dna_rm. If I want to align some FASTQ files to the Ensembl reference, which type should I choose? I understand that I should merge the per-chromosome FASTA files into one single FASTA file, then create an index, and then run the alignment. Should I merge all of the files that I downloaded from Ensembl, or only the ones I need? If I have to choose, how do I decide which ones I need based on my FASTQ files? Thank you for your help and explanation. By the way, I have already merged all of the FASTA files into one single file (including the ones with PATCH in their names) and I am currently aligning my data.
Thank you for your answer. Is one file enough? There seem to be a lot of files there. What about the per-chromosome files? According to some tutorials I browsed, I need to use only the dna.chromosome files for 1 to 22 plus X, Y, and MT, and merge them into one FASTA file.
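For reference, the merge step that those tutorials describe could be sketched in Python roughly as follows. This is only an illustration: the file name template `Homo_sapiens.GRCh38.dna.chromosome.{}.fa.gz` is an assumption about how the downloaded files are named, and `chromosome_files` and `merge_fasta` are hypothetical helper names, not part of any Ensembl tool.

```python
import gzip
from pathlib import Path

# Chromosomes named in the post: 1-22, X, Y and MT.
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y", "MT"]


def chromosome_files(directory,
                     template="Homo_sapiens.GRCh38.dna.chromosome.{}.fa.gz"):
    """Build the expected per-chromosome paths, in chromosome order.

    The template is an assumption about the download's file names;
    adjust it to match the files actually on disk.
    """
    return [Path(directory) / template.format(c) for c in CHROMOSOMES]


def merge_fasta(paths, out_path):
    """Concatenate gzipped or plain FASTA files into one uncompressed FASTA."""
    with open(out_path, "wb") as out:
        for path in paths:
            opener = gzip.open if str(path).endswith(".gz") else open
            last = b"\n"
            with opener(path, "rb") as src:
                # Copy in 1 MiB chunks so a whole chromosome is never held in memory.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last = chunk[-1:]
            # Make sure the next file's '>' header starts on its own line.
            if last != b"\n":
                out.write(b"\n")
```

Calling `merge_fasta(chromosome_files("ensembl_dna"), "GRCh38.merged.fa")` would then write the 25 sequences to one file, where `ensembl_dna` and the output name are placeholders. The merged file still has to be indexed with whichever aligner is being used.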
The primary assembly contains the chromosomes you listed plus, I think, some unplaced scaffolds; according to the Ensembl README it leaves out the patches and haplotypes. So it's a single FASTA file with everything you want.
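If you want to check for yourself which sequences a given FASTA file contains, you can list its header lines. Here is a minimal Python sketch; `fasta_sequence_names` is a hypothetical helper name, and it assumes the usual convention that the sequence name is the first word after `>`.

```python
import gzip


def fasta_sequence_names(path):
    """Return the sequence names (first word of each '>' header line)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    names = []
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                # An empty header is recorded as an empty name.
                names.append(fields[0] if fields else "")
    return names
```

Running it on the primary assembly file you downloaded (for example a file named something like `Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz`, which is an assumed name) and printing the result shows whether anything beyond 1 to 22, X, Y, and MT is in there.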
Oh, I understand now. Thank you then.
For later releases, such as release 100, there is also a folder called "dna-index" that contains essentially one file, which looks very close to the primary assembly found in the "dna" folder. Which one should be used for the FASTQ alignment? And what is this "index" file?