Hi All,
I am trying to create an RSEM index, but I am running into an issue with the genome FASTA file. When I download Mus_musculus.GRCm38.dna.primary_assembly.fa.gz from ftp.ensembl, it unzips to Mus_musculus.GRCm38.dna.chromosome.1.fa, and I get an error that chromosomes are missing. I am using this build because it is the one I used for a previous RNA-seq replicate and I want to be consistent. However, when I download the new build (Mus_musculus.GRCm39.dna.primary_assembly.fa.gz) and unzip it, it unzips to the correct contents, Mus_musculus.GRCm39.dna.primary_assembly.fa.
Can anyone tell me what is happening here? And is there another source where I can download the correct Mus_musculus.GRCm38.dna.primary_assembly.fa.gz file? I already mapped all my reads to the STAR index created using this specific file/build before I noticed this issue at the RSEM step, so I would rather not do it all again.
Can you add a link to the actual file so others can check it?
Here is the link I used (reached through a Google search result): https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&ved=2ahUKEwjlroaB4-XzAhUJneAKHVzDApYQFnoECAUQAQ&url=ftp%3A%2F%2Fftp.ensembl.org%2Fpub%2Frelease-83%2Ffasta%2Fmus_musculus%2Fdna%2FMus_musculus.GRCm38.dna.primary_assembly.fa.gz&usg=AOvVaw0oBN7nooV3wVWFzW6dzUmx
This link worked and unzipped to the correct contents - ftp://ftp.ensembl.org/pub/release-97/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
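For anyone who wants to check a download before building an index, one option is to list the sequence names in the FASTA headers and confirm that all expected chromosomes are there. Below is a minimal sketch, assuming a standard FASTA layout where each record starts with a ">" line; the function name fasta_sequence_names is made up for illustration and is not part of RSEM, STAR, or Ensembl tooling.

```python
import gzip


def fasta_sequence_names(path):
    """Return the first word of every '>' header line in a FASTA file.

    Hypothetical helper for illustration. Reads gzip-compressed files
    when the path ends in '.gz', plain text otherwise.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    names = []
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                # An empty header line yields an empty name.
                names.append(fields[0] if fields else "")
    return names


# Example usage (the path is whatever file you downloaded):
# names = fasta_sequence_names("Mus_musculus.GRCm38.dna.primary_assembly.fa.gz")
# print(len(names), names[:25])
```

If the list contains only a single chromosome, the file is probably not the full primary assembly, whatever its name says.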