Hello, this is a very basic question, but I was wondering if someone could help me understand whether I've used the correct GTF and FASTA files for indexing the mouse genome. I got both files from Ensembl:
GTF: ftp.ensembl.org/pub/release-103/gtf/mus_musculus/Mus_musculus.GRCm39.103.gtf.gz
FASTA: ftp.ensembl.org/pub/release-103/fasta/mus_musculus/dna/Mus_musculus.GRCm39.dna.primary_assembly.fa.gz
Or should I use Mus_musculus.GRCm39.dna.toplevel.fa.gz as the FASTA file when generating the genome index in STAR? This is the command I ran:
STAR --runMode genomeGenerate --runThreadN 8 --genomeDir index_reference --genomeFastaFiles Mus_musculus.GRCm39.dna.primary_assembly.fa --sjdbGTFfile Mus_musculus.GRCm39.103.gtf
Thank you for your help!
The top-level file normally contains haplotypes, with the genome padded out to full length for each one. In the case of GRCm39 (mouse), the top-level file appears to be the same as the primary assembly, so either could be used. The primary assembly is the safe bet.

Hi @peru,
Yes, it looks good. You can always refer to the STAR manual, section 2.2, subsection 2.2.1. In general, you have to make sure the chromosome names in your genome FASTA and in your GTF are identical (for example, chr1 vs 1). Since you got both the FASTA and the GTF from the same source (Ensembl) and the same genome assembly (GRCm39), you should be fine.
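If you want to confirm the chromosome naming yourself before building the index, here is a minimal sketch in Python. It is not part of STAR; the function names (open_text, fasta_names, gtf_names, missing_from_fasta) are made up for illustration. It assumes the sequence name is the first word after ">" in each FASTA header and the first tab-separated column of each non-comment GTF line, and it reads either plain or .gz files.

```python
import gzip
import sys


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fasta_names(path):
    """Return the set of sequence names found in FASTA header lines."""
    names = set()
    with open_text(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    # Keep only the first word, e.g. "1" from ">1 dna:chromosome ..."
                    names.add(fields[0])
    return names


def gtf_names(path):
    """Return the set of sequence names used in column 1 of a GTF file."""
    names = set()
    with open_text(path) as handle:
        for line in handle:
            # Skip header/comment lines and blank lines
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def missing_from_fasta(fasta_path, gtf_path):
    """Return a sorted list of GTF sequence names that are absent from the FASTA."""
    return sorted(gtf_names(gtf_path) - fasta_names(fasta_path))


if __name__ == "__main__" and len(sys.argv) == 3:
    missing = missing_from_fasta(sys.argv[1], sys.argv[2])
    if missing:
        print("GTF sequence names not found in FASTA:", ", ".join(missing))
    else:
        print("All GTF sequence names are present in the FASTA.")
```

Saved as, say, check_names.py (a hypothetical file name), it could be run as: python check_names.py Mus_musculus.GRCm39.dna.primary_assembly.fa Mus_musculus.GRCm39.103.gtf. An empty result means every name used in the annotation also exists in the genome, which is the condition described above.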