Hi,
I have some mouse tumour transcriptomic data (3' QuantSeq) on which I want to run a differential expression analysis between treatment and control. I'm using the STAR aligner to map the reads, but I get the same error during both index creation and mapping.
Here's what I tried:
STAR --runMode genomeGenerate \
--runThreadN 8 \
--genomeDir ./indices/ensembl/mouse_cdna_star_indices_overhang75/ \
--genomeFastaFiles ./mouse/cdna/Mus_musculus.GRCm38.cdna.all.fa \
--sjdbGTFfile ./mouse/cdna/Mus_musculus.GRCm38.gtf \
--sjdbOverhang 75;
This didn't work, so I built the index without the GTF and the overhang parameter, as suggested in related threads here.
However, when I try aligning to the index created without a GTF, I get:
Fatal INPUT FILE error, no valid exon lines in the GTF file: ./mouse/gtf/Mus_musculus.GRCm38.100.chr.gtf
Solution: check the formatting of the GTF file. Most likely cause is the difference in chromosome naming between GTF and FASTA file.
This is puzzling, as neither the GTF nor the cDNA FASTA was altered in any way, and both were downloaded from Ensembl (v100).
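One way to test the chromosome-naming explanation from the error message is to compare the sequence names in the FASTA headers with the names in column 1 of the GTF. The snippet below is only a sketch: shared_names is a hypothetical helper (not part of STAR), and it assumes a standard FASTA (name is the first word after ">") and a tab-delimited GTF with "#" comment lines.

```shell
# shared_names FASTA GTF
# Hypothetical helper (not a STAR command): prints every sequence name that
# occurs both in the FASTA headers and in column 1 of the GTF.
# Assumes the FASTA file is non-empty and uncompressed.
shared_names() {
    awk '
        # First file (FASTA): remember the first word of each header line
        NR == FNR {
            if (/^>/) { sub(/^>/, ""); names[$1] = 1 }
            next
        }
        # Second file (GTF): skip comments, print each matching name once
        !/^#/ && ($1 in names) && !seen[$1]++ { print $1 }
    ' "$1" "$2"
}
```

For example: shared_names ./mouse/cdna/Mus_musculus.GRCm38.cdna.all.fa ./mouse/gtf/Mus_musculus.GRCm38.100.chr.gtf. If nothing is printed, no GTF line refers to a sequence in the FASTA, which would be consistent with the "no valid exon lines" message.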
Finally, I tried building the index from the genomic DNA (providing the GTF and the overhang parameter) and then aligning, and it eventually worked!
Given that my focus is only on differential expression, I'm leaning towards using the cDNA as the reference. Am I missing something or doing something wrong?
Thanks.
P.S. I tried the analysis with Kallisto, which worked fine, but our collaborators want me to stick to STAR for consistency with previous analyses.
Got it, thanks! Just wanted to be sure about using the genomic DNA as the reference.