Hi, I'm using STAR to align RNA-seq data to mm39. I am using --quantMode GeneCounts as an option, and the results I get use Ensembl gene IDs (the 'ENSMUSG' format), which is very impractical for my purpose. Is there a way to get gene names instead of the Ensembl gene IDs?
Additionally, I'm confused as to why I get three columns of counts for each gene (as shown below) -- I'm only aligning one forward and one reverse fastq file, so shouldn't I be getting one set of counts per gene?
N_unmapped 16802670 16802670 16802670
N_multimapping 4055291 4055291 4055291
N_noFeature 22948357 58749681 25405703
N_ambiguous 3018494 37339 1274777
ENSMUSG00000102628 0 0 0
ENSMUSG00000100595 0 0 0
ENSMUSG00000097426 0 0 0
ENSMUSG00000104478 0 0 0
ENSMUSG00000104385 0 0 0
ENSMUSG00000086053 21 25 0
If it helps, the code I used to index the genome and align my fastq files is the following:
STAR --runMode genomeGenerate --genomeDir mm39index --genomeFastaFiles /path/to/file/Mus_musculus.GRCm39.dna.primary_assembly.fa --sjdbGTFfile /path/to/file/Mus_musculus.GRCm39.104.gtf --runThreadN 16
STAR --runThreadN 16 --genomeDir path/to/mm39index --readFilesIn blah_1.fastq.gz blah_2.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts --outFileNamePrefix alignments/blah-alignment
Thanks in advance for the help!
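As far as I know from the STAR manual, the three count columns in ReadsPerGene.out.tab are not per-file counts but the same alignments counted under three strandedness assumptions: the first count column is for unstranded libraries, the second counts read 1 on the same strand as the gene (like htseq-count -s yes), and the third counts it on the opposite strand (like htseq-count -s reverse). You pick the one column that matches your library prep. I am not aware of a STAR option that writes gene names directly, but the GTF you indexed with should carry a gene_name attribute on most of its gene records, so the IDs can be swapped afterwards. Below is a minimal Python sketch of that, assuming an Ensembl-style GTF; the function names and the gene name "GeneA" in the toy GTF line are made up for illustration, and only the count line is taken from your output.

```python
import re

# Attribute patterns as they appear in column 9 of an Ensembl-style GTF (assumed format).
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')
GENE_NAME_RE = re.compile(r'gene_name "([^"]+)"')


def gene_names_from_gtf(gtf_lines):
    """Build a gene_id -> gene_name dict from the 'gene' records of a GTF."""
    names = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only gene-level records are needed
        gene_id = GENE_ID_RE.search(fields[8])
        if gene_id is None:
            continue
        gene_name = GENE_NAME_RE.search(fields[8])
        # Some genes have no gene_name attribute; keep the ID in that case.
        names[gene_id.group(1)] = gene_name.group(1) if gene_name else gene_id.group(1)
    return names


def rename_counts(count_lines, names, column=1):
    """Yield (gene name, count) pairs from ReadsPerGene.out.tab lines.

    column: 1 = unstranded, 2 = read 1 on the gene strand, 3 = reverse.
    The N_unmapped / N_multimapping / ... summary rows are skipped.
    """
    for line in count_lines:
        fields = line.split()
        if not fields or fields[0].startswith("N_"):
            continue
        yield names.get(fields[0], fields[0]), int(fields[column])


# Toy input: the GTF line and "GeneA" are placeholders, not real annotation.
TOY_GTF = [
    "#!genome-build GRCm39\n",
    '1\thavana\tgene\t100\t200\t.\t+\t.\tgene_id "ENSMUSG00000086053"; gene_name "GeneA";\n',
]
TOY_COUNTS = [
    "N_unmapped\t16802670\t16802670\t16802670\n",
    "ENSMUSG00000102628\t0\t0\t0\n",
    "ENSMUSG00000086053\t21\t25\t0\n",
]

if __name__ == "__main__":
    names = gene_names_from_gtf(TOY_GTF)
    for name, count in rename_counts(TOY_COUNTS, names, column=1):
        print(name, count, sep="\t")
```

On real data you would pass open file handles for Mus_musculus.GRCm39.104.gtf and for the ReadsPerGene.out.tab file that STAR writes under your --outFileNamePrefix. Be aware that gene names are not guaranteed to be unique, so it is usually safer to keep the Ensembl ID alongside the name rather than replace it. To choose the column, a common heuristic is to compare the N_noFeature row: if one of the two stranded columns has far more N_noFeature reads than the other, the other one is likely the right orientation, and if both look similar the library is probably unstranded.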
Thank you, that is really helpful!