Hi all,
I have aligned reads with STAR using --quantMode TranscriptomeSAM GeneCounts, which outputs the files Aligned.out.sam, Aligned.toTranscriptome.out.bam and ReadsPerGene.out.tab. Ideally I'd like a reputable piece of software to calculate TPMs from these files (plus any necessary annotation file). Can anyone recommend a suitable tool and briefly describe what I need to get it working?
A further naive question: if I want to do gene expression analysis, i.e. differential expression or some other modeling of the normalized counts, do I ever need the Aligned.out.sam file? I don't know what it is used for.
EDIT: I keep coming across RSEM as a tool that takes Aligned.toTranscriptome.out.bam as input and outputs normalized counts. I will look into this.
Thanks! Since I don't have a fasta file of the transcriptome of my organism, I don't think I can use Salmon. Is that correct?
If you have the genome and a gtf annotation, you can extract the transcriptome with a number of different tools, such as gffread. RSEM also needs a transcriptome fasta, so you would have the same problem, although RSEM also provides a script to extract the transcriptome fasta from a genome fasta and a gtf annotation.
Or see this answer for a TPM formula (along with some comments on why TPM calculated with gene counts and lengths is not truly a TPM), and this post for methods of calculating gene length given an annotation.
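For anyone who wants to see the arithmetic, here is a minimal Python sketch of the usual gene-level approximation: divide each count by the gene length in kilobases, then scale so the values sum to one million. It assumes you already have one length per gene from your annotation. The function names, the toy gene IDs and the lengths below are made up for illustration, and which count column to use depends on the strandedness of your library. As noted above, this is not a true TPM, since a single length per gene ignores which transcripts are actually expressed.

```python
from typing import Dict, Iterable


def read_star_gene_counts(lines: Iterable[str], column: int = 1) -> Dict[str, int]:
    """Parse the lines of a STAR ReadsPerGene.out.tab file.

    The file is tab-separated: gene ID, unstranded counts, then the two
    stranded count columns. `column` selects which count column to keep
    (1 = unstranded; 2 or 3 for stranded libraries).
    Summary rows whose ID starts with "N_" (e.g. N_unmapped) are skipped.
    """
    counts: Dict[str, int] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[0].startswith("N_"):
            continue
        counts[fields[0]] = int(fields[column])
    return counts


def tpm_from_counts(counts: Dict[str, int], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Approximate TPM from gene counts and per-gene lengths in base pairs.

    Every gene in `counts` needs a positive length in `lengths_bp`;
    a missing gene raises KeyError.
    """
    # Step 1: reads per kilobase (RPK) for each gene.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    # Step 2: rescale so that all values sum to one million.
    total = sum(rpk.values())
    if total == 0:
        return {gene: 0.0 for gene in rpk}
    return {gene: value / total * 1e6 for gene, value in rpk.items()}


# Toy input in the ReadsPerGene.out.tab layout (hypothetical genes and counts).
example_lines = [
    "N_unmapped\t10\t10\t10\n",
    "N_multimapping\t5\t5\t5\n",
    "N_noFeature\t3\t20\t20\n",
    "N_ambiguous\t1\t0\t0\n",
    "geneA\t100\t50\t50\n",
    "geneB\t300\t150\t150\n",
]
example_lengths = {"geneA": 1000, "geneB": 3000}  # hypothetical lengths in bp

example_counts = read_star_gene_counts(example_lines)
example_tpm = tpm_from_counts(example_counts, example_lengths)
```

In this toy case geneB has three times the reads of geneA but is also three times as long, so both end up with the same value (500000 each). For a real file you would pass an open file handle instead of the example list, e.g. read_star_gene_counts(open("ReadsPerGene.out.tab")).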
This is very helpful. So with gffread I can obtain a transcriptome fasta, and with that I can essentially align my reads to the transcriptome using Kallisto or Salmon? And both can estimate raw counts and normalized counts? That would reduce computation time a lot!