When aligning reads with STAR, one can choose to project the alignment to the transcriptome as well.
In the end STAR writes one genome BAM file and one transcriptome BAM.
Afterwards, I pass the transcriptome BAM to rsem-calculate-expression for quantification.
Is it incorrect to pass the genome BAM to RSeQC in this case instead of the transcriptome BAM? Or would both yield the same results?
(RSeQC requires a gene annotation file in BED format)
Related (but unanswered): RSEM BAM outputs, which one to use for RSeQC?
You're comparing apples (input to RSEM) to oranges (input to RSeQC). QC tools don't complain about much and won't cost you much time even if you get it all wrong, so why spend more time discussing it here than you would lose by just running it, even if you turned out to be 100% wrong?
Ram
RSeQC takes, on average, 65 hours to finish its run on a typical BAM file that I have. I'm doing this on Microsoft Azure on a machine with 32 cores and 256 GB memory and it's still that slow. So before running RSeQC on a dozen more BAM files that I have, I simply want to make sure whether I should do it with the genome BAM, the transcriptome BAM, or both! I don't think making sure I'm generating accurate reports is a waste of time.
I see. Your post does not mention why you're hesitating to run the QC job, so I assumed it would be easier to run and check than to theorize about it. You could always subsample both BAMs, run them through the software and see which results better match your requirements.
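For the subsampling suggested above, here is a minimal sketch, assuming `samtools` is installed and on your `PATH`. It only builds and prints the commands rather than running them. The helper name `subsample_cmd`, the 5% fraction, the seed, and the file names are my own placeholders (the names are STAR's defaults with no output prefix), so adjust them to your run.

```python
import shlex


def subsample_cmd(bam_in, bam_out, fraction, seed=42):
    """Build a `samtools view -s` command keeping roughly `fraction` of the reads."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    # samtools reads -s as SEED.FRACTION: the integer part is the seed,
    # the decimal part is the proportion of templates to keep.
    frac_digits = f"{fraction:.6f}".split(".")[1]
    return ["samtools", "view", "-b", "-s", f"{seed}.{frac_digits}",
            "-o", bam_out, bam_in]


# Placeholder file names (assumption): STAR defaults for the genome and
# transcriptome BAMs when no --outFileNamePrefix is given.
for src in ("Aligned.sortedByCoord.out.bam", "Aligned.toTranscriptome.out.bam"):
    cmd = subsample_cmd(src, src.replace(".bam", ".sub.bam"), fraction=0.05)
    print(shlex.join(cmd))
```

Per the samtools documentation, `-s` treats all records of a read pair the same way, so mates are not split. If you run RSeQC on the subsampled genome BAM, it will probably need to be indexed first with `samtools index`.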