Dear all, Do you know if STAR/RSEM are deterministic or do they have some random component to the mapping and/or quantification? Thanks,
I'm not sure about RSEM, but STAR is deterministic. Here is a past answer from Alex (the developer of STAR). If I recall correctly, RSEM is a transcript identifier. If you haven't upgraded STAR recently, I'd look into --quantMode in STAR: it'll save you time, and it is an in-program identifier.
It looks like the OP saw it, but for future readers of this thread, here is the STAR developer's answer:
"STAR alignments are always deterministic; however, the order in which they are output, as well as some flags/attributes (primary flag, HI), are not always deterministic.
With default parameters, and running on one thread, STAR is fully deterministic, down to the order of output alignments. With multiple threads, the order of the reads in the output is not deterministic, as STAR reads/maps/writes reads in chunks. However, the alignments for each read are completely deterministic.
With --outMultimapperOrder Random, the order of alignments for each read is no longer deterministic, which also affects the primary flag and the HI attribute."
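For anyone who wants to check this on their own output, here is a minimal Python sketch that compares two SAM files as unordered collections of alignment records. The function names (`alignment_key`, `same_alignments`) are my own and not part of STAR. It assumes uncompressed SAM text that fits in memory, and it optionally ignores the two things the quote says can vary: the primary/secondary flag bit (0x100) and the HI tag.

```python
from collections import Counter

SECONDARY = 0x100  # SAM flag bit that marks a non-primary (secondary) alignment


def alignment_key(sam_line, ignore_primary_and_hi=False):
    """Turn one SAM alignment line into a hashable, order-independent key."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    tags = fields[11:]  # optional tags follow the 11 mandatory SAM columns
    if ignore_primary_and_hi:
        flag &= ~SECONDARY  # drop the primary/secondary distinction
        tags = [t for t in tags if not t.startswith("HI:")]
    return (fields[0], flag, *fields[2:11], *sorted(tags))


def same_alignments(sam_a, sam_b, ignore_primary_and_hi=False):
    """True if both SAM texts hold the same alignments, in any order."""
    def records(text):
        return Counter(
            alignment_key(line, ignore_primary_and_hi)
            for line in text.splitlines()
            if line and not line.startswith("@")  # skip header lines
        )
    return records(sam_a) == records(sam_b)
```

Called with the default arguments, this should return True for two multi-threaded runs that differ only in read order. With `ignore_primary_and_hi=True` it also tolerates the differences expected under --outMultimapperOrder Random.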
RSEM does not identify transcripts; it quantifies transcript abundance and accounts for multimapping between transcript isoforms (and between genes sharing similar sequence). As far as I understand, --quantMode simply counts the number of reads hitting a genomic locus, so it will not quantify transcript abundance the way RSEM does (and it has no way of properly assigning reads that map to multiple genomic loci). The --quantMode flag in STAR does something much more akin to HTSeq or featureCounts, and those programs aren't appropriate for transcript-level quantification.
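To make the difference concrete, here is a toy Python sketch of expectation-maximization over multimapping reads. It is only an illustration of the general idea, not RSEM's actual model: it ignores transcript length, fragment and error models, and everything else RSEM accounts for. The function name `em_abundance` and the example reads are made up.

```python
def em_abundance(reads, transcripts, iterations=100):
    """Estimate relative abundances from reads given as tuples of compatible transcripts."""
    # Start from a uniform guess over all transcripts.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(iterations):
        expected = {t: 0.0 for t in transcripts}
        # E-step: split each read across its compatible transcripts
        # in proportion to the current abundance estimates.
        for compatible in reads:
            total = sum(theta[t] for t in compatible)
            for t in compatible:
                expected[t] += theta[t] / total
        # M-step: abundances are the expected read fractions.
        theta = {t: expected[t] / len(reads) for t in transcripts}
    return theta


# 3 reads unique to A, 1 unique to B, 4 compatible with both.
reads = [("A",)] * 3 + [("B",)] + [("A", "B")] * 4
theta = em_abundance(reads, ["A", "B"])
print({t: round(v, 3) for t, v in theta.items()})
```

In this example the shared reads end up assigned mostly to A, so the estimate settles at about 0.75 for A and 0.25 for B. A simple counter would have to drop the four ambiguous reads or split them arbitrarily.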
Ah, you are correct, sir. I've never actually used RSEM. That being said, my 'STAR is deterministic' statement still holds true.
No worries. Indeed, this has no bearing on your statements about STAR ;P.
But what about this from the release notes of STAR version 2.5.0a:
By default, the order of the multi-mapping alignments for each read is not truly random. The --outMultimapperOrder Random option outputs multiple alignments for each read in random order, and also randomizes the choice of the primary alignment from the highest scoring alignments. Parameter --runRNGseed can be used to set the random generator seed. With this option, the ordering of multi-mapping alignments of each read, and the choice of the primary alignment will vary from run to run, unless only one thread is used and the seed is kept constant.
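Going by that release note, a run with randomized multimapper order should be repeatable only with a single thread and a fixed seed. Here is a small Python sketch that builds (but does not run) such a STAR command line. The helper name `star_command`, the paths, and the seed value are placeholders of mine; the flags are the ones named in the release notes plus the usual STAR input/output options.

```python
import shlex


def star_command(genome_dir, read_files, out_prefix, seed, threads=1):
    """Build a STAR argument list with randomized multimapper order and a fixed seed."""
    return [
        "STAR",
        "--runThreadN", str(threads),        # one thread, per the release note
        "--genomeDir", genome_dir,
        "--readFilesIn", *read_files,
        "--outFileNamePrefix", out_prefix,
        "--outMultimapperOrder", "Random",
        "--runRNGseed", str(seed),           # keep this constant across runs
    ]


# Hypothetical paths, for illustration only.
cmd = star_command("star_index", ["reads_1.fq", "reads_2.fq"], "sample1_", seed=12345)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` if STAR is on your PATH. Raising `threads` above 1 trades that repeatability for speed.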
Oh, interesting. I'm curious now too. Alex is pretty good at responding quickly, so I posted this question on the RNA-STAR forum; hopefully he'll respond soon, if you want to follow along.
RSEM has a setting, "--sort-bam-by-read-name", which according to the authors will provide deterministic "maximum-likelihood" estimates from independent runs. It is disabled by default.
http://deweylab.biostat.wisc.edu/rsem/rsem-calculate-expression.html
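As a rough sketch of how that option could be switched on, here is a Python helper that assembles an rsem-calculate-expression call for an existing alignment file. The helper name `rsem_command` and the file and reference names are placeholders of mine, and I have not checked this against every RSEM version, so treat it as a starting point and confirm against the documentation linked above.

```python
def rsem_command(bam, reference_name, sample_name, paired_end=False):
    """Build an rsem-calculate-expression argument list with read-name sorting enabled."""
    cmd = [
        "rsem-calculate-expression",
        "--alignments",               # input is an alignment file, not raw reads
        "--sort-bam-by-read-name",    # the option discussed above, off by default
    ]
    if paired_end:
        cmd.append("--paired-end")
    # Positional arguments: input alignments, RSEM reference name, output sample name.
    cmd += [bam, reference_name, sample_name]
    return cmd


# Hypothetical file and reference names, for illustration only.
rsem_cmd = rsem_command("sample1_toTranscriptome.bam", "rsem_ref/human", "sample1", paired_end=True)
print(" ".join(rsem_cmd))
```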